PySpark: date_format, concat_ws, datediff, explode, collect_list, arrays_zip, regexp_replace, and more
import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.sql.functions import col, regexp_replace, split
date_format, concat_ws, datediff
** cast: converting a formatted string to a date
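# cols: the columns holding the date parts, joined with "-" before formatting
df.withColumn("action_date", F.date_format(F.concat_ws("-", *cols), "yyyy-MM-dd").cast("date"))
# datediff(end, start): number of days from action_date to date
df.withColumn("datediff", F.datediff(F.col("date"), F.col("action_date")))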

explode, collect_list, arrays_zip
Reference: https://www.runexception.com/q/6851
** explode takes an array column (here, the list produced by split)
df.withColumn("tag",F.explode(split(col("clean_tag"), ",")))
df.groupBy("dnum", "channel").agg(
    F.collect_list(F.col("tag")).alias("tag"),
    F.collect_list(F.col("score")).alias("score"),
).withColumn("clean_tag", F.arrays_zip(F.col("tag"), F.col("score")))
regexp_replace
** Regex extraction and regex replacement
# Use a raw string so Python does not treat "\[" as an (invalid) escape sequence
df.withColumn('tags', regexp_replace(col('tags'), r"\[", ""))
