欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

SparkSQL一行转多行 一列变多列 多行转一行

程序员文章站 2022-03-31 21:47:13
...

项目开发中有这样的需求,原始数据如下:

+--------+-----------+
|    name|    message|
+--------+-----------+
|zhangsan| 4=18,33=78|
|    lisi|23=54,67=88|
+--------+-----------+

一行转多行:

根据指定的标识符进行切分,然后一行转多行,以新列进行展示:

val result= df.withColumn("newMessage",functions.explode(functions.split(functions.col("message"),",")))
result.show()

结果如下:

+--------+-----------+----------+
|    name|    message|newMessage|
+--------+-----------+----------+
|zhangsan| 4=18,33=78|      4=18|
|zhangsan| 4=18,33=78|     33=78|
|    lisi|23=54,67=88|     23=54|
|    lisi|23=54,67=88|     67=88|
+--------+-----------+----------+

一列变多列:

将上面的结果数据newMessage,以“=”进行切割,然后等号两边的数据两列展示:

val Df2 = result.withColumn("newMessage2",split(col("newMessage"), "=")).select(
    col("name"),
    col("message"),
    col("newMessage"),
    col("newMessage2").getItem(0).as("start"),
    col("newMessage2").getItem(1).as("end")
  ).drop("newMessage2");
Df2.show(false)

结果如下:

+--------+-----------+----------+-----+---+
|name    |message    |newMessage|start|end|
+--------+-----------+----------+-----+---+
|zhangsan|4=18,33=78 |4=18      |4    |18 |
|zhangsan|4=18,33=78 |33=78     |33   |78 |
|lisi    |23=54,67=88|23=54     |23   |54 |
|lisi    |23=54,67=88|67=88     |67   |88 |
+--------+-----------+----------+-----+---+

多行转一行:

将下面数据中相同name的newMessage数据转成一行:

+--------+----------+
|    name|newMessage|
+--------+----------+
|zhangsan|      4=18|
|zhangsan|     33=78|
|    lisi|     23=54|
|    lisi|     67=88|
+--------+----------+
dataFrame.createTempView("dataFrame")
sparkSession.sql("select name,concat_ws(',',collect_set(newMessage)) as message from dataFrame group by name").show()

结果如下:

+--------+-----------+
|    name|    message|
+--------+-----------+
|zhangsan| 4=18,33=78|
|    lisi|67=88,23=54|
+--------+-----------+