Delta Lake
程序员文章站
2022-03-08 14:06:57
Environment setup
pip install --upgrade "pyspark>=3.0,<3.1"  # delta-core 0.7.0 targets Spark 3.0.x
pyspark --packages io.delta:delta-core_2.12:0.7.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
Alternatively, configure the session from a Python script:
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()
from delta.tables import *
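The same three settings appear in both the shell command and the builder call above. A minimal sketch that keeps them in one place follows. `delta_conf` and `build_session` are hypothetical helper names, not part of pyspark or Delta Lake, and the version defaults are simply the ones used above.

```python
def delta_conf(delta_version="0.7.0", scala_version="2.12"):
    """Return the Spark settings used above to enable Delta Lake."""
    return {
        "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name="MyApp", conf=None):
    """Create a SparkSession with the Delta settings applied (hypothetical helper)."""
    # Imported lazily so delta_conf() can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in (conf or delta_conf()).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `spark = build_session()` should then be equivalent to the builder chain shown earlier.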
Create a table
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
>>> spark.sql("CREATE TABLE IF NOT EXISTS events2 USING DELTA LOCATION '/tmp/delta-table'")
DataFrame[]
Read data
>>> df = spark.read.format("delta").load("/tmp/delta-table")
>>> df.show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
>>> spark.sql("select id from events2").show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
Update table data
data = spark.range(5, 10)
data.write.format("delta").mode("overwrite").save("/tmp/delta-table")
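Overwriting a Delta table writes a new table version rather than destroying the old data, so the earlier rows can still be read back. A minimal sketch, assuming the `spark.range(0, 5)` write above was the first write to `/tmp/delta-table` (and is therefore version 0) and that `spark` is the session created earlier. `read_version` is a hypothetical helper name; `versionAsOf` is the Delta Lake reader option for reading an older version.

```python
def read_version(spark, path, version):
    """Load a specific version of the Delta table stored at `path`."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # Delta Lake time-travel option
        .load(path)
    )

# Usage, assuming an active session named `spark`:
# read_version(spark, "/tmp/delta-table", 0).show()
```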
Conditional update without overwrite
>>> from delta.tables import *
>>> from pyspark.sql.functions import *
>>> deltaTable = DeltaTable.forPath(spark, "/tmp/delta-table")
>>> deltaTable.update(
...     condition = expr("id % 2 == 0"),
...     set = { "id": expr("id + 100") })
>>> deltaTable.delete(condition = expr("id % 2 == 0"))
>>> newData = spark.range(0, 20)
>>> deltaTable.alias("oldData").merge(newData.alias("newData"),"oldData.id = newData.id").whenMatchedUpdate(set = { "id": col("newData.id") }).whenNotMatchedInsert(values = { "id": col("newData.id") }).execute()
>>> deltaTable.toDF().show()
+---+
| id|
+---+
|  9|
|  4|
| 18|
| 17|
|  6|
|  3|
|  8|
|  2|
| 10|
|  7|
| 12|
| 11|
|  1|
| 16|
| 19|
| 15|
|  0|
|  5|
| 13|
| 14|
+---+
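The final table holds every id from 0 to 19, in no particular order because `show()` does not sort. To see why, the update, delete, and merge steps can be traced on plain Python integers. This is only a model of the row-level effect, not how Delta Lake executes these operations, and `simulate` is a hypothetical name.

```python
def simulate(initial_ids, new_ids):
    """Trace the update, delete and merge steps above on plain integers."""
    # update: add 100 to every even id
    ids = [i + 100 if i % 2 == 0 else i for i in initial_ids]
    # delete: drop every even id (the updated rows are still even)
    ids = [i for i in ids if i % 2 != 0]
    # merge: matched ids are rewritten to the same value, unmatched ids are inserted
    existing = set(ids)
    ids += [i for i in new_ids if i not in existing]
    return sorted(ids)


# The table held ids 5..9 after the overwrite; newData is range(0, 20).
print(simulate(range(5, 10), range(0, 20)))
```

Only 5, 7, and 9 survive the delete, and the merge then inserts the remaining ids. For a sorted listing from Spark itself, `deltaTable.toDF().orderBy("id").show()` can be used.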