
Spark in Practice (3): DataFrame Basics, Row and Column Operations and SQL

程序员文章站 2022-06-13 22:15:54


Row and Column Operations

df['age'] # returns only a Column object, which cannot be displayed directly
df.select('age').show() # returns a DataFrame with a single column, which show() can display

# look at the first two rows
df.head(2) # returns a list of Row objects

df.select(['age','name']).show() # get two columns

# create a new column
df.withColumn('double_age',df['age'] * 2).show() # not in place: returns a new DataFrame and leaves df unchanged

# rename a column
df.withColumnRenamed('age','my_new_age').show()

SQL Operations

# very useful if you are already familiar with SQL

# first, register the DataFrame as a temporary view
df.createOrReplaceTempView('people') # the table name is people

# create one sql query and get the result
results = spark.sql("SELECT * FROM people")
results.show()

# create another sql query and get the result
new_results = spark.sql("SELECT * FROM people WHERE age=30")
new_results.show()
