Spark in Practice (3): DataFrame Basics, Row and Column Operations and SQL
2022-06-13 22:15:54
Row and Column Operations
df['age'] # returns only a Column object, not the data itself
df.select('age').show() # returns a DataFrame with a single column, which can be displayed with show()
# look at the first two rows
df.head(2) # returns a list of Row objects
df.select(['age','name']).show() # get two columns
# create a new column
df.withColumn('double_age',df['age'] * 2).show() # returns a new DataFrame; df itself is not modified
# rename a column
df.withColumnRenamed('age','my_new_age').show()
SQL Operations
# very useful if you are familiar with SQL
# first, register the DataFrame as a temporary view
df.createOrReplaceTempView('people') # the view name is people
# run a SQL query and get the result as a DataFrame
results = spark.sql("SELECT * FROM people")
results.show()
# create another sql query and get the result
new_results = spark.sql("SELECT * FROM people WHERE age=30")
new_results.show()