Selecting all columns of a DataFrame with select in the pyspark.sql module

Use select in the pyspark.sql module to get all of a DataFrame's columns.

# Re-select every column by name; equivalent to df.select('*')
df = df.select([column for column in df.columns])
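Since `df.columns` returns every column name, passing that full list back to `select` keeps all columns. A plain-Python sketch of what the list comprehension produces (the column names here are hypothetical, not from the source):

```python
# Hypothetical stand-in for df.columns (assumption: any DataFrame's column names).
columns = ["age", "name"]

# The comprehension copies every name, so select() projects all columns,
# which is equivalent to df.select('*').
selected = [column for column in columns]
```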


# Keep only the columns that are not in drop_list
drop_list = ['a column', 'another column', ...]
df.select([column for column in df.columns if column not in drop_list])

apache-spark — How to delete a column from a pyspark DataFrame
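The `not in` filter above operates on plain column-name strings, so it can be checked without a Spark session. A minimal sketch (the column names are hypothetical); note that PySpark also offers `df.drop(*drop_list)` as a more direct way to remove columns:

```python
# Hypothetical column names; the '...' in the original stands for further entries.
columns = ["a column", "another column", "keep_me", "also_keep"]
drop_list = ["a column", "another column"]

# Keep only the names that are not in drop_list -- these are what
# df.select(...) would then project.
kept = [column for column in columns if column not in drop_list]
```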


  • Projects a set of expressions and returns a new DataFrame.
  • Parameters
    • cols – list of column names (string) or expressions (Column). If one of the column names is ‘*’, that column is expanded to include all columns in the current DataFrame.
>>> df.select('*').collect()
[Row(age=2, name='Alice'), Row(age=5, name='Bob')]
>>> df.select('name', 'age').collect()
[Row(name='Alice', age=2), Row(name='Bob', age=5)]
>>> df.select(df.name, (df.age + 10).alias('age')).collect()
[Row(name='Alice', age=12), Row(name='Bob', age=15)]
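The documented rule that a `'*'` column name expands to all columns can be mimicked in plain Python (this is only an illustration of the expansion rule, not Spark's actual implementation):

```python
# Hypothetical column names standing in for df.columns.
columns = ["age", "name"]

# Arguments passed to select(); '*' expands to every column name.
cols_arg = ["*"]
expanded = [c for arg in cols_arg
            for c in (columns if arg == "*" else [arg])]
```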

New in version 1.3.

pyspark.sql module — PySpark master documentation