Use `select` from the pyspark.sql module to get all columns of a DataFrame.
df = df.select([column for column in df.columns])
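In PySpark, `df.columns` is a plain Python list of column-name strings, so the comprehension above simply rebuilds that list; `df.select(df.columns)` or `df.select('*')` would project the same columns. A minimal sketch of the list side (the column names here are hypothetical):

```python
# Hypothetical value of df.columns; in PySpark it is a list of str.
columns = ['age', 'name']

# The comprehension from the snippet above is an identity copy of that list.
selected = [column for column in columns]
assert selected == columns
```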
Reference

apache-spark — How to drop columns from a PySpark DataFrame

drop_list = ['a column', 'another column', ...]
df.select([column for column in df.columns if column not in drop_list])
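The filtering in this snippet happens in plain Python before Spark is involved: `df.columns` is just a list of strings, so the drop-list logic can be checked without a SparkSession. A small sketch with made-up column names:

```python
# Hypothetical df.columns; in PySpark this is a plain list of str.
columns = ['id', 'name', 'age', 'tmp_flag']
drop_list = ['tmp_flag']

# Keep every column that is not in the drop list, preserving order.
kept = [column for column in columns if column not in drop_list]
print(kept)  # ['id', 'name', 'age']
```

Passing the resulting list to `df.select(kept)` yields a DataFrame without the dropped columns; `df.drop(*drop_list)` is an equivalent built-in alternative.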
pyspark.sql module — PySpark master documentation

select(*cols)
- Projects a set of expressions and returns a new DataFrame.
- Parameters
- cols – list of column names (string) or expressions (Column). If one of the column names is ‘*’, that column is expanded to include all columns in the current DataFrame.
>>> df.select('*').collect()
[Row(age=2, name='Alice'), Row(age=5, name='Bob')]
>>> df.select('name', 'age').collect()
[Row(name='Alice', age=2), Row(name='Bob', age=5)]
>>> df.select(df.name, (df.age + 10).alias('age')).collect()
[Row(name='Alice', age=12), Row(name='Bob', age=15)]

New in version 1.3.