ablog

Notes from a clumsy, restless engineer

Reading Parquet from a dynamic_frame in a Glue Spark job fails with "Unsupported encoding: DELTA_BINARY_PACKED"

Symptom

  • Reading Parquet from a dynamic_frame in a Glue Spark job fails with "Unsupported encoding: DELTA_BINARY_PACKED".

Solution

  • Set the following:
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

Reference

In order to generate a DELTA-encoded Parquet file in PySpark, we need to enable version 2 of the Parquet writer. This is the only way it works. Also, for some reason the setting only works when creating the Spark context. The setting is:

"spark.hadoop.parquet.writer.version": "v2"

and the result is:

time:         INT64 GZIP DO:0 FPO:11688 SZ:84010/2858560/34.03 VC:15043098 ENC:DELTA_BINARY_PACKED ST:[min: 1577715561210, max: 1577839907009, num_nulls: 0]

HOWEVER, one cannot read the same file back in PySpark as is; you will get

java.lang.UnsupportedOperationException: Unsupported encoding: DELTA_BINARY_PACKED

In order to read the file back, one needs to disable the following conf:

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
scala - Write a parquet file with delta encoded columns - Stack Overflow
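Putting the quoted answer together, a rough end-to-end sketch in PySpark; the data and the output path are made up for illustration.

from pyspark.sql import SparkSession

# As the answer notes, the writer version only takes effect when the
# session/context is created, not when set afterwards.
spark = (
    SparkSession.builder
    .config("spark.hadoop.parquet.writer.version", "v2")
    .getOrCreate()
)

# Hypothetical INT64 column; with the v2 writer it comes out DELTA_BINARY_PACKED
df = spark.range(10**6).withColumnRenamed("id", "time")
df.write.mode("overwrite").parquet("/tmp/delta_encoded")

# Reading it back with the default vectorized reader raises
# UnsupportedOperationException, so disable it first
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.read.parquet("/tmp/delta_encoded").show(3)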