ablog

Notes from a clumsy, restless engineer

Reading Parquet files with parquet-cli

parquet-tools appears to be deprecated, so I installed parquet-cli and tried reading a Parquet file with it.

Also, if you look through related articles you will find parquet-tools introduced as well, but at the moment it can no longer be used 🔥

$ brew install parquet-tools
Error: parquet-tools has been disabled because it is deprecated upstream!
Reading Parquet files with parquet-cli - kakakakakku blog

Installation

$ brew install parquet-cli
$ brew info parquet-cli
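
Note that the Homebrew formula is called parquet-cli, while the command used in the rest of this post is `parquet`. A minimal sketch for checking that the command ended up on PATH; `have_cmd` is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: succeeds if the given command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd parquet; then
  echo "parquet is available at: $(command -v parquet)"
else
  echo "parquet not found on PATH; try: brew install parquet-cli"
fi
```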

Trying it out

% parquet meta run-1685084488806-part-block-0-r-00000-snappy.parquet

File path:  run-1685084488806-part-block-0-r-00000-snappy.parquet
Created by: parquet-glue version 1.8.2
Properties: (none)
Schema:
message glue_schema {
  optional binary col0 (STRING);
  optional binary col1 (STRING);
  optional binary col2 (STRING);
  optional binary col3 (STRING);
  optional binary col4 (STRING);
  optional binary col5 (STRING);
  optional binary col6 (STRING);
}


Row group 0:  count: 100000  81.20 B records  start: 4  total(compressed): 7.743 MB total(uncompressed):14.955 MB
--------------------------------------------------------------------------------
      type      encodings count     avg size   nulls   min / max
col0  BINARY    S   _     100000    4.32 B     0       "1" / "99999"
col1  BINARY    S   _     100000    4.98 B     0       "Supplier#000000001" / "Supplier#000100000"
col2  BINARY    S   _     100000    28.53 B    0       "  , Jd6qNPDAgz" / "zzyu4VZw4LGgCMMJG8Yr"
col3  BINARY    S _ R     100000    0.63 B     0       "0" / "9"
col4  BINARY    S   _     100000    11.85 B    0       "10-100-166-6237" / "34-998-900-4911"
col5  BINARY    S   _     100000    6.76 B     0       "-0.12" / "9999.93"
col6  BINARY    S   _     100000    24.13 B    0       " Customer  blithely regul..." / "zzle. slyly regular packa..."
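
The figures in this output appear to be consistent with one another: the per-column avg size values add up to the 81.20 B per record shown in the row group header, and multiplying that by the 100000 rows gives roughly the reported compressed total. The following is only a sketch of that arithmetic, with the values copied from the output above. Treating 1 MB as 1024 * 1024 bytes is an assumption; it happens to match the reported figure.

```shell
# Per-column "avg size" values in bytes, copied from the meta output above.
sizes="4.32 4.98 28.53 0.63 11.85 6.76 24.13"
count=100000

# Sum the per-column averages to get the compressed bytes per record.
per_record=$(echo "$sizes" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.2f", s }')

# Multiply by the row count and convert to MB (assumption: 1 MB = 1024 * 1024 bytes).
total_mb=$(awk -v b="$per_record" -v n="$count" 'BEGIN { printf "%.2f", b * n / (1024 * 1024) }')

# Compressed size as a percentage of the uncompressed size, using the reported totals.
ratio=$(awk 'BEGIN { printf "%.1f", 7.743 / 14.955 * 100 }')

echo "bytes per record:   $per_record"
echo "total (compressed): $total_mb MB"
echo "compressed / uncompressed: $ratio %"
```

With these inputs the compressed total comes out at about 7.74 MB, in line with the 7.743 MB reported for the row group, and the Snappy-compressed data is a little over half the uncompressed size.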

% parquet head run-1685084488806-part-block-0-r-00000-snappy.parquet
{"col0": "1000", "col1": "Supplier#000001000", "col2": "sep4GQHrXe", "col3": "17", "col4": "27-971-649-2792", "col5": "7307.62", "col6": "press deposits boost thinly quickly unusual instructions. unusual forges haggle ruthlessly. packa"}
{"col0": "999", "col1": "Supplier#000000999", "col2": "XIA9uPu,fDZTOC,ItOGKYNXnoTvCuULtzmnSk", "col3": "2", "col4": "12-991-892-1050", "col5": "3898.69", "col6": " ironic requests snooze? unusual depths alongside of the furiously "}
{"col0": "998", "col1": "Supplier#000000998", "col2": "lgaoC,43IUbHf3Ar5odS8wQKp", "col3": "15", "col4": "25-430-605-1180", "col5": "3282.62", "col6": "hs against the unusual accounts haggle r"}
{"col0": "997", "col1": "Supplier#000000997", "col2": "7eUWMrOCKCp2JYas6P4mL93eaWIOtKKWtTX", "col3": "3", "col4": "13-221-322-7971", "col5": "3659.56", "col6": "y regular excuses boost slyly furiously final deposits. evenly fi"}
{"col0": "996", "col1": "Supplier#000000996", "col2": "Wx4dQwOAwWjfSCGupfrM", "col3": "7", "col4": "17-447-811-3282", "col5": "6329.90", "col6": " ironic forges cajole blithely agai"}
{"col0": "995", "col1": "Supplier#000000995", "col2": "CgVUX8DtNbtug2M,N", "col3": "18", "col4": "28-180-818-2912", "col5": "9025.90", "col6": "s nag. furiously even theodolites cajole."}
{"col0": "994", "col1": "Supplier#000000994", "col2": "0qF9I2cfv48Cu", "col3": "4", "col4": "14-183-331-6019", "col5": "8855.24", "col6": "sits boost blithely final instructions. ironic m"}
{"col0": "993", "col1": "Supplier#000000993", "col2": "z2NwUJ TPfd9MP8K3Blp1prYQ116 ", "col3": "2", "col4": "12-316-384-2073", "col5": "2336.52", "col6": " asymptotes haggle slowly above the"}
{"col0": "992", "col1": "Supplier#000000992", "col2": "iZPAlGecV0uUsxMikQG7s", "col3": "2", "col4": "12-663-356-1288", "col5": "4379.45", "col6": "silent packages. quickly regular requests against the carefully unusual theodolites affix fu"}
{"col0": "991", "col1": "Supplier#000000991", "col2": "Bh4Danx VvUpMce x42", "col3": "16", "col4": "26-793-462-2874", "col5": "4026.14", "col6": "foxes are slyly above the furiously express t"}
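
Every column in this file is a STRING, so the values come back as JSON strings, including the numeric-looking ones. That would also explain why `parquet meta` reports "1" / "99999" as the min / max of col0: the result is what you would expect from comparing the values as strings rather than as numbers. A small sketch of the difference; the key values below are a hypothetical sample, not read from the file.

```shell
# Hypothetical sample of key values, stored as strings like col0.
keys='1
99999
100000
2'

# Byte-wise (lexicographic) ordering: "100000" sorts before "99999".
lex_max=$(printf '%s\n' "$keys" | LC_ALL=C sort | tail -n 1)

# Numeric ordering, for comparison.
num_max=$(printf '%s\n' "$keys" | LC_ALL=C sort -n | tail -n 1)

echo "lexicographic max: $lex_max"
echo "numeric max:       $num_max"
```

If the keys really do run up to 100000, as the col1 range "Supplier#000000001" / "Supplier#000100000" suggests, the string-based max of "99999" is not the largest key numerically.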

Environment