Symptom
Creating a Hive table fails with the error "FAILED: SemanticException java.lang.IllegalArgumentException: java.net.UnknownHostException: ".
hive> CREATE TABLE parquet.amazon_reviews_parquet(
        marketplace string,
        customer_id string,
        review_id string,
        product_id string,
        product_parent string,
        product_title string,
        star_rating int,
        helpful_votes int,
        total_votes int,
        vine string,
        verified_purchase string,
        review_headline string,
        review_body string,
        review_date bigint,
        year int)
      PARTITIONED BY (product_category string)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION 'hdfs://amazon-reviews-pds/parquet';
FAILED: SemanticException java.lang.IllegalArgumentException: java.net.UnknownHostException: amazon-reviews-pds
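Cause
The URI format is "hdfs://hostname/path", so in "hdfs://amazon-reviews-pds/parquet" the "amazon-reviews-pds" part is interpreted as a host name, which cannot be resolved. For a path on the cluster's own HDFS, leave the host part empty and write "hdfs:///amazon-reviews-pds/parquet".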
There is one additional / after hdfs://, which is a protocol name. You must go to /tmp/... via hdfs:// protocol, that's why URL needs additional /. Without this, Spark is trying to reach host tmp, not folder
scala - java.lang.IllegalArgumentException: java.net.UnknownHostException: tmp - Stack Overflow
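The difference between the two-slash and three-slash forms can be checked with any generic URI parser. The following is a minimal sketch using Python's standard urllib.parse. Hadoop itself is Java-based and does not use this parser, so this only illustrates the generic host/path split; the helper name split_hdfs_uri is hypothetical and introduced purely for illustration.

```python
from urllib.parse import urlsplit


def split_hdfs_uri(uri):
    """Return (host, path) for a URI; host is None when the host part is empty."""
    parts = urlsplit(uri)
    return parts.hostname, parts.path


# Two slashes: the first segment after "//" is parsed as the host name.
print(split_hdfs_uri("hdfs://amazon-reviews-pds/parquet"))

# Three slashes: the host part is empty and the whole remainder is the path.
print(split_hdfs_uri("hdfs:///amazon-reviews-pds/parquet"))
```

In the first call "amazon-reviews-pds" comes back as the host and only "/parquet" as the path, which matches the UnknownHostException above. In the second call the host is empty and the path is "/amazon-reviews-pds/parquet".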
Solution
Change "hdfs://amazon-reviews-pds/parquet" to "hdfs:///amazon-reviews-pds/parquet".
hive> CREATE TABLE parquet.amazon_reviews_parquet(
        marketplace string,
        customer_id string,
        review_id string,
        product_id string,
        product_parent string,
        product_title string,
        star_rating int,
        helpful_votes int,
        total_votes int,
        vine string,
        verified_purchase string,
        review_headline string,
        review_body string,
        review_date bigint,
        year int)
      PARTITIONED BY (product_category string)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION 'hdfs:///amazon-reviews-pds/parquet';
OK
Time taken: 0.051 seconds