ablog

Notes of a clumsy, restless engineer

Hive complains "java.lang.IllegalArgumentException: java.net.UnknownHostException" when creating a table

Symptom

Creating a Hive table fails with "FAILED: SemanticException java.lang.IllegalArgumentException: java.net.UnknownHostException: ".

hive> CREATE TABLE parquet.amazon_reviews_parquet(
  marketplace string, 
  customer_id string, 
  review_id string, 
  product_id string, 
  product_parent string, 
  product_title string, 
  star_rating int, 
  helpful_votes int, 
  total_votes int, 
  vine string, 
  verified_purchase string, 
  review_headline string, 
  review_body string, 
  review_date bigint, 
  year int)
PARTITIONED BY (product_category string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://amazon-reviews-pds/parquet';

FAILED: SemanticException java.lang.IllegalArgumentException: java.net.UnknownHostException: amazon-reviews-pds

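Cause

An hdfs URI has the form "hdfs://hostname/path", so in "hdfs://amazon-reviews-pds/parquet" the string "amazon-reviews-pds" is taken as the host name, and that host cannot be resolved. To refer to a path on the cluster's own default HDFS, leave the host part empty and write "hdfs:///amazon-reviews-pds/parquet"; the path is then resolved against the default filesystem configured in fs.defaultFS.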

There is one additional / after hdfs://, which is a protocol name. You must go to /tmp/... via hdfs:// protocol, that's why URL needs additional /. Without this, Spark is trying to reach host tmp, not folder

scala - java.lang.IllegalArgumentException: java.net.UnknownHostException: tmp - Stack Overflow
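To see how the two spellings are split, here is a small sketch using Python's urllib.parse.urlsplit. Hadoop itself parses these strings in Java, not with this function; urlsplit is used here only as a stand-in because it applies the same generic scheme://authority/path rule. The helper name split_location is a hypothetical name for this illustration.

```python
from typing import Tuple
from urllib.parse import urlsplit

# The two LOCATION values from this post.
WRONG = "hdfs://amazon-reviews-pds/parquet"
RIGHT = "hdfs:///amazon-reviews-pds/parquet"


def split_location(uri: str) -> Tuple[str, str, str]:
    """Return (scheme, authority, path) using the generic scheme://authority/path rule."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in (WRONG, RIGHT):
    scheme, authority, path = split_location(uri)
    # With two slashes the first segment becomes the authority (host);
    # with three slashes the authority is empty and the whole string is the path.
    print(f"{uri!r}: scheme={scheme!r} authority={authority!r} path={path!r}")
```

With two slashes, "amazon-reviews-pds" lands in the authority and only "/parquet" is left as the path, which is exactly the host name the error message complains about.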

Solution

Rewrite "hdfs://amazon-reviews-pds/parquet" as "hdfs:///amazon-reviews-pds/parquet".

hive> CREATE TABLE parquet.amazon_reviews_parquet(
  marketplace string, 
  customer_id string, 
  review_id string, 
  product_id string, 
  product_parent string, 
  product_title string, 
  star_rating int, 
  helpful_votes int, 
  total_votes int, 
  vine string, 
  verified_purchase string, 
  review_headline string, 
  review_body string, 
  review_date bigint, 
  year int)
PARTITIONED BY (product_category string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs:///amazon-reviews-pds/parquet';
OK
Time taken: 0.051 seconds
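If LOCATION strings are generated by a script, the triple-slash rewrite above can be done mechanically and the mistake caught before Hive ever sees the DDL. The sketch below is a hypothetical helper, not part of Hive or Hadoop: hdfs_location, check_location, and the example host nn.example.com:8020 are all assumptions made up for illustration.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def hdfs_location(path: str, namenode: Optional[str] = None) -> str:
    """Build an hdfs:// URI for a Hive LOCATION clause (hypothetical helper).

    Without a namenode the authority is left empty ("hdfs:///..."), so the
    path is resolved against the cluster's default filesystem (fs.defaultFS).
    """
    clean = "/" + path.lstrip("/")  # guarantee exactly one leading slash
    return f"hdfs://{namenode or ''}{clean}"


def check_location(uri: str, known_hosts: Set[str]) -> None:
    """Raise ValueError if the authority is neither empty nor a known host."""
    host = urlsplit(uri).hostname  # None when the authority is empty
    if host is not None and host not in known_hosts:
        raise ValueError(
            f"{uri!r} treats {host!r} as a host name; did you mean 'hdfs:///'?"
        )


location = hdfs_location("amazon-reviews-pds/parquet")
check_location(location, known_hosts=set())  # passes: no host part
print(location)
```

Leaving the host empty by default keeps the generated DDL portable across clusters, since each cluster supplies its own fs.defaultFS; passing an explicit namenode is only needed when the table lives on a different HDFS.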