ablog

Notes from a clumsy, restless engineer

Connecting to Spark SQL on EMR over JDBC

Notes from connecting to Spark SQL on EMR over JDBC.

  • Create an EMR cluster
    • Release label: emr-5.12.0
    • Hadoop distribution: Amazon 2.8.3
    • Applications: Hive 2.3.2, Pig 0.17.0, Hue 4.1.0, Zeppelin 0.7.3, Spark 2.2.1, Presto 0.188
  • Log in to the master node over ssh
$ ssh -i ~/mykey.pem hadoop@ec2-**-***-***-**.ap-northeast-1.compute.amazonaws.com
  • Start the Thrift server
$ sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh
  • Try connecting
$ sudo -u spark /usr/lib/spark/bin/beeline
Beeline version 1.2.1-spark2-amzn-0 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10001
Connecting to jdbc:hive2://localhost:10001
Enter username for jdbc:hive2://localhost:10001: hadoop
Enter password for jdbc:hive2://localhost:10001:<Return>
18/03/21 10:51:29 INFO Utils: Supplied authorities: localhost:10001
18/03/21 10:51:29 INFO Utils: Resolved authority: localhost:10001
18/03/21 10:51:29 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10001
Connected to: Spark SQL (version 2.2.1)
Driver: Hive JDBC (version 1.2.1-spark2-amzn-0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10001> show tables;
+-----------+------------+--------------+--+
| database  | tableName  | isTemporary  |
+-----------+------------+--------------+--+
+-----------+------------+--------------+--+
No rows selected (1.201 seconds)
0: jdbc:hive2://localhost:10001>
  • Disconnect
0: jdbc:hive2://localhost:10001> !exit
Closing: 0: jdbc:hive2://localhost:10001
Error: Error while cleaning up the server resources (state=,code=0)
Connection is already closed.
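
The interactive session above can probably also be run non-interactively, which is handy for scripting. The following is only a minimal sketch, assuming the same beeline path, the `hadoop` username, and port 10001 used in this memo. `jdbc_url` and `run_sql` are hypothetical helper names introduced here for illustration. The beeline options used are `-u` (JDBC URL), `-n` (username), and `-e` (statement to execute).

```shell
#!/bin/sh
# Sketch only: jdbc_url and run_sql are hypothetical helpers, not part of EMR or Spark.

# Build the JDBC URL. Host and port default to the values used in this memo.
jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "${1:-localhost}" "${2:-10001}"
}

# Run one SQL statement through beeline without the interactive prompts.
# $1: SQL statement, $2: host (optional), $3: port (optional)
# Assumes the Thrift server has already been started as shown above.
run_sql() {
  sudo -u spark /usr/lib/spark/bin/beeline \
    -u "$(jdbc_url "${2:-localhost}" "${3:-10001}")" \
    -n hadoop \
    -e "$1"
}
```

After sourcing the file on the master node, `run_sql 'show tables;'` should run the same query as the session above and then exit. Keeping the URL construction in its own function makes it easy to point at another host or port.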