ablog

Memos of a clumsy, restless engineer

Setting Lambda's concurrency to 1 and invoking it in parallel

  • Set the function's concurrency to 1 in the AWS Management Console.

[Screenshot: reserved concurrency set to 1 in the Lambda console]
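The same setting can be made from the CLI; put-function-concurrency is the standard call for reserved concurrency (the function name matches the one used below):

$ aws lambda put-function-concurrency --function-name testLambdaFunction --reserved-concurrent-executions 1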

  • Invoke in parallel:
for i in {1..100}
do
  aws lambda invoke --function-name testLambdaFunction output_${i}.txt 2>&1 &
done
  • TooManyRequestsException is raised; when an invocation still fails after 4 retries, the CLI gives up (a client-side retry sketch follows the output below).
{
    "ExecutedVersion": "$LATEST",
    "StatusCode": 200
}

An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded.

An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded.

An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded.

An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded.

An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded.

An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded.
{
    "ExecutedVersion": "$LATEST",
    "StatusCode": 200
}
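Since the CLI gives up after its built-in retries, a caller that cannot afford to drop invocations has to retry on its own. A minimal sketch with exponential backoff; the function name is the one from this post, while invoke_with_retry, max_tries, and the delays are made up for illustration:

invoke_with_retry() {
  local i=$1
  local max_tries=5
  local delay=1
  for attempt in $(seq 1 "$max_tries"); do
    # the CLI exits non-zero when the invoke is rejected (e.g. TooManyRequestsException)
    if aws lambda invoke --function-name testLambdaFunction "output_${i}.txt"; then
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))   # back off: 1s, 2s, 4s, ...
  done
  echo "invoke ${i}: gave up after ${max_tries} attempts" >&2
  return 1
}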

Viewing Docker logs on macOS

On Docker Desktop for Mac, containers run inside a Linux VM, so first enter the VM's namespaces from a privileged container:

$ docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh
# cd /var/lib/docker/containers/
/var/lib/docker/containers # ls -l
total 8
drwx------    4 root     root          4096 Jun 28 16:48 3449e97b9cc9c4b7afed594d23392e0261ca63cf803eed411a5a20f9041e89bb
drwx------    4 root     root          4096 Jun 28 16:52 ca94cae5e0cc8511584ba4965b3a08396fc769ac4d9556b7ef8279486b597901
# cd ca94cae5e0cc8511584ba4965b3a08396fc769ac4d9556b7ef8279486b597901
# ls -l
total 1668
-rw-r-----    1 root     root       1671811 Jun 28 17:02 ca94cae5e0cc8511584ba4965b3a08396fc769ac4d9556b7ef8279486b597901-json.log
drwx------    2 root     root          4096 Jun 28 16:52 checkpoints
-rw-------    1 root     root          2572 Jun 28 16:52 config.v2.json
-rw-r--r--    1 root     root          1258 Jun 28 16:52 hostconfig.json
-rw-r--r--    1 root     root            13 Jun 28 16:52 hostname
-rw-r--r--    1 root     root           174 Jun 28 16:52 hosts
drwx------    2 root     root          4096 Jun 28 16:52 mounts
-rw-r--r--    1 root     root            68 Jun 28 16:52 resolv.conf
-rw-r--r--    1 root     root            71 Jun 28 16:52 resolv.conf.hash
# less ca94cae5e0cc8511584ba4965b3a08396fc769ac4d9556b7ef8279486b597901-json.log
{"log":"\u001b[K{\"log\":\"Caused by: java.io.FileNotFoundException: Log directory specified does not exist: spark-ui-log-az11/eventlog\\r\\n\",\"stream\":\"stdout\",\"
time\":\"2020-06-28T16:48:51.\r\n","stream":"stdout","time":"2020-06-28T16:55:07.7423937Z"}
{"log":"\u001b[K537133028Z\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742415084Z"}
{"log":"\u001b[K{\"log\":\"\\u0009at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider
.scala:267)\\r\\n\",\"st\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742422149Z"}
{"log":"\u001b[Kream\":\"stdout\",\"time\":\"2020-06-28T16:48:51.53736452Z\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742428684Z"}
{"log":"\u001b[K{\"log\":\"\\u0009at org.apache.spark.deploy.history.FsHistoryProvider.initialize(FsHistoryProvider.scala:211)\\r\\n\",\"stream\":\"stdout\",\"time\":\"
2020-06-28T16:48:51.537374946Z\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742434581Z"}
{"log":"\u001b[K\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742440539Z"}
{"log":"\u001b[K{\"log\":\"\\u0009at org.apache.spark.deploy.history.FsHistoryProvider.\\u003cinit\\u003e(FsHistoryProvider.scala:207)\\r\\n\",\"stream\":\"stdout\",\"t
ime\":\"2020-06-28T16:48:51.5377\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742445306Z"}
{"log":"\u001b[K41419Z\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742451433Z"}
{"log":"\u001b[K{\"log\":\"\\u0009at org.apache.spark.deploy.history.FsHistoryProvider.\\u003cinit\\u003e(FsHistoryProvider.scala:86)\\r\\n\",\"stream\":\"stdout\",\"ti
me\":\"2020-06-28T16:48:51.53776\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742458856Z"}
{"log":"\u001b[K9772Z\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:07.74246571Z"}
{"log":"\u001b[K{\"log\":\"\\u0009... 6 more\\r\\n\",\"stream\":\"stdout\",\"time\":\"2020-06-28T16:48:51.538006567Z\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:
07.74247089Z"}
{"log":"\u001b[K{\"log\":\"Caused by: java.io.FileNotFoundException: File spark-ui-log-az11/eventlog does not exist\\r\\n\",\"stream\":\"stdout\",\"time\":\"2020-06-28T
16:48:51.538259088Z\"}\r\n","stream":"stdout","time":"2020-06-28T16:55:07.742476435Z"}
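Each line of the file is one JSON object in the json-file logging driver's format, so the raw message text can be pulled out with jq; a sketch, assuming jq is available wherever you copy the log to:

$ jq -r '.log' ca94cae5e0cc8511584ba4965b3a08396fc769ac4d9556b7ef8279486b597901-json.log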

Glue crawler fails with "Service: Amazon S3; Status Code: 403; Error Code: AccessDenied"

Symptom

  • Crawling with an AWS Glue Crawler fails with "Error Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ...".
  • The required permissions are granted in the IAM policy, the KMS key policy, and the S3 bucket policy.
[881e9848-37ed-431a-b55d-6cdbc1e11fd8] ERROR : Error Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 86C79E196743EA2A; S3 Extended Request ID: QZPdP3jwIjqaAERbnsfOQVvxbZC5TdFQc0wQ4n9OVu+/Vld7qXOVdIc4gf1YAP9BbM9yvRDAtco=) retrieving file at s3://landing-bucket/raw/master.tsv. Tables created did not infer schemas from this file.

Cause

  • The object was uploaded to S3 from another AWS account without the bucket-owner-full-control ACL, so the bucket owner's account could not read it.

[Screenshot: the object's ACL in the S3 console]

Solution

  • Grant the bucket-owner-full-control ACL to the object:
$ aws s3api put-object-acl --bucket landing-bucket --key raw/master.tsv --acl bucket-owner-full-control

[Screenshot: the object's ACL after granting bucket-owner-full-control]
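To keep this from recurring, the uploading account can grant the ACL at write time; --acl is a standard option of aws s3 cp (the local file name here is illustrative):

$ aws s3 cp master.tsv s3://landing-bucket/raw/master.tsv --acl bucket-owner-full-control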

Checking Glue job results in the Apache Spark UI with Docker

A memo on starting the Apache Spark History Server in Docker and viewing the Spark Web UI.
I tried the Spark UI on macOS on a MacBook Pro.

  • Install git:
# macOS
$ brew install git
# Linux (Red Hat family)
$ sudo yum -y install git
  • Set up:
$ git clone https://github.com/aws-samples/aws-glue-samples.git
$ cd aws-glue-samples/utilities/Spark_UI/
$ docker build -t glue/sparkui:latest . 
  • Start:
$ LOG_DIR="s3a://spark-ui-tokyo/eventlog/"
$ AWS_ACCESS_KEY_ID="..."
$ AWS_SECRET_ACCESS_KEY="..."
$ docker run -itd -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
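Once the container is up, the History Server should answer on port 18080. A quick check from the Mac (curl and open are standard macOS commands; expect an HTTP 200 status line):

$ curl -sI http://localhost:18080 | head -1
$ open http://localhost:18080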

[Screenshot: Spark History Server Web UI]

Installing Docker on Amazon Linux 2

A memo on installing Docker on Amazon Linux 2.

  • Install docker:
sudo yum install -y docker
sudo usermod -a -G docker ec2-user   # log out and back in so the group change takes effect
  • Start the service and enable it at boot:
sudo /bin/systemctl start docker.service
sudo systemctl enable docker
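After logging back in as ec2-user, a quick smoke test (hello-world is Docker's standard test image):

$ docker run hello-world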

Managing Amazon SageMaker Jupyter Notebook code with GitHub


  • Pull/push from GitHub
    • Select [Amazon SageMaker] - [Notebook instances] - [JupyterLab].
      • The down arrow is pull, the up arrow is push, and clicking the folder icon lets you browse the files (the same operations work from the JupyterLab terminal; see the sketch below).

[Screenshot: the Git toolbar in JupyterLab]
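The same pull/push can also be done with plain git from the JupyterLab terminal; nothing here is SageMaker-specific (the file name and commit message are placeholders):

$ git pull
$ git add mynotebook.ipynb
$ git commit -m "update notebook"
$ git push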