2021-10-27

CloudWatch alarm notifications to SNS fail with an AccessDeniedException from KMS

AWS

Symptom

A notification from a CloudWatch alarm to an SNS topic fails with an AccessDeniedException from KMS.

Failed to execute action arn:aws:sns:ap-northeast-1:123456789012:cloudwatch-test-sns. Received error: "null (Service: AWSKMS; Status Code: 400; Error Code: AccessDeniedException; Request ID: c97514c0-163d-4458-b18c-990a4a83214f; Proxy: null)"


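Cause

When alias/aws/sns is specified as the encryption key for an SNS topic, CloudWatch cannot publish to that topic.
- CloudWatch alarms need the "kms:Decrypt" and "kms:GenerateDataKey" permissions on the key.
- The default AWS KMS key for SNS is an AWS managed key, so its key policy cannot be edited.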


Solution

Create a customer managed key in KMS and allow access from CloudWatch in its key policy.

{
    "Version": "2012-10-17",
    "Id": "key-consolepolicy-3",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudwatch.amazonaws.com"
            },
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey*"
            ],
            "Resource": "*"
        }
    ]
}

Specify the customer managed key you created as the encryption key for the SNS topic.
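
If the key already exists with its own policy, the CloudWatch statement can be merged in programmatically instead of by hand. The following is a minimal sketch, not the method used in this post: `allow_cloudwatch` is a hypothetical helper, and the statement it adds is the one from the key policy above (the Sid is borrowed from the AWS article quoted under References).

```python
import copy
import json

# Statement that lets CloudWatch alarms use the key (same content as the policy above).
CLOUDWATCH_STATEMENT = {
    "Sid": "Allow_CloudWatch_for_CMK",
    "Effect": "Allow",
    "Principal": {"Service": "cloudwatch.amazonaws.com"},
    "Action": ["kms:Decrypt", "kms:GenerateDataKey*"],
    "Resource": "*",
}


def allow_cloudwatch(policy: dict) -> dict:
    """Return a copy of a KMS key policy that includes the CloudWatch statement.

    Hypothetical helper: the policy is returned unchanged if any statement
    already names cloudwatch.amazonaws.com as a service principal.
    """
    updated = copy.deepcopy(policy)
    for stmt in updated.get("Statement", []):
        principal = stmt.get("Principal", {})
        service = principal.get("Service", []) if isinstance(principal, dict) else []
        services = [service] if isinstance(service, str) else service
        if "cloudwatch.amazonaws.com" in services:
            return updated
    updated.setdefault("Statement", []).append(copy.deepcopy(CLOUDWATCH_STATEMENT))
    return updated


# Key policy containing only the default root-account statement.
base_policy = {
    "Version": "2012-10-17",
    "Id": "key-consolepolicy-3",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "kms:*",
            "Resource": "*",
        }
    ],
}

new_policy = allow_cloudwatch(base_policy)
print(json.dumps(new_policy, indent=4))
```

The resulting JSON string could then be passed as the `Policy` argument of boto3's `kms.put_key_policy` (with `PolicyName="default"`), assuming boto3 and suitable credentials are available.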


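References

If trigger actions fail because of SNS topic encryption:

The CloudWatch alarm history shows a message similar to the following.

Failed to execute action arn:aws:sns: : : . Received error: "null (Service: AWSKMS; Status Code: 400; Error Code: AccessDeniedException;)"

SNS allows encryption at rest for its topics. If the default AWS Key Management Service (KMS) key "alias/aws/sns" is used for this encryption, CloudWatch alarms cannot publish messages to the SNS topic. The key policy of the default AWS KMS key for SNS does not allow CloudWatch alarms to make the "kms:Decrypt" and "kms:GenerateDataKey" API calls. Because this key is AWS managed, its policy cannot be edited manually.

If the SNS topic must be encrypted at rest, you can use a customer managed key. The customer managed key must include the following permissions in the Statement section of its key policy. These permissions allow CloudWatch alarms to publish messages to the encrypted SNS topic.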
{
    "Sid": "Allow_CloudWatch_for_CMK",
    "Effect": "Allow",
    "Principal": {
        "Service":[
            "cloudwatch.amazonaws.com"
        ]
    },
    "Action": [
        "kms:Decrypt","kms:GenerateDataKey*"
    ],
    "Resource": "*"
}
Source: Resolving problems receiving SNS notifications for CloudWatch alarm triggers

2021-10-26

A script for measuring single-query performance on Amazon Redshift

AWS

yoheia/aws/redshift/redshift_measuring_query_exec_time at master · yoheia/yoheia · GitHub

2021-10-22

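Aborting queries with heavy temporary disk usage using Amazon Redshift query monitoring rules

I tried using an Amazon Redshift WLM query monitoring rule (QMR) to abort queries that spill to temporary disk.
In the example below, a query that wrote more than one 1 MB block to temporary disk is aborted.

Execution result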

[ec2-user@ip-172-31-0-222 ~]$ export LC_ALL=C
[ec2-user@ip-172-31-0-222 ~]$ psql "host=redshift-cluster-2.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439"
Password for user awsuser:
psql (13.4, server 8.0.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

dev=# \timing
Timing is on.
dev=# set enable_result_cache_for_session=off;
SET
Time: 3.760 ms

dev=# select lo_shipmode, 
	lo_tax, 
	lo_supplycost, 
	lo_revenue, 
	lo_discount, 
	lo_orderpriority, 
	lo_orderdate,count(*) 
from lineorder 
group by lo_shipmode, 
	lo_tax, 
	lo_supplycost, 
	lo_revenue, 
	lo_discount, 
	lo_orderpriority, 
	lo_orderdate
order by lo_shipmode, 
	lo_tax, lo_supplycost, 
	lo_revenue, 
	lo_discount, 
	lo_orderpriority, 
	lo_orderdate;
ERROR:  Query (1586) cancelled by WLM abort action of Query Monitoring Rule "cancel_heavy_temp_usage".
DETAIL:
  -----------------------------------------------
  error:  Query (1586) cancelled by WLM abort action of Query Monitoring Rule "cancel_heavy_temp_usage".
  code:      1078
  context:   Query (1586) cancelled by WLM abort action of Query Monitoring Rule "cancel_heavy_temp_usage".
  query:     0
  location:  wlm_query_action.cpp:156
  process:   wlm [pid=15087]
  -----------------------------------------------

Time: 30081.170 ms (00:30.081)
dev=#

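Setup procedure

Create a parameter group in workload management

In the management console, select [Amazon Redshift]-[Configurations]-[Workload management]
Click [Parameter groups]-[Create] to create a parameter group
- Parameter group name: qmr-heavy-temp-usage
- Description: Cancel queries consume large temp storage
Click [Workload management]-[Edit workload queues]
Click [Query monitoring rules]-[Add custom rule]
- Rule name: cancel_heavy_temp_usage
- Predicate: Memory to disk (1 MB blocks) > 1
- Action: Abort
The saved WLM configuration parameter in JSON format: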


[
  {
    "auto_wlm": true,
    "user_group": [],
    "query_group": [],
    "name": "Default queue",
    "rules": [
      {
        "rule_name": "cancel_heavy_temp_usage",
        "predicate": [
          {
            "metric_name": "query_temp_blocks_to_disk",
            "operator": ">",
            "value": 1
          }
        ],
        "action": "abort"
      }
    ]
  },
  {
    "short_query_queue": true
  }
]
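
The same configuration can also be applied without the console. The following is a rough sketch, assuming boto3 and AWS credentials are available; `wlm_parameters` and `apply_wlm` are hypothetical helper names, while `wlm_json_configuration` is the Redshift parameter that holds the JSON above.

```python
import json

# WLM configuration from this post: one automatic WLM queue with a QMR rule,
# plus the short query queue setting.
wlm_config = [
    {
        "auto_wlm": True,
        "user_group": [],
        "query_group": [],
        "name": "Default queue",
        "rules": [
            {
                "rule_name": "cancel_heavy_temp_usage",
                "predicate": [
                    {
                        "metric_name": "query_temp_blocks_to_disk",
                        "operator": ">",
                        "value": 1,
                    }
                ],
                "action": "abort",
            }
        ],
    },
    {"short_query_queue": True},
]


def wlm_parameters(config: list) -> list:
    """Build the Parameters argument for modify_cluster_parameter_group."""
    return [
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(config, separators=(",", ":")),
        }
    ]


def apply_wlm(parameter_group_name: str, config: list) -> None:
    """Send the configuration to Redshift (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    boto3.client("redshift").modify_cluster_parameter_group(
        ParameterGroupName=parameter_group_name,
        Parameters=wlm_parameters(config),
    )


params = wlm_parameters(wlm_config)
print(params[0]["ParameterValue"])
# apply_wlm("qmr-heavy-temp-usage", wlm_config)  # uncomment to call AWS
```

Keeping the payload construction separate from the API call makes it possible to inspect the JSON string before anything is sent to the cluster.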

Attach the created parameter group to the Redshift cluster

In the management console, go to [Amazon Redshift]-[your cluster]-[Properties]-[Database configurations]-[Edit]-[Edit parameter group] and select the parameter group you created, "qmr-heavy-temp-usage".


Under [Configurations]-[Workload management]-[your parameter group]-[Parameters]-[Attached clusters], you can check which parameters are not applied until the cluster is rebooted.


Reboot the cluster (QMR changes take effect even without a reboot)

Environment

Amazon Redshift cluster
- dc2.large
- 1 node
EC2
- amzn2-ami-hvm-2.0.20210721.2-x86_64-gp2
- Amazon Linux 2
- psql (PostgreSQL) 13.4

References

WLM query monitoring rules - Amazon Redshift

2021-10-20

Maximum value of timestamp without time zone in PostgreSQL

PostgreSQL

In PostgreSQL 12.4, the maximum value of timestamp without time zone falls in the year 294276 AD.

$ psql "host=aurora-postgres124.cluster-********.ap-northeast-1.rds.amazonaws.com user=awsuser dbname=postgres port=5432"
psql (13.4, server 12.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> create table timezone_test (id varchar(10), ts_wo_tz timestamp without time zone);
CREATE TABLE
postgres=>
postgres=> insert into timezone_test(id, ts_wo_tz) values ('1', '209999-08-16 00:00:00.000000');
INSERT 0 1
postgres=> select * from timezone_test;
 id |       ts_wo_tz
----+-----------------------
 1  | 209999-08-16 00:00:00
(1 row)

postgres=> insert into timezone_test(id, ts_wo_tz) values ('2', '294276-12-31 00:00:00.000000');
INSERT 0 1
postgres=> insert into timezone_test(id, ts_wo_tz) values ('3', '294277-12-31 00:00:00.000000');
ERROR:  timestamp out of range: "294277-12-31 00:00:00.000000"
LINE 1: ...ert into timezone_test(id, ts_wo_tz) values ('3', '294277-12...
                                                             ^
postgres=> select * from timezone_test;
 id |       ts_wo_tz
----+-----------------------
 1  | 209999-08-16 00:00:00
 2  | 294276-12-31 00:00:00
(2 rows)
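
A back-of-the-envelope check suggests where the 294276 limit comes from. The sketch below assumes the timestamp is stored as a signed 64-bit count of microseconds relative to 2000-01-01 (an assumption about the internal representation, not something verified in this post) and uses the mean Gregorian year length, so it is only an approximation.

```python
# Assumption: timestamps are signed 64-bit counts of microseconds
# relative to 2000-01-01 00:00:00.
USECS_PER_DAY = 86_400 * 1_000_000
DAYS_PER_YEAR = 365.2425  # mean length of a Gregorian year
INT64_MAX = 2**63 - 1

# Whole days that fit into the positive half of the 64-bit range.
max_days = INT64_MAX // USECS_PER_DAY

# Approximate calendar year in which the 64-bit counter would overflow.
overflow_year = int(2000 + max_days / DAYS_PER_YEAR)

print(max_days)       # 106751991
print(overflow_year)  # 294277
```

Under that assumption the counter would run out early in the year 294277, which is consistent with the insert for 294276-12-31 succeeding and the one for 294277-12-31 failing.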

Environment

Aurora PostgreSQL 12.4

References

Table 8.9. Date/Time Types (excerpt)

Name	Storage Size	Description	Low Value	High Value	Resolution
timestamp [ (p) ] [ without time zone ]	8 bytes	both date and time (no time zone)	4713 BC	294276 AD	1 microsecond
timestamp [ (p) ] with time zone	8 bytes	both date and time, with time zone	4713 BC	294276 AD	1 microsecond

8.5. Date/Time Types

2021-10-19

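Aborting long-running queries using Amazon Redshift query monitoring rules

Redshift AWS

I tried using an Amazon Redshift WLM query monitoring rule (QMR) to abort queries that run longer than a given time.
In the example below, a query that ran for more than 10 seconds is aborted.

Execution result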

[ec2-user@ip-172-31-0-222 ~]$ export LC_ALL=C
[ec2-user@ip-172-31-0-222 ~]$ cat lineorder.sql
select count(a.*) from lineorder a, lineorder b, lineorder c;
[ec2-user@ip-172-31-0-222 ~]$ psql "host=redshift-cluster-4.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439"
psql (13.4, server 8.0.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

dev=# \timing
Timing is on.
dev=# \i lineorder.sql
psql:lineorder.sql:1: ERROR:  Query (5591) cancelled by WLM abort action of Query Monitoring Rule "cancel_slow_query".
DETAIL:
  -----------------------------------------------
  error:  Query (5591) cancelled by WLM abort action of Query Monitoring Rule "cancel_slow_query".
  code:      1078
  context:   Query (5591) cancelled by WLM abort action of Query Monitoring Rule "cancel_slow_query".
  query:     0
  location:  wlm_query_action.cpp:156
  process:   wlm [pid=3920]
  -----------------------------------------------

Time: 12192.098 ms (00:12.192)


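Setup procedure

Create a parameter group in workload management

In the management console, select [Amazon Redshift]-[Configurations]-[Workload management]
Click [Parameter groups]-[Create] to create a parameter group
- Parameter group name: qmr-cancel-slow-query
- Description: Cancel slow query
Click [Workload management]-[Edit workload queues]
Click [Query monitoring rules]-[Add custom rule]
- Rule name: cancel_slow_query
- Predicate: Query execution time (seconds) > 10
- Action: Abort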
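
This post does not show the saved JSON for the rule, so the following sketch reconstructs what it would plausibly look like, following the structure of the WLM JSON in the 2021-10-22 post. `abort_rule` is a hypothetical helper, and the mapping of "Query execution time (seconds)" to the metric name `query_execution_time` is an assumption based on the Redshift QMR metric names.

```python
import json


def abort_rule(rule_name: str, metric_name: str, threshold: int) -> dict:
    """Build one QMR rule that aborts a query when metric > threshold."""
    return {
        "rule_name": rule_name,
        "predicate": [
            {"metric_name": metric_name, "operator": ">", "value": threshold}
        ],
        "action": "abort",
    }


# Assumption: the console predicate "Query execution time (seconds) > 10"
# corresponds to the metric name query_execution_time.
rule = abort_rule("cancel_slow_query", "query_execution_time", 10)

# Same queue layout as the temp-disk example: automatic WLM plus short query queue.
wlm_config = [
    {
        "auto_wlm": True,
        "user_group": [],
        "query_group": [],
        "name": "Default queue",
        "rules": [rule],
    },
    {"short_query_queue": True},
]

print(json.dumps(wlm_config, indent=2))
```

The same helper would cover the temporary disk rule by passing `query_temp_blocks_to_disk` and a threshold of 1.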


Attach the created parameter group to the Redshift cluster

In the management console, go to [Amazon Redshift]-[your cluster]-[Properties]-[Database configurations]-[Edit]-[Edit parameter group] and select the parameter group you created, "qmr-cancel-slow-query".


Under [Configurations]-[Workload management]-[your parameter group]-[Parameters]-[Attached clusters], you can check which parameters are not applied until the cluster is rebooted.


Reboot the cluster (QMR changes take effect even without a reboot)

Environment

Amazon Redshift cluster
- ra3.4xlarge
- 2 nodes
EC2
- amzn2-ami-hvm-2.0.20210721.2-x86_64-gp2
- Amazon Linux 2
- psql (PostgreSQL) 13.4

References

WLM query monitoring rules - Amazon Redshift

2021-10-11

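Comparing tps by PostgreSQL sequence cache size

PostgreSQL

I compared tps for different PostgreSQL sequence cache sizes. With pgbench calling nextval on a sequence from 500 concurrent clients, tps was 9,601 with a cache size of 1 and 12,714 with a cache size of 20, so the larger cache gave roughly 32% higher throughput. This was only a quick run, not a rigorous benchmark.

#	Cache size	tps
1	1	9,601
2	20	12,714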

[Figure: RDS Performance Insights]

[Figure: RDS CloudWatch metrics]

[Figure: EC2 CloudWatch metrics (instance running pgbench)]

Measurement procedure

Cache size = 1

$ pgbench -r -c 500 -j 500 -n -t 1000 -f pgsqlseq1_1.sql -U awsuser -h aurora-postgres124.cluster-cnaamhj3erpx.ap-northeast-1.rds.amazonaws.com -d postgres -p 5432 >  pgsqlseq1_1.log 2>&1

(output omitted)

transaction type: pgsqlseq1_1.sql
scaling factor: 1
query mode: simple
number of clients: 500
number of threads: 500
number of transactions per client: 1000
number of transactions actually processed: 500000/500000
latency average = 81.927 ms
tps = 6103.021764 (including connections establishing)
tps = 9601.273317 (excluding connections establishing)
statement latencies in milliseconds:
        23.610  select nextval('pgsqlseq1_1');

Cache size = 20

$ pgbench -r -c 500 -j 500 -n -t 1000 -f pgsqlseq1_20.sql -U awsuser -h aurora-postgres124.cluster-cnaamhj3erpx.ap-northeast-1.rds.amazonaws.com -d postgres -p 5432 >  pgsqlseq1_20.log 2>&1

(output omitted)

transaction type: pgsqlseq1_20.sql
scaling factor: 1
query mode: simple
number of clients: 500
number of threads: 500
number of transactions per client: 1000
number of transactions actually processed: 500000/500000
latency average = 64.212 ms
tps = 7786.661545 (including connections establishing)
tps = 12714.076957 (excluding connections establishing)
statement latencies in milliseconds:
        15.243  select nextval('pgsqlseq1_20');
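
The tps figures can be pulled out of the two logs and compared with a few lines of code. This is a small sketch, assuming the log format printed by pgbench 13 as shown above; `tps_excluding_connections` is a hypothetical helper name, and the log excerpts are copied from the runs in this post.

```python
import re

# Matches the "excluding connections establishing" line printed by pgbench 13.
TPS_RE = re.compile(
    r"^tps = ([0-9.]+) \(excluding connections establishing\)", re.MULTILINE
)


def tps_excluding_connections(log_text: str) -> float:
    """Extract the tps value that excludes connection setup time."""
    match = TPS_RE.search(log_text)
    if match is None:
        raise ValueError("no tps line found in pgbench output")
    return float(match.group(1))


# Excerpts from the two pgbench logs above.
log_cache_1 = """\
latency average = 81.927 ms
tps = 6103.021764 (including connections establishing)
tps = 9601.273317 (excluding connections establishing)
"""
log_cache_20 = """\
latency average = 64.212 ms
tps = 7786.661545 (including connections establishing)
tps = 12714.076957 (excluding connections establishing)
"""

tps_1 = tps_excluding_connections(log_cache_1)
tps_20 = tps_excluding_connections(log_cache_20)
gain_percent = (tps_20 / tps_1 - 1) * 100

print(f"cache 1: {tps_1:.0f} tps, cache 20: {tps_20:.0f} tps, gain: {gain_percent:.1f}%")
```

In practice the log text would be read from pgsqlseq1_1.log and pgsqlseq1_20.log rather than embedded as strings.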

Preparation

Create the sequences

create sequence pgsqlseq1_1 start with 1 increment by 1 cache 1;
create sequence pgsqlseq1_20 start with 1 increment by 1 cache 20;

Create the SQL scripts that pgbench runs

-- pgsqlseq1_1.sql
select nextval('pgsqlseq1_1');
-- pgsqlseq1_20.sql
select nextval('pgsqlseq1_20');

Environment

PostgreSQL
- Aurora PostgreSQL 12.4
- r5.4xlarge
EC2
- Amazon Linux 2 (amzn2-ami-hvm-2.0.20210721.2-x86_64-gp2)
- c5.4xlarge
- pgbench 13.4

References

Differences in sequence cache behavior between Oracle and PostgreSQL | my opinion is my own

2021-10-03

Invalid Action: The action s3:CreateMultipartUpload does not exist

AWS

Symptom

Allowing s3:CreateMultipartUpload as an action in an S3 bucket policy produces the error "Invalid Action: The action s3:CreateMultipartUpload does not exist."

Cause

The permission required for Create Multipart Upload is s3:PutObject. There is no permission named s3:CreateMultipartUpload.

Action	Required permissions
Create Multipart Upload	You must be allowed to perform the s3:PutObject action on an object to create multipart upload. The bucket owner can allow other principals to perform the s3:PutObject action.

Uploading and copying objects using multipart upload - Amazon Simple Storage Service
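
As an illustration, a policy statement that lists the invalid action can be rewritten before it is submitted. This is only a sketch: `fix_actions` is a hypothetical helper, the mapping contains just the one case covered in this post, and the bucket name is a placeholder.

```python
# Mapping taken from the table above: Create Multipart Upload is authorized
# by s3:PutObject, and s3:CreateMultipartUpload is not a valid action.
ACTION_FIXES = {"s3:CreateMultipartUpload": "s3:PutObject"}


def fix_actions(statement: dict) -> dict:
    """Return a copy of a policy statement with invalid actions replaced."""
    actions = statement.get("Action", [])
    if isinstance(actions, str):
        actions = [actions]
    fixed = []
    for action in actions:
        replacement = ACTION_FIXES.get(action, action)
        if replacement not in fixed:  # avoid listing s3:PutObject twice
            fixed.append(replacement)
    return {**statement, "Action": fixed}


# Hypothetical bucket policy statement; account ID and bucket name are placeholders.
statement = {
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
    "Action": ["s3:CreateMultipartUpload", "s3:PutObject"],
    "Resource": "arn:aws:s3:::example-bucket/*",
}

print(fix_actions(statement)["Action"])  # ['s3:PutObject']
```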

ablog

Notes from a clumsy, restless engineer
