2022-05-29

Redshift でテーブルにアクセス権があるのに "permission denied for schema ..." と怒られる

AWS AWS

事象

ユーザーにテーブルに対する権限を付与しているのに "permission denied for schema ..." と怒られる。

こんな感じでアクセス権を付与しているが、

% psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439"    
create group adhoc_l1_group;
create group adhoc_l2_group;
grant all on all tables in schema public, awssampledb to group adhoc_l1_group; ★ テーブルに対して権限を付与しているが
grant all on all tables in schema public, awssampledb to group adhoc_l2_group;
create user adhoc_l1_user password 'Password1' in group adhoc_l1_group session timeout 36000;
create user adhoc_l2_user password 'Password1' in group adhoc_l2_group session timeout 36000;

"ERROR: permission denied for schema" と怒られる。

% psql "host=redshift-cluster-poc.*******.ap-northeast-1.redshift.amazonaws.com user=adhoc_l1_user dbname=dev port=5439"

dev=> select count(*) from awssampledb.lineorder;
ERROR:  permission denied for schema awssampledb ★アクセス権がないと怒られる

原因

スキーマに対する usage 権限が付与されていないため。

% psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439"    
dev=# select
  usename
  , schemaname
  , tablename
  , has_table_privilege(usename, schemaname || '.' || tablename, 'select') as select
  , has_table_privilege(usename, schemaname || '.' || tablename, 'insert') as insert
  , has_table_privilege(usename, schemaname || '.' || tablename, 'update') as update
  , has_table_privilege(usename, schemaname || '.' || tablename, 'delete') as delete
  , has_table_privilege(usename, schemaname || '.' || tablename, 'references') as references
  , has_schema_privilege(usename, schemaname, 'usage') as usage
from
  pg_tables, pg_user
where
  schemaname in ('awssampledb')  -- 確認したいスキーマで絞り込む
and
  usename in ('adhoc_l1_user','adhoc_l2_user')  -- 確認したいユーザで絞り込む
order by
  1, 2, 3
;
    usename    | schemaname  | tablename | select | insert | update | delete | references | usage 
---------------+-------------+-----------+--------+--------+--------+--------+------------+-------
 adhoc_l1_user | awssampledb | customer  | t      | t      | t      | t      | t          | f
 adhoc_l1_user | awssampledb | dwdate    | t      | t      | t      | t      | t          | f
 adhoc_l1_user | awssampledb | lineorder | t      | t      | t      | t      | t          | f
 adhoc_l1_user | awssampledb | part      | t      | t      | t      | t      | t          | f
 adhoc_l1_user | awssampledb | supplier  | t      | t      | t      | t      | t          | f
 adhoc_l2_user | awssampledb | customer  | t      | t      | t      | t      | t          | f
 adhoc_l2_user | awssampledb | dwdate    | t      | t      | t      | t      | t          | f
 adhoc_l2_user | awssampledb | lineorder | t      | t      | t      | t      | t          | f
 adhoc_l2_user | awssampledb | part      | t      | t      | t      | t      | t          | f
 adhoc_l2_user | awssampledb | supplier  | t      | t      | t      | t      | t          | f
(10 rows)

対処法

スキーマに対する usage 権限を付与する

% psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439"    
dev=# grant usage on schema awssampledb to adhoc_l1_user, adhoc_l2_user;

権限が付与されていることを確認する

dev=# select
  usename
  , schemaname
  , tablename
  , has_table_privilege(usename, schemaname || '.' || tablename, 'select') as select
  , has_table_privilege(usename, schemaname || '.' || tablename, 'insert') as insert
  , has_table_privilege(usename, schemaname || '.' || tablename, 'update') as update
  , has_table_privilege(usename, schemaname || '.' || tablename, 'delete') as delete
  , has_table_privilege(usename, schemaname || '.' || tablename, 'references') as references
  , has_schema_privilege(usename, schemaname, 'usage') as usage
from
  pg_tables, pg_user
where
  schemaname in ('awssampledb')  -- 確認したいスキーマで絞り込む
and
  usename in ('adhoc_l1_user','adhoc_l2_user')  -- 確認したいユーザで絞り込む
order by
  1, 2, 3
;
    usename    | schemaname  | tablename | select | insert | update | delete | references | usage 
---------------+-------------+-----------+--------+--------+--------+--------+------------+-------
 adhoc_l1_user | awssampledb | customer  | t      | t      | t      | t      | t          | t
 adhoc_l1_user | awssampledb | dwdate    | t      | t      | t      | t      | t          | t
 adhoc_l1_user | awssampledb | lineorder | t      | t      | t      | t      | t          | t
 adhoc_l1_user | awssampledb | part      | t      | t      | t      | t      | t          | t
 adhoc_l1_user | awssampledb | supplier  | t      | t      | t      | t      | t          | t
 adhoc_l2_user | awssampledb | customer  | t      | t      | t      | t      | t          | t
 adhoc_l2_user | awssampledb | dwdate    | t      | t      | t      | t      | t          | t
 adhoc_l2_user | awssampledb | lineorder | t      | t      | t      | t      | t          | t
 adhoc_l2_user | awssampledb | part      | t      | t      | t      | t      | t          | t
 adhoc_l2_user | awssampledb | supplier  | t      | t      | t      | t      | t          | t
(10 rows)

アクセスできるようになっている

% psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=adhoc_l1_user dbname=dev port=5439"

dev=> select count(*) from awssampledb.lineorder;
   count   
-----------
 600037902
(1 row)

参考

2022-05-29

よく使う pgbench のオプション

ベンチマーク用オプション

-t: 指定した回数トランザクションを実行する（回数を指定）
-T: 指定した秒数トランザクションを実行する（期間を指定）
-c: データベースセッション数
-j: ワーカースレッド数
-f <ファイル名>: 実行するSQL スクリプトファイル名
-r: コマンド毎のレイテンシーをレポート出力する
-n: クエリを実行する前に VACUUM しない。

参考

2022-05-29

よく使う psql のオプション

-f: SQL スクリプトファイルを指定して実行する
-a: 実行した SQL も出力する

実行例

psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439" -a -f sample.sql

参考

psql

2022-05-29

macOS に pgbench をインストールする

% brew install postgres

参考

Postgres on Mac with Brew 💻. Brewing PostgreSQL on OS X | by Gopi Krishna Kancharla | Think Special — Gopi K Kancharla | Medium | Think Special — Gopi K Kancharla

2022-05-29

Redshift でクエリを同時多重実行検証をするスクリプト

TL;DR

Redshift クラスターのサイジングや Concurrency Scaling の検証などで、クエリを同時実行し、クエリ単体のレスポンス要件と複数クエリのスループット要件の両方を満たしているかを検証したりする bash スクリプト。bash スクリプトから pgbench を読んでいる。Amazon Linux on EC2 に pgbench をインストールして、bash スクリプトを置けば実行できる。

セットアップ

EC2

EC2 インスタンス（m5.large、Amazon Linux 2）を作成する
pgbench のインストール
- Amazon Linux 2 に PostgreSQL 13 をインストールする - ablog
スクリプトのダウンロード

$ git clone git@github.com:yoheia/redshift_query_executer.git
$ git clone git@github.com:yoheia/redshift_concurrent_query_executer.git
$ git clone git@github.com:yoheia/redshift_poc_scenario_scripts.git

~/.pgpass のを設定する

$ vi ~/.pgpass
redshift-cluster-poc.*******.ap-northeast-1.redshift.amazonaws.com:5439:dev:awsuser:Password1
redshift-cluster-poc.*******.ap-northeast-1.redshift.amazonaws.com:5439:dev:dwh_batch_user:Password1
redshift-cluster-poc.*******.ap-northeast-1.redshift.amazonaws.com:5439:dev:dwh_tool_user:Password1
redshift-cluster-poc.*******.ap-northeast-1.redshift.amazonaws.com:5439:dev:adhoc_l1_user:Password1
redshift-cluster-poc.*******.ap-northeast-1.redshift.amazonaws.com:5439:dev:adhoc_l2_user:Password1
$ chmod 600 ~/.pgpass

Redshift

クラスター作成
- クラスター識別子: redshift-cluster-poc
- ノードの種類: ra3.3xlarge
- ノードの数: 2
- IAMロールを関連づける: S3 に読み書きできる権限を持つIAMロールをアタッチ
セットアップスクリプトのディレクトリに移動する

$ cd redshift_concurrent_query_executer/setup/

グループ・ユーザー作成

$ chmod u+x create_groups_users.sh
$ ./create_groups_users.sh

パラメーターグループとWLM設定

$ chmod u+x redshift_create_parameter-group.sh
$ ./redshift_create_parameter-group.sh

サンプルテーブルを作成してデータをロードする（~/.pgpass を設定した上で実行する）

$ nohup ./exec_all_ddl_copy.sh &

実行

実行する

$ cd redshift_concurrent_query_executer/executer/
$ ./redshift_poc_senario1.sh

検証結果の確認

2022-05-28

Redshift でワークロードキュー毎のクエリ同時実行数を確認する

AWS

クラスター全体での実行中とキュー待ちのクエリ数

マネジメントコンソールで[Redshift]-[クラスターを選択]-[クエリのモニタリング]-[ワークロードの同時実行]
- オレンジ色：実行中のクエリ数
- 青色：ワークロードキュー待ちのクエリ数

「キューに入れられたクエリとクラスターでのクエリの実行」に表示される、オレンジ色と青色のバーが見て取れます。
オレンジ色が実行中のクエリで、青色がキューに入っているもののそのまま動いていない、云わば「待たされている」クエリです。つまり、青色が多いということはキューが詰まっているということになります。
Manual WLM モードの Redshift クラスターで Concurrency Scaling 有効化や同時実行数のチューニングを行った運用知見と注意点について - サーバーワークスエンジニアブログ

ワークロードキュー別の実行中とキュー待ちのクエリ数

[CloudWatch メトリクス]-[Redshift]-[ClusterIdentifier,wlmid]
- WLMRunningQueries：WLM毎のクエリ同時実行数
- WLMQueueWaitTime：WLM毎のキューでの待ち時間

左軸に、WLMRunningQueries として実行中のクエリがカウントされていますが、これが「メインでの同時実行」＝「13」という設定で頭打ちしていることがわかります。

Queue 2 WLMQueueWaitTime を右軸に追加して確認してみると、待ち時間が増え続けていることがわかります。つまり、BI ツールの処理は「クエリが流れるよりもクエリが増える速度の方が早く Queue が積み上がり続けている」状態になってしまっています。
Manual WLM モードの Redshift クラスターで Concurrency Scaling 有効化や同時実行数のチューニングを行った運用知見と注意点について - サーバーワークスエンジニアブログ

同時実行スケーリング(Concurrency Scaling) で追加されたクラスター数の確認

[CloudWatchメトリクス]-[Redshift]-[クラスター別の集計]-[ConcurrencyScalingActiveClusters]

以下のメトリクスで同時実行スケーリングの状態が確認できます。

ConcurrencyScalingActiveClusters

Manual WLM モードの Redshift クラスターで Concurrency Scaling 有効化や同時実行数のチューニングを行った運用知見と注意点について - サーバーワークスエンジニアブログ

手動WLMでは1クラスターでの最大クエリ同時実行数は50

ワークロード管理において、各 Queue で同時実行可能な最大数を事前に規定します。
それぞれの Queue ごとに設定可能な最大数は 50 で、またそれぞれの Queue の同時実行数合計も 50 までである必要があります。

手動 WLM を使用した場合、Amazon Redshift は、同時実行レベルが 5 (最大 5 個のクエリを同時実行) のキューを 1 つと、同時実行レベルが 1 の定義済みスーパーユーザーキューを 1 つ設定します。最大で 8 個のキューを定義できます。各キューの同時実行レベルは最大で 50 に設定できます。すべてのユーザー定義キュー (スーパーユーザーキューは含みません) の同時実行レベルの合計の最大値は 50 です。
ワークロード管理 - Amazon Redshift
Manual WLM モードの Redshift クラスターで Concurrency Scaling 有効化や同時実行数のチューニングを行った運用知見と注意点について - サーバーワークスエンジニアブログ

参考

WLMRunningQueries
WLM キューごとのメインクラスターと同時実行スケーリングクラスターの両方から実行されているクエリの数。
単位: 個
ディメンション: ClusterIdentifier, wlmid
ディメンション: ClusterIdentifier、QueueName
CloudWatch メトリクスを使用した Amazon Redshift のモニタリング - Amazon Redshift

2022-05-27

psql on EC2 で Redshift のクエリの結果セットが大きい場合のボトルネックを調べる

AWS

サマリー

EC2インスタンス（t2.micro）から psql で Redshift に結果セットが100GB以上と大きい select 文を発行すると、CloudWatch メトリクスでネットワーク受信量 (バイト数)が約500MB/分になる。 m5.large では約3.2GB/分のスループットが出た。

t2.micro（帯域：最大 5 Gbps=625MB/s）
- 約500MB/分

m5.large（帯域：最大 10 Gbps=1.25GB/s）
- 約3.2GB/分

検証手順

psql で Amazon Redshift に接続する。

$ export LC_ALL=C
$ psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439"

スキーマを作成する

CREATE SCHEMA IF NOT EXISTS poc_sample;

検証用にテーブルを作成する

CREATE TABLE poc_sample.lineorder 
(
  lo_orderkey          INTEGER NOT NULL,
  lo_linenumber        INTEGER NOT NULL,
  lo_custkey           INTEGER NOT NULL,
  lo_partkey           INTEGER NOT NULL,
  lo_suppkey           INTEGER NOT NULL,
  lo_orderdate         INTEGER NOT NULL,
  lo_orderpriority     VARCHAR(15) NOT NULL,
  lo_shippriority      VARCHAR(1) NOT NULL,
  lo_quantity          INTEGER NOT NULL,
  lo_extendedprice     INTEGER NOT NULL,
  lo_ordertotalprice   INTEGER NOT NULL,
  lo_discount          INTEGER NOT NULL,
  lo_revenue           INTEGER NOT NULL,
  lo_supplycost        INTEGER NOT NULL,
  lo_tax               INTEGER NOT NULL,
  lo_commitdate        INTEGER NOT NULL,
  lo_shipmode          VARCHAR(10) NOT NULL
);
CREATE TABLE poc_sample.lineorder_10x
(
  lo_orderkey          INTEGER NOT NULL,
  lo_linenumber        INTEGER NOT NULL,
  lo_custkey           INTEGER NOT NULL,
  lo_partkey           INTEGER NOT NULL,
  lo_suppkey           INTEGER NOT NULL,
  lo_orderdate         INTEGER NOT NULL,
  lo_orderpriority     VARCHAR(15) NOT NULL,
  lo_shippriority      VARCHAR(1) NOT NULL,
  lo_quantity          INTEGER NOT NULL,
  lo_extendedprice     INTEGER NOT NULL,
  lo_ordertotalprice   INTEGER NOT NULL,
  lo_discount          INTEGER NOT NULL,
  lo_revenue           INTEGER NOT NULL,
  lo_supplycost        INTEGER NOT NULL,
  lo_tax               INTEGER NOT NULL,
  lo_commitdate        INTEGER NOT NULL,
  lo_shipmode          VARCHAR(10) NOT NULL
);

データをロードする
- 事前に s3://awssampledb-yoheia/ssbgz/(ap-northeast-1) に s3://awssampledbuswest2/ssbgz/(us-west-2) のデータをコピーしている。

copy poc_sample.lineorder from 's3://awssampledb-yoheia/ssbgz/lineorder' 
iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess'
gzip compupdate off region 'ap-northeast-1';

データを10倍に増幅する。

insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);
insert into poc_sample.lineorder_10x (select * from poc_sample.lineorder);

S3 から Redshift にデータロードする。

select  database, schema, "table", size, tbl_rows from svv_table_info where schema = 'poc_sample';

 database |   schema   |     table     |  size  |  tbl_rows  
----------+------------+---------------+--------+------------
 dev      | poc_sample | lineorder     |  25112★25GB |  600037902
 dev      | poc_sample | lineorder_10x | 250597★250GB | 6000379020
(2 rows)

SQL スクリプトを作成する。

$ cat lineorder.sql 
--\set FETCH_COUNT 10000
select * from poc_sample.lineorder_10x;

psql on EC2 から大きな結果セットを取得する。

psql "host=redshift-cluster-poc.********.ap-northeast-1.redshift.amazonaws.com user=awsuser dbname=dev port=5439" -f lineorder.sql

性能系ツールのインストール

sudo yum -y install git gcc-c++ libpcap-devel.x86_64 libpcap.x86_64 ncurses*
sudo yum -y install htop dstat perf
git clone https://github.com/raboof/nethogs
cd nethogs
make
sudo make install

参考

NetworkIn
すべてのネットワークインターフェイスを通じ、このインスタンスによって受信されたバイトの数。このメトリクスは、1 つのインスタンスへの受信ネットワークトラフィックの量を表しています。
報告された数は、期間中に受信されたバイト数です。基本 (5 分) のモニタリングで統計情報に Sum 使用している場合であれば、この数を 300 で除算してバイト/秒の値を求めることができます。詳細 (1 分) のモニタリングで統計情報に Sum 使用している場合合は、この数を 60 で除算します。
単位: バイト
インスタンスの利用可能な CloudWatch メトリクスのリスト表示 - Amazon Elastic Compute Cloud

事象

原因

対処法

参考

ベンチマーク用オプション

参考

よく使う psql のオプション

実行例

参考

参考

TL;DR

セットアップ

EC2

Redshift

実行

検証結果の確認

クラスター全体での実行中とキュー待ちのクエリ数

ワークロードキュー別の実行中とキュー待ちのクエリ数

同時実行スケーリング(Concurrency Scaling) で追加されたクラスター数の確認

手動WLMでは1クラスターでの最大クエリ同時実行数は50

参考

サマリー

検証手順

関連

参考