How to make soy-marinated ikura from sujiko (salmon roe skeins)
Marinade
- Strongly brewed first dashi (3 parts)
- Soy sauce (5 parts)
- Sake (a little over 1 part)
- Mirin (a little under 1 part)
Key points
- Briefly boil off the alcohol, chill the mixture in ice water, then combine it with the cleaned roe.
- Add just enough marinade to barely cover the roe, or slightly less.
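The parts above can be turned into volumes with a small helper. This is only a convenience sketch; the 100 ml base below is an illustrative assumption, not part of the recipe.

```python
def marinade(total_ml: float) -> dict[str, float]:
    """Split a target marinade volume by the ratios above (dashi 3 : soy 5 : sake 1 : mirin 1)."""
    ratios = {"dashi": 3, "soy_sauce": 5, "sake": 1, "mirin": 1}
    parts = sum(ratios.values())
    return {name: round(total_ml * r / parts, 1) for name, r in ratios.items()}

print(marinade(100))  # -> {'dashi': 30.0, 'soy_sauce': 50.0, 'sake': 10.0, 'mirin': 10.0}
```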
Separating the S3 folders (prefixes) that Redshift users can access with COPY/UNLOAD
Notes on how to give each Redshift user COPY/UNLOAD access to a different S3 folder (prefix).
Goal
- Redshift user "rs_user_a" can COPY/UNLOAD only under the S3 path "s3://s3-for-redshift/rs_user_a/".
- Redshift user "rs_user_b" can COPY/UNLOAD only under the S3 path "s3://s3-for-redshift/rs_user_b/".
Verification results
Redshift user "rs_user_a" can COPY/UNLOAD only under the S3 path "s3://s3-for-redshift/rs_user_a/".
- Log in to Redshift as rs_user_a.
$ psql "host=redshift-cluster-poc-central.********.ap-northeast-1.redshift.amazonaws.com user=rs_user_a dbname=dev port=5439"
- COPY/UNLOAD to the permitted S3 path succeeds.
dev=> unload ('select * from public.customer')
dev-> to 's3://s3-for-redshift/rs_user_a/customer'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_a'
dev-> header
dev-> format csv
dev-> gzip
dev-> parallel off
dev-> allowoverwrite;
INFO: UNLOAD completed, 10125500 record(s) unloaded successfully.
UNLOAD
dev=> copy public.customer from 's3://s3-for-redshift/rs_user_a/customer000.gz'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_a'
dev-> format csv
dev-> gzip compupdate off region 'ap-northeast-1'
dev-> ignoreheader 1;
INFO: Load into table 'customer' completed, 10125500 record(s) loaded successfully.
COPY
- COPY/UNLOAD to a non-permitted S3 path fails.
dev=> unload ('select * from public.customer')
dev-> to 's3://s3-for-redshift/rs_user_b/customer'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_a'
dev-> header
dev-> format csv
dev-> gzip
dev-> parallel off
dev-> allowoverwrite;
ERROR: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_a/RedshiftIamRoleSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_b/customer000.gz" because no identity-based polic
DETAIL:
-----------------------------------------------
error: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_a/RedshiftIamRoleSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_b/customer000.gz" because no identity-based polic
code: 8001
context: Failed to initialize S3 output stream. S3 path: s3://s3-for-redshift/rs_user_b/customer000.gz
query: 35041819
location: s3_text_unloader.cpp:369
process: padbmaster [pid=1073750164]
-----------------------------------------------
dev=> copy public.customer from 's3://s3-for-redshift/rs_user_b/customer000.gz'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_a'
dev-> format csv
dev-> gzip compupdate off region 'ap-northeast-1'
dev-> ignoreheader 1;
ERROR: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_a/RedshiftIamRoleSession is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_b/customer000.gz" because no identity-based polic
DETAIL:
-----------------------------------------------
error: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_a/RedshiftIamRoleSession is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_b/customer000.gz" because no identity-based polic
code: 8001
context: S3 key being read : s3://s3-for-redshift/rs_user_b/customer000.gz
query: 35482475
location: copy_s3_scanner.cpp:344
process: query1_252_35482475 [pid=15960]
-----------------------------------------------
Redshift user "rs_user_b" can COPY/UNLOAD only under the S3 path "s3://s3-for-redshift/rs_user_b/".
- Log in to Redshift as rs_user_b.
$ psql "host=redshift-cluster-poc-central.********.ap-northeast-1.redshift.amazonaws.com user=rs_user_b dbname=dev port=5439"
- COPY/UNLOAD to the permitted S3 path succeeds.
dev=> unload ('select * from public.customer')
dev-> to 's3://s3-for-redshift/rs_user_b/customer'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_b'
dev-> header
dev-> format csv
dev-> gzip
dev-> parallel off
dev-> allowoverwrite;
INFO: UNLOAD completed, 20251000 record(s) unloaded successfully.
UNLOAD
dev=> copy public.customer from 's3://s3-for-redshift/rs_user_b/customer000.gz'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_b'
dev-> format csv
dev-> gzip compupdate off region 'ap-northeast-1'
dev-> ignoreheader 1;
INFO: Load into table 'customer' completed, 20251000 record(s) loaded successfully.
COPY
- COPY/UNLOAD to a non-permitted S3 path fails.
dev=> unload ('select * from public.customer')
dev-> to 's3://s3-for-redshift/rs_user_a/customer'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_b'
dev-> header
dev-> format csv
dev-> gzip
dev-> parallel off
dev-> allowoverwrite;
ERROR: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_b/RedshiftIamRoleSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_a/customer000.gz" because no identity-based polic
DETAIL:
-----------------------------------------------
error: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_b/RedshiftIamRoleSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_a/customer000.gz" because no identity-based polic
code: 8001
context: Failed to initialize S3 output stream. S3 path: s3://s3-for-redshift/rs_user_a/customer000.gz
query: 35483157
location: s3_text_unloader.cpp:369
process: padbmaster [pid=1073988131]
-----------------------------------------------
dev=> copy public.customer from 's3://s3-for-redshift/rs_user_a/customer000.gz'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_b'
dev-> format csv
dev-> gzip compupdate off region 'ap-northeast-1'
dev-> ignoreheader 1;
ERROR: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_b/RedshiftIamRoleSession is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_a/customer000.gz" because no identity-based polic
DETAIL:
-----------------------------------------------
error: S3ServiceException:User: arn:aws:sts::123456789012:assumed-role/iam-role_for_rs_role_b/RedshiftIamRoleSession is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::s3-for-redshift/rs_user_a/customer000.gz" because no identity-based polic
code: 8001
context: S3 key being read : s3://s3-for-redshift/rs_user_a/customer000.gz
query: 35483161
location: copy_s3_scanner.cpp:344
process: query1_252_35483161 [pid=15960]
-----------------------------------------------
ASSUMEROLE fails for IAM roles that have not been GRANTed
- rs_user_a
$ psql "host=redshift-cluster-poc-central.ceyg6jv96hfq.ap-northeast-1.redshift.amazonaws.com user=rs_user_a dbname=dev port=5439"
dev=> copy public.customer from 's3://s3-for-redshift/rs_user_b/customer000.gz'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_b'
dev-> format csv
dev-> gzip compupdate off region 'ap-northeast-1'
dev-> ignoreheader 1;
ERROR: User rs_user_a does not have ASSUMEROLE permission on IAM role "arn:aws:iam::123456789012:role/iam-role_for_rs_role_b" for COPY
- rs_user_b
$ psql "host=redshift-cluster-poc-central.ceyg6jv96hfq.ap-northeast-1.redshift.amazonaws.com user=rs_user_b dbname=dev port=5439"
dev=> copy public.customer from 's3://s3-for-redshift/rs_user_a/customer000.gz'
dev-> iam_role 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_a'
dev-> format csv
dev-> gzip compupdate off region 'ap-northeast-1'
dev-> ignoreheader 1;
ERROR: User rs_user_b does not have ASSUMEROLE permission on IAM role "arn:aws:iam::123456789012:role/iam-role_for_rs_role_a" for COPY
Setup steps
- Run the following once on the target Redshift cluster.
- Without this, users can AssumeRole IAM roles that have not been granted to them.
revoke assumerole on all from public for all;
Create the IAM role "iam-role_for_rs_role_a" for ASSUMEROLE.
- iam-policy_for_rs_role_a
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::s3-for-redshift/rs_user_a/",
        "arn:aws:s3:::s3-for-redshift/rs_user_a/*"
      ]
    }
  ]
}
- Trust relationship
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Create the IAM role "iam-role_for_rs_role_b" for ASSUMEROLE.
- iam-policy_for_rs_role_b
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::s3-for-redshift/rs_user_b/",
        "arn:aws:s3:::s3-for-redshift/rs_user_b/*"
      ]
    }
  ]
}
- Trust relationship
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
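The two per-user policies differ only in the prefix, so they can be generated rather than copied by hand. A minimal sketch (the helper function is illustrative, not part of the setup; bucket and prefix names are the ones used above):

```python
import json

def s3_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an identity-based policy that limits COPY/UNLOAD to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::*"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}/{prefix}/",
                    f"arn:aws:s3:::{bucket}/{prefix}/*",
                ],
            },
        ],
    }

print(json.dumps(s3_prefix_policy("s3-for-redshift", "rs_user_a"), indent=2))
```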
- Attach the created IAM roles to the Redshift cluster.
- Create the Redshift users and grant them access to the tables in the public schema.
CREATE USER rs_user_a PASSWORD 'Password123!';
CREATE USER rs_user_b PASSWORD 'Password123!';
grant usage on schema public to rs_user_a, rs_user_b;
grant all on all tables in schema public to rs_user_a, rs_user_b;
- GRANT the Redshift users permission to ASSUMEROLE their respective roles.
GRANT ASSUMEROLE ON 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_a' TO rs_user_a FOR COPY, UNLOAD;
GRANT ASSUMEROLE ON 'arn:aws:iam::123456789012:role/iam-role_for_rs_role_b' TO rs_user_b FOR COPY, UNLOAD;
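When more users are added, the GRANT statements follow the same one-user-one-role pattern. A hypothetical helper that renders them from a user-to-role mapping (names below match the examples above; this is a convenience sketch, not part of the procedure):

```python
def grant_assumerole_statements(user_to_role: dict[str, str]) -> list[str]:
    """Render one GRANT ASSUMEROLE statement per Redshift user."""
    return [
        f"GRANT ASSUMEROLE ON '{arn}' TO {user} FOR COPY, UNLOAD;"
        for user, arn in user_to_role.items()
    ]

mapping = {
    "rs_user_a": "arn:aws:iam::123456789012:role/iam-role_for_rs_role_a",
    "rs_user_b": "arn:aws:iam::123456789012:role/iam-role_for_rs_role_b",
}
for stmt in grant_assumerole_statements(mapping):
    print(stmt)
```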
Check the configured roles
- rs_user_a
dev=# select pg_get_iam_role_by_user('rs_user_a');
                            pg_get_iam_role_by_user
-------------------------------------------------------------------------------
 (rs_user_a,all,None)
 (rs_user_a,arn:aws:iam::123456789012:role/iam-role_for_rs_role_a,COPY|UNLOAD)
(2 rows)
- rs_user_b
dev=# select pg_get_iam_role_by_user('rs_user_b');
                            pg_get_iam_role_by_user
-------------------------------------------------------------------------------
 (rs_user_b,all,None)
 (rs_user_b,arn:aws:iam::123456789012:role/iam-role_for_rs_role_b,COPY|UNLOAD)
(2 rows)
Notes
- Running the following as a superuser reverts "revoke assumerole on all from public for all". Also, if "revoke assumerole on all from public for all" is run as the last step instead of the first, users can AssumeRole IAM roles that have not been granted to them until that point.
grant assumerole on all to public for all;
References
Granting the ASSUMEROLE permission
The following is the syntax of the ASSUMEROLE permission granted to users and groups with a specified role. To get started with the ASSUMEROLE permission, see "Usage notes for granting the ASSUMEROLE permission".
GRANT ASSUMEROLE
    ON { 'iam_role' [, ...] | default | ALL }
    TO { username | ROLE role_name | GROUP group_name | PUBLIC } [, ...]
    FOR { ALL | COPY | UNLOAD | EXTERNAL FUNCTION | CREATE MODEL } [, ...]
GRANT - Amazon Redshift
File suffix naming rules when UNLOADing from Redshift
Naming rules
- Parquet
- 000.parquet
Verification steps
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' csv parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' csv gzip parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' csv zstd parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' csv bzip2 parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' parquet parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' json gzip parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' json bzip2 parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' json zstd parallel off;
unload ('select * from public.customer') to 's3://test-rs-copy-bucket/unload/customer_' allowoverwrite iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' json parallel off;
Automatically replacing invalid UTF-8 characters at load time with Redshift's COPY command
When loading data into Redshift with the COPY command, any invalid UTF-8 characters can be replaced with a fixed string by specifying ACCEPTINVCHARS.
ACCEPTINVCHARS [AS] ['replacement_char']
Enables loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters. When ACCEPTINVCHARS is specified, COPY replaces each invalid UTF-8 character with a string of equal length consisting of the character specified by replacement_char. For example, if the replacement character is '^', an invalid three-byte character will be replaced with '^^^'. The replacement character can be any ASCII character except NULL. The default is a question mark (?). For information about invalid UTF-8 characters, see "Multibyte character load errors".
COPY returns the number of rows that contained invalid UTF-8 characters, and it adds an entry to the STL_REPLACEMENTS system table for each affected row, up to a maximum of 100 rows for each node slice. Additional invalid UTF-8 characters are also replaced, but those replacement events aren't recorded.
If ACCEPTINVCHARS isn't specified, COPY returns an error whenever it encounters an invalid UTF-8 character.
ACCEPTINVCHARS is valid only for VARCHAR columns.
Data conversion parameters - Amazon Redshift
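The replacement semantics can be imitated outside Redshift to preview what COPY would do to a file. A rough Python sketch using a custom codec error handler (this only approximates COPY's behavior; the handler name is made up for the example):

```python
import codecs

def make_acceptinvchars_handler(replacement: str = "?"):
    """Replace each invalid UTF-8 byte with the replacement char, preserving length,
    similar in spirit to COPY ... ACCEPTINVCHARS."""
    def handler(exc: UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return (replacement * len(bad), exc.end)
    return handler

codecs.register_error("acceptinvchars", make_acceptinvchars_handler("?"))

raw = b"caf\xc3\xa9 \xff\xfe ok"  # valid 'café' followed by two invalid bytes
print(raw.decode("utf-8", errors="acceptinvchars"))  # -> 'café ?? ok'
```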
Whether Redshift records information in svl_query_metrics_summary and sys_query_history when a COPY command runs
Notes confirming that Redshift records information in svl_query_metrics_summary and sys_query_history when a COPY command runs.
Results
dev=# copy public.customer from 's3://test-rs-copy-bucket/csv/customer/' csv
dev-# iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess';
INFO: Load into table 'customer' completed, 9000000 record(s) loaded successfully.
COPY
Time: 8538.877 ms (00:08.539)
dev=# select pg_last_query_id();
 pg_last_query_id
------------------
         33416134
(1 row)

Time: 3.421 ms
dev=# \gset
Time: 3.452 ms
dev=# \x
Expanded display is on.
dev=# select * from svl_query_metrics_summary where query = :pg_last_query_id;
-[ RECORD 1 ]--------------+---------------
userid                     | 100
query                      | 33416134
service_class              | 102
query_cpu_time             | 14
query_blocks_read          | 160
query_execution_time       | 8
query_cpu_usage_percent    | 40.07
query_temp_blocks_to_disk  |
segment_execution_time     | 3
cpu_skew                   | 1.91
io_skew                    | 1.00
scan_row_count             | 8850432
join_row_count             |
nested_loop_join_row_count |
return_row_count           |
spectrum_scan_row_count    |
spectrum_scan_size_mb      |
query_queue_time           |
service_class_name         | Default queue

Time: 682.883 ms
dev=# select * from sys_query_history where transaction_id = (select xid from stl_query where query = :pg_last_query_id);
-[ RECORD 1 ]----+------------------------------------------------------------------------------------------
user_id          | 100
query_id         | 33416133
query_label      | default
transaction_id   | 93744987
session_id       | 1073889466
database_name    | dev
query_type       | COPY
status           | success
result_cache_hit | f
start_time       | 2024-09-06 00:04:43.957806
end_time         | 2024-09-06 00:04:51.574099
elapsed_time     | 7616293
queue_time       | 0
execution_time   | 7491272
error_message    |
returned_rows    | 0
returned_bytes   | 0
query_text       | copy public.customer from 's3://test-rs-copy-bucket/csv/customer/' csv\niam_role '';
redshift_version | 1.0.73348
usage_limit      |
compute_type     | primary
compile_time     | 6539232
planning_time    | 0
lock_wait_time   | 36

Time: 790.980 ms
Steps
create table public.customer (
  c_custkey integer not null,
  c_name varchar(25) not null,
  c_address varchar(25) not null,
  c_city varchar(10) not null,
  c_nation varchar(15) not null,
  c_region varchar(12) not null,
  c_phone varchar(15) not null,
  c_mktsegment varchar(10) not null
) diststyle even compound sortkey(c_nation, c_region);

copy public.customer from 's3://test-rs-copy-bucket/csv/customer/' csv
iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess';

select pg_last_query_id();
\gset
\x
select * from svl_query_metrics_summary where query = :pg_last_query_id;
select * from sys_query_history where transaction_id = (select xid from stl_query where query = :pg_last_query_id);
Redshift STL tables and SYS views
The query ID in Redshift's STL tables (the query column) and the query ID in the SYS views (the query_id column) are different values, so the two cannot be joined on query ID. They can instead be joined on transaction ID (when the STL and SYS transaction IDs map 1:1).
Example
dev=# \timing on
dev=# select count(distinct(lo_orderkey)) from lineorder;
   count
-----------
 150000000
(1 row)

Time: 13282.862 ms (00:13.283)
dev=# select pg_last_query_id();
 pg_last_query_id
------------------
         33269125
(1 row)

Time: 3.632 ms
dev=# \gset
Time: 3.703 ms
dev=# \x
Expanded display is on.
dev=# select * from sys_query_history where transaction_id = (select xid from stl_query where query = :pg_last_query_id);
-[ RECORD 1 ]----+----------------------------------------------------
user_id          | 100
query_id         | 33269123
query_label      | default
transaction_id   | 93596628
session_id       | 1073922183
database_name    | dev
query_type       | SELECT
status           | success
result_cache_hit | f
start_time       | 2024-09-05 04:25:48.409507
end_time         | 2024-09-05 04:26:01.689166
elapsed_time     | 13279659
queue_time       | 0
execution_time   | 13263519
error_message    |
returned_rows    | 1
returned_bytes   | 15
query_text       | select count(distinct(lo_orderkey)) from lineorder;
redshift_version | 1.0.73348
usage_limit      |
compute_type     | primary
compile_time     | 155
planning_time    | 6009
lock_wait_time   | 22

Time: 378.407 ms
References
SYS monitoring views such as SYS_QUERY_HISTORY and SYS_QUERY_DETAIL contain the query_id column, which holds the identifier for users' queries. Similarly, provisioned-only views such as STL_QUERY and SVL_QLOG contain the query column, which also holds the query identifiers. However, the query identifiers recorded in the SYS system views are different from those recorded in the provisioned-only views.
The difference between the SYS views’ query_id column values and the provisioned-only views’ query column values is as follows:
In SYS views, the query_id column records user-submitted queries in their original form. The Amazon Redshift optimizer might break them down into child queries for improved performance, but a single query you run will still only have a single row in SYS_QUERY_HISTORY. If you want to see the individual child queries, you can find them in SYS_QUERY_DETAIL.
In provisioned-only views, the query column records queries at the child query level. If the Amazon Redshift optimizer rewrites your original query into multiple child queries, there will be multiple rows in STL_QUERY with differing query identifier values for a single query you run.
When you migrate your monitoring and diagnostic queries from provisioned-only views to SYS views, consider this difference and edit your queries accordingly. For more information on how Amazon Redshift processes queries, see Query planning and execution workflow.
System tables and views reference - Amazon Redshift