Amazon Redshift で結合キーに関数を使うとソートマージ結合にならないパターン。
検証結果
結合キーに collate 関数を使用
- クライアントからみた実行時間は25分47秒。
select a.l_linenumber, avg(a.l_quantity) from lineitem a, lineitem b where collate(a.l_orderkey, 'case_insensitive') = collate(b.l_orderkey, 'case_insensitive') and a.l_shipdate = b.l_shipdate group by a.l_linenumber order by a.l_linenumber; l_linenumber | avg --------------+------- 1 | 25.49 2 | 25.49 3 | 25.49 4 | 25.50 5 | 25.50 6 | 25.49 7 | 25.50 (7 rows) Time: 1547123.222 ms (25:47.123)
- クエリIDを取得する
select pg_last_query_id();
\gset
- SVL_QUERY_SUMMARY でボトルネックを確認
- maxtime より Segment 0 で約9分(538秒)、Segment 1で約9分(552秒)、Segment 3-4 で約17分(100秒)要している。
- Segment 1 で、ハッシュ結合でワークメモリを約24GB(workmem=26,732,396,544 byte)使ってソートし、メモリ上だけでは処理できず、ストレージに中間結果を書き出している(is_diskbased=t)。
select * from svl_query_summary where query = :pg_last_query_id order by stm, seg, step; userid | query | stm | seg | step | maxtime | avgtime | rows | bytes | rate_row | rate_byte | label | is_diskbased | workmem | is_rrscan | is_delayed_scan | rows_pre_filter --------+---------+-----+-----+------+------------+------------+-------------+---------------+----------+------------+-----------------------------------------+--------------+-------------+-----------+-----------------+----------------- 100 | 1400920 | 0 | 0 | 0 | ★538841109 | 472190527 | 18000048306 | 474890157791 | 33457338 | 882695460 | scan tbl=110211 name=lineitem | f | 0 | f | f | 18000048306 100 | 1400920 | 0 | 0 | 1 | 538841109 | 472190527 | 18000048306 | 0 | 33457338 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 0 | 0 | 2 | 538841109 | 472190527 | 18000048306 | 751598030540 | 33457338 | 1397022361 | dist | f | 0 | f | f | 0 100 | 1400920 | 0 | 1 | 0 | 538856386 | 538853041 | 18000048306 | 751598030540 | 33457338 | 1397022361 | scan tbl=3399 name=Internal Worktable | f | 0 | f | f | 0 100 | 1400920 | 0 | 1 | 1 | 538856386 | 538853041 | 18000048306 | 0 | 33457338 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 0 | 1 | 2 | 538856386 | 538853041 | 18000048306 | 967598610212 | 33457338 | 1798510427 | hash tbl=1045 | ★t | ★26732396544 | f | f | 0 100 | 1400920 | 1 | 2 | 0 | ★552474705 | 523335352 | 18000048306 | 690890737463 | 32608783 | 1251613654 | scan tbl=110211 name=lineitem | f | 0 | f | f | 18000048306 100 | 1400920 | 1 | 2 | 1 | 552474705 | 523335352 | 18000048306 | 0 | 32608783 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 1 | 2 | 2 | 552474705 | 523335352 | 18000048306 | 1039598803436 | 32608783 | 1883331165 | dist | f | 0 | f | f | 0 100 | 1400920 | 1 | 3 | 0 | 1007056682 | 991586494 | 18000048306 | 1039598803436 | 17874923 | 1032372198 | scan tbl=3727 name=Internal Worktable | f | 0 | f | f | 0 100 | 1400920 | 1 | 3 | 1 | ★1007056682 | 991586494 | 18000048306 | 0 | 17874923 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 1 | 3 | 2 | 1007056682 | 991586494 | 18595188330 | 0 | 18465926 | 0 | hjoin tbl=1045 | f | 0 | f | f | 0 100 | 1400920 | 1 | 3 | 3 | 1007056682 | 991586494 | 18595188330 | 0 | 18465926 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 1 | 3 | 4 | 1007056682 | 991586494 | 18595188330 | 0 | 18465926 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 1 | 3 | 5 | 1007056682 | 991586494 | 112 | 5376 | 0 | 5 | aggr tbl=1056 | f | 450232320 | f | f | 0 100 | 1400920 | 1 | 3 | 6 | 1007056682 | 991586494 | 112 | 3584 | 0 | 3 | dist | f | 0 | f | f | 0 100 | 1400920 | 1 | 4 | 0 | ★1007575421 | 1007574356 | 112 | 3584 | 0 | 3 | scan tbl=3728 name=Internal Worktable | f | 0 | f | f | 0 100 | 1400920 | 1 | 4 | 1 | 1007575421 | 1007574356 | 7 | 336 | 0 | 0 | aggr tbl=1059 | f | 7203717120 | f | f | 0 100 | 1400920 | 1 | 4 | 2 | 1007575421 | 1007574356 | 7 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 1 | 4 | 3 | 1007575421 | 1007574356 | 7 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 1 | 4 | 4 | 1007575421 | 1007574356 | 7 | 224 | 0 | 0 | sort tbl=1062 | f | 1800929280 | f | f | 0 100 | 1400920 | 1 | 4 | 5 | 1007575421 | 1007574356 | 0 | 0 | 0 | 0 | merge | f | 0 | f | f | 0 100 | 1400920 | 1 | 4 | 6 | 1007575421 | 1007574356 | 0 | 0 | 0 | 0 | aggr tbl=1065 | f | 0 | f | f | 0 100 | 1400920 | 1 | 4 | 7 | 1007575421 | 1007574356 | 0 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 1400920 | 2 | 5 | 0 | 484 | 320 | 7 | 224 | | | scan tbl=1062 name=Internal Worktable | f | 0 | f | f | 0 100 | 1400920 | 2 | 5 | 1 | 484 | 320 | 7 | 224 | | | return | f | 0 | f | f | 0 100 | 1400920 | 2 | 6 | 0 | 1266 | 1266 | 0 | 0 | | | merge | f | 0 | f | f | 0 100 | 1400920 | 2 | 6 | 1 | 1266 | 1266 | 7 | 0 | | | project | f | 0 | f | f | 0 100 | 1400920 | 2 | 6 | 2 | 1266 | 1266 | 7 | 112 | | | return | f | 0 | f | f | 0 >|sql|
- SVL_QUERY_METRICS でクエリのメトリクスを確認する
- query_temp_blocks_to_disk から約1.9TB(2,028,253MB)の中間結果をストレージに書き出している。
- segment_execution_time(単位:秒) から Segment 0-4 で時間がかかっている。
select * from svl_query_metrics where query = :pg_last_query_id order by dimension; userid | query | service_class | dimension | segment | step | step_label | query_cpu_time | query_blocks_read | query_execution_time | query_cpu_usage_percent | query_temp_blocks_to_disk | segment_execution_time | cpu_skew | io_skew | scan_row_count | join_row_count | nested_loop_join_row_count | return_row_count | spectrum_scan_row_count | spectrum_scan_size_mb | query_queue_time | service_class_name --------+---------+---------------+-----------+---------+------+------------+----------------+-------------------+----------------------+-------------------------+---------------------------+------------------------+----------+---------+----------------+----------------+----------------------------+------------------+-------------------------+-----------------------+------------------+------------------------------------------------------------------ 100 | 1400920 | 100 | query | | | | 11292 | 680519 | 1547 | 19.98 | ★2028253 | | | | | | | | | | | Default queue 100 | 1400920 | 100 | segment | 2 | | | | | 1547 | | | ★552 | 1.09 | 1.01 | | | | | | | | Default queue 100 | 1400920 | 100 | segment | 3 | | | | | 1547 | | | ★1008 | 1.06 | | | | | | | | | Default queue 100 | 1400920 | 100 | segment | 0 | | | | | 1547 | | | ★538 | 1.08 | 1.00 | | | | | | | | Default queue 100 | 1400920 | 100 | segment | 1 | | | | | 1547 | | | ★539 | 1.81 | | | | | | | | | Default queue 100 | 1400920 | 100 | segment | 4 | | | | | 1547 | | | ★1008 | 1.10 | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 1 | 2 | hash | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 0 | scan | | | 1547 | | | | | | 18000048306 | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 6 | dist | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 5 | aggr | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 3 | project | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 1 | project | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 1 | 0 | scan | | | 1547 | | | | | | 18000041452 | | | | | | | Default queue 100 | 1400920 | 100 | step | 0 | 0 | scan | | | 1547 | | | | | | 17967314808 | | | | | | | Default queue 100 | 1400920 | 100 | step | 0 | 2 | dist | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 2 | hjoin | | | 1547 | | | | | | | 18583858771 | | | | | | Default queue 100 | 1400920 | 100 | step | 2 | 1 | project | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 2 | 0 | scan | | | 1547 | | | | | | 17971456153 | | | | | | | Default queue 100 | 1400920 | 100 | step | 3 | 4 | project | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 2 | 2 | dist | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 0 | 1 | project | | | 1547 | | | | | | | | | | | | | Default queue 100 | 1400920 | 100 | step | 1 | 1 | project | | | 1547 | | | | | | | | | | | | | Default queue
- SVL_QUERY_REPORT で Segment の実行開始・終了時間を確認する
- Segment 3 と 4 の start_time と end_time が同時間帯のため、Segment 3 と 4 は並行で実行されている。
- つまり、Segment 3、4 の合計で約17分(100秒)要している。
=# select segment, slice, min(start_time) start_time_min, max(end_time) end_time_max from svl_query_report where query = :pg_last_query_id and segment in (3, 4) group by segment, slice order by segment, slice; segment | slice | start_time_min | end_time_max ---------+-------+----------------------------+---------------------------- 3 | 0 | 2023-04-14 05:59:12.945673 | 2023-04-14 06:15:26.334688 3 | 1 | 2023-04-14 05:59:12.945823 | 2023-04-14 06:15:26.873808 3 | 2 | 2023-04-14 05:59:12.945965 | 2023-04-14 06:15:27.812101 3 | 3 | 2023-04-14 05:59:12.946126 | 2023-04-14 06:15:28.006565 3 | 4 | 2023-04-14 05:59:12.950082 | 2023-04-14 06:15:41.992692 3 | 5 | 2023-04-14 05:59:12.950206 | 2023-04-14 06:15:42.857837 3 | 6 | 2023-04-14 05:59:12.953663 | 2023-04-14 06:15:44.907382 3 | 7 | 2023-04-14 05:59:12.965558 | 2023-04-14 06:15:44.40274 3 | 8 | 2023-04-14 05:59:12.944929 | 2023-04-14 06:15:59.388491 3 | 9 | 2023-04-14 05:59:12.945254 | 2023-04-14 06:15:59.434196 3 | 10 | 2023-04-14 05:59:12.945456 | 2023-04-14 06:15:54.800025 3 | 11 | 2023-04-14 05:59:12.945117 | 2023-04-14 06:16:00.001799 3 | 12 | 2023-04-14 05:59:12.947844 | 2023-04-14 06:15:47.805595 3 | 13 | 2023-04-14 05:59:12.94682 | 2023-04-14 06:15:48.432422 3 | 14 | 2023-04-14 05:59:12.947364 | 2023-04-14 06:15:48.744143 3 | 15 | 2023-04-14 05:59:12.946962 | 2023-04-14 06:15:50.762271 4 | 0 | 2023-04-14 05:59:12.941083 | 2023-04-14 06:16:00.515626 4 | 1 | 2023-04-14 05:59:12.941058 | 2023-04-14 06:16:00.515746 4 | 2 | 2023-04-14 05:59:12.941051 | 2023-04-14 06:16:00.515683 4 | 3 | 2023-04-14 05:59:12.941127 | 2023-04-14 06:16:00.515721 4 | 4 | 2023-04-14 05:59:12.948774 | 2023-04-14 06:16:00.521407 4 | 5 | 2023-04-14 05:59:12.948658 | 2023-04-14 06:16:00.521353 4 | 6 | 2023-04-14 05:59:12.948649 | 2023-04-14 06:16:00.521429 4 | 7 | 2023-04-14 05:59:12.949042 | 2023-04-14 06:16:00.521365 4 | 8 | 2023-04-14 05:59:12.940503 | 2023-04-14 06:16:00.515515 4 | 9 | 2023-04-14 05:59:12.940535 | 2023-04-14 06:16:00.515523 4 | 10 | 2023-04-14 05:59:12.940538 | 2023-04-14 06:16:00.515505 4 | 11 | 2023-04-14 05:59:12.940875 | 2023-04-14 06:16:00.515467 4 | 12 | 2023-04-14 05:59:12.941291 | 2023-04-14 06:16:00.516618 4 | 13 | 2023-04-14 05:59:12.941292 | 2023-04-14 06:16:00.516713 4 | 14 | 2023-04-14 05:59:12.941254 | 2023-04-14 06:16:00.516624 4 | 15 | 2023-04-14 05:59:12.941471 | 2023-04-14 06:16:00.516613
結合キーに collate 関数を使用しない
- Amazon Redshift はデフォルトで case sensitive(大文字小文字区別あり)のため、関数を使わずに case_insensitive にする場合は、データベースレベル(CREATE DATABASE)かテーブルレベル(CREATE TABLE)で指定する。
- データロード後にデータベースレベルの変更はできないので、最初に設定しておく必要がある。テーブルレベルも ALTER TABLE で変更できないため、CREATE TABLE 時に指定しておく。
select a.l_linenumber, avg(a.l_quantity) from lineitem a, lineitem b where a.l_orderkey = b.l_orderkey and a.l_shipdate = b.l_shipdate group by a.l_linenumber order by a.l_linenumber; l_linenumber | avg --------------+------- 1 | 25.49 2 | 25.49 3 | 25.49 4 | 25.50 5 | 25.50 6 | 25.49 7 | 25.50 (7 rows) Time: 109130.947 ms (01:49.131)
- クエリIDを取得する
select pg_last_query_id();
\gset
- SVL_QUERY_SUMMARY でボトルネックを確認
- 結合方式は Merge Join になっている。
select * from svl_query_summary where query = :pg_last_query_id order by stm, seg, step; userid | query | stm | seg | step | maxtime | avgtime | rows | bytes | rate_row | rate_byte | label | is_diskbased | workmem | is_rrscan | is_delayed_scan | rows_pre_filter --------+---------+-----+-----+------+-----------+-----------+-------------+--------------+-----------+------------+-----------------------------------------+--------------+-------------+-----------+-----------------+----------------- 100 | 1400839 | 0 | 0 | 0 | 108904062 | 101244834 | 18000048306 | 690890737463 | 166667113 | 6397136457 | scan tbl=110211 name=lineitem | f | 0 | f | f | 18000048306 100 | 1400839 | 0 | 0 | 1 | 108904062 | 101244834 | 18000048306 | 0 | 166667113 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 2 | 108904062 | 101244834 | 18000048306 | 0 | 166667113 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 3 | 108904062 | 101244834 | 18000048306 | 0 | 166667113 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 4 | 108904062 | 101244834 | 18595188330 | 0 | 172177669 | 0 | ★mjoin tbl=1043 | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 5 | 108904062 | 101244834 | 18595188330 | 0 | 172177669 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 6 | 108904062 | 101244834 | 18595188330 | 0 | 172177669 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 7 | 108904062 | 101244834 | 112 | 5376 | 1 | 49 | aggr tbl=1051 | f | 1670774784 | f | f | 0 100 | 1400839 | 0 | 0 | 8 | 108904062 | 101244834 | 112 | 3584 | 1 | 33 | dist | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 9 | 108904062 | 101244834 | 18000048306 | 402145489327 | 166667113 | 3723569345 | scan tbl=110211 name=lineitem | f | 0 | f | f | 18000048306 100 | 1400839 | 0 | 0 | 10 | 108904062 | 101244834 | 18000048306 | 0 | 166667113 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 0 | 11 | 108904062 | 101244834 | 18000048306 | 0 | 166667113 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 1 | 0 | 108911239 | 108910624 | 112 | 3584 | 1 | 33 | scan tbl=3098 name=Internal Worktable | f | 0 | f | f | 0 100 | 1400839 | 0 | 1 | 1 | 108911239 | 108910624 | 7 | 336 | 0 | 3 | aggr tbl=1054 | f | 26732396544 | f | f | 0 100 | 1400839 | 0 | 1 | 2 | 108911239 | 108910624 | 7 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 1 | 3 | 108911239 | 108910624 | 7 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 0 | 1 | 4 | 108911239 | 108910624 | 7 | 224 | 0 | 2 | sort tbl=1057 | f | 6683099136 | f | f | 0 100 | 1400839 | 0 | 1 | 5 | 108911239 | 108910624 | 0 | 0 | 0 | 0 | merge | f | 0 | f | f | 0 100 | 1400839 | 0 | 1 | 6 | 108911239 | 108910624 | 0 | 0 | 0 | 0 | aggr tbl=1059 | f | 0 | f | f | 0 100 | 1400839 | 0 | 1 | 7 | 108911239 | 108910624 | 0 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 1400839 | 1 | 2 | 0 | 444 | 319 | 7 | 224 | | | scan tbl=1057 name=Internal Worktable | f | 0 | f | f | 0 100 | 1400839 | 1 | 2 | 1 | 444 | 319 | 7 | 224 | | | return | f | 0 | f | f | 0 100 | 1400839 | 1 | 3 | 0 | 1313 | 1313 | 0 | 0 | | | merge | f | 0 | f | f | 0 100 | 1400839 | 1 | 3 | 1 | 1313 | 1313 | 7 | 0 | | | project | f | 0 | f | f | 0 100 | 1400839 | 1 | 3 | 2 | 1313 | 1313 | 7 | 112 | | | return | f | 0 | f | f | 0
- SVL_QUERY_METRICS でクエリのメトリクスを確認する
- 時間を要している Segment は 0 と 1 で、109秒。
- query_temp_blocks_to_disk は null でストレージに中間結果を書き出していない。
select * from svl_query_metrics where query = :pg_last_query_id order by dimension; userid | query | service_class | dimension | segment | step | step_label | query_cpu_time | query_blocks_read | query_execution_time | query_cpu_usage_percent | ★query_temp_blocks_to_disk | ★segment_execution_time | cpu_skew | io_skew | scan_row_count | join_row_count | nested_loop_join_row_count | return_row_count | spectrum_scan_row_count | spectrum_scan_size_mb | query_queue_time | service_class_name --------+---------+---------------+-----------+---------+------+------------+----------------+-------------------+----------------------+-------------------------+---------------------------+------------------------+----------+---------+----------------+----------------+----------------------------+------------------+-------------------------+-----------------------+------------------+------------------------------------------------------------------ 100 | 1400839 | 100 | query | | | | 1328 | 665083 | 109 | 39.65 | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | segment | 0 | | | | | 109 | | | ★109 | 1.05 | 1.01 | | | | | | | | Default queue 100 | 1400839 | 100 | segment | 1 | | | | | 109 | | | ★109 | 1.33 | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 6 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 1 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 3 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 0 | scan | | | 109 | | | | | | 17895323648 | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 5 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 4 | mjoin | | | 109 | | | | | | | 18486895360 | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 10 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 11 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 2 | project | | | 109 | | | | | | | | | | | | | Default queue 100 | 1400839 | 100 | step | 0 | 9 | scan | | | 109 | | | | | | 17895840000 | | | | | | | Default queue
事前準備
データセットは Cloud Data Warehouse Benchmark Derived from TPC-H の 3TB を使用。
- ユーザーレベルで結果キャッシュを無効化
alter user awsuser set enable_result_cache_for_session = off;
- データベースを作成
create database tpch_3tb_other_key3 collate case_insensitive isolation level snapshot;
- テーブル作成
- l_orderkey を varchar に変更している
create table lineitem ( l_orderkey varchar(11) not null , l_partkey int8 not null, l_suppkey int4 not null, l_linenumber int4 not null, l_quantity numeric(12,2) not null, l_extendedprice numeric(12,2) not null, l_discount numeric(12,2) not null, l_tax numeric(12,2) not null, l_returnflag char(1) not null, l_linestatus char(1) not null, l_shipdate date not null , l_commitdate date not null, l_receiptdate date not null, l_shipinstruct char(25) not null, l_shipmode char(10) not null, l_comment varchar(44) not null, Primary Key(L_ORDERKEY, L_LINENUMBER) ) distkey(l_orderkey) sortkey(l_shipdate, l_orderkey);
- データをロードする
copy lineitem from 's3://redshift-downloads/TPC-H/2.18/3TB/orders/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';