ソートマージ結合になる条件
- 両テーブルの結合キーが分散キーとソートキーに指定されている
- 結合対象表の未ソート率が20%未満
Query plan - Amazon Redshift
Merge Join
Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table.
検証結果
ソートマージ結合になるパターン
select sum(l_quantity) from lineitem join orders on o_orderkey = l_orderkey and o_orderdate = l_shipdate and l_quantity > 1;
- クエリIDを取得する
=# select pg_last_query_id(); =# \gset pg_last_query_id ------------------ 26539
- SVL_QUERY_SUMMARY でボトルネックを確認
- Segment 0 で64秒要しており、ここで最も時間を使っている。
- ソートマージ結合で結合されている(mjoin tbl=1043)
=# select * from svl_query_summary where query = :pg_last_query_id order by stm, seg, step; userid | query | stm | seg | step | maxtime | avgtime | rows | bytes | rate_row | rate_byte | label | is_diskbased | workmem | is_rrscan | is_delayed_scan | rows_pre_filter --------+-------+-----+-----+------+----------+----------+-------------+--------------+-----------+------------+-------------------------------------------+--------------+---------+-----------+-----------------+----------------- 100 | 26539 | 0 | 0 | 0 | 64652349 | 63199018 | 14034948390 | 392978554920 | 219296068 | 6140289920 | scan tbl=107017 name=lineitem | f | 0 | t | f | 17543814080 100 | 26539 | 0 | 0 | 1 | 64652349 | 63199018 | 14034948390 | 0 | 219296068 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 2 | 64652349 | 63199018 | 14034948390 | 0 | 219296068 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 3 | 64652349 | 63199018 | 14034948390 | 0 | 219296068 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 4 | 64652349 | 63199018 | 0 | 0 | 0 | 0 | mjoin★ tbl=1043 | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 5 | 64652349 | 63199018 | 0 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 6 | 64652349 | 63199018 | 0 | 0 | 0 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 7 | 64652349 | 63199018 | 16 | 512 | 0 | 8 | aggr tbl=1051 | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 8 | 64652349 | 63199018 | 4500000000 | 65988025760 | 70312500 | 1031062902 | scan tbl=107025 name=orders | f | 0 | f | f | 4500000000 100 | 26539 | 0 | 0 | 9 | 64652349 | 63199018 | 4500000000 | 0 | 70312500 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 0 | 0 | 10 | 64652349 | 63199018 | 4500000000 | 0 | 70312500 | 0 | project | f | 0 | f | f | 0 100 | 26539 | 1 | 1 | 0 | 572 | 399 | 16 | 512 | | | scan tbl=1051 name=Internal Worktable | f | 0 | f | f | 0 100 | 26539 | 1 | 1 | 1 | 572 | 399 | 16 | 512 | | | return | f | 0 | f | f | 0 100 | 26539 | 1 | 2 | 0 | 1493 | 1493 | 16 | 512 | | | scan tbl=102236 name=Internal Worktable | f | 0 | f | f | 0 100 | 26539 | 1 | 2 | 1 | 1493 | 1493 | 1 | 32 | | | aggr tbl=1056 | f | 0 | f | f | 0 100 | 26539 | 2 | 3 | 0 | 381 | 381 | 1 | 32 | | | scan tbl=1056 name=Internal Worktable | f | 0 | f | f | 0 100 | 26539 | 2 | 3 | 1 | 381 | 381 | 1 | 0 | | | project | f | 0 | f | f | 0 100 | 26539 | 2 | 3 | 2 | 381 | 381 | 1 | 0 | | | project | f | 0 | f | f | 0 100 | 26539 | 2 | 3 | 3 | 381 | 381 | 1 | 6 | | | return | f | 0 | f | f | 0
- SVL_QUERY_METRICS でクエリのメトリクスを確認する
- segment 0 の segment_execution_time が 64秒
=# select * from svl_query_metrics where query = :pg_last_query_id order by dimension; userid | query | service_class | dimension | segment | step | step_label | query_cpu_time | query_blocks_read | query_execution_time | query_cpu_usage_percent | query_temp_blocks_to_disk | segment_execution_time | cpu_skew | io_skew | scan_row_count | join_row_count | nested_loop_join_row_count | return_row_count | spectrum_scan_row_count | spectrum_scan_size_mb | query_queue_time | service_class_name --------+-------+---------------+-----------+---------+------+------------+----------------+-------------------+----------------------+-------------------------+---------------------------+------------------------+----------+---------+----------------+----------------+----------------------------+------------------+-------------------------+-----------------------+------------------+------------------------------------------------------------------ 100 | 26539 | 100 | query | | | | 253 | 305441 | 65 | 25.18 | | | | | | | | | | | | Default queue 100 | 26539 | 100 | segment | 0 | | | | | 65 | | | 64 | 1.02 | 1.01 | | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 9 | project | | | 65 | | | | | | | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 3 | project | | | 65 | | | | | | | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 2 | project | | | 65 | | | | | | | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 0 | scan | | | 65 | | | | | | 17420711680 | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 1 | project | | | 65 | | | | | | | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 8 | scan | | | 65 | | | | | | 4470136512 | | | | | | | Default queue 100 | 26539 | 100 | step | 0 | 10 | project | | | 65 | | | | | | | | | | | | | Default queue
ハッシュ結合になるパターン
事前準備
データセットは Cloud Data Warehouse Benchmark Derived from TPC-H の 3TB を使用。
共通
- ユーザーレベルで結果キャッシュを無効化
alter user awsuser set enable_result_cache_for_session = off;
ソートマージ結合になるパターン
- テーブルを作成する
create table lineitem ( l_orderkey int8 not null , l_partkey int8 not null, l_suppkey int4 not null, l_linenumber int4 not null, l_quantity numeric(12,2) not null, l_extendedprice numeric(12,2) not null, l_discount numeric(12,2) not null, l_tax numeric(12,2) not null, l_returnflag char(1) not null, l_linestatus char(1) not null, l_shipdate date not null , l_commitdate date not null, l_receiptdate date not null, l_shipinstruct char(25) not null, l_shipmode char(10) not null, l_comment varchar(44) not null, Primary Key(L_ORDERKEY, L_LINENUMBER) ) distkey(l_orderkey) sortkey(l_shipdate,l_orderkey) ; create table orders ( o_orderkey int8 not null, o_custkey int8 not null, o_orderstatus char(1) not null, o_totalprice numeric(12,2) not null, o_orderdate date not null, o_orderpriority char(15) not null, o_clerk char(15) not null, o_shippriority int4 not null, o_comment varchar(79) not null, Primary Key(O_ORDERKEY) ) distkey(o_orderkey) sortkey(o_orderdate, o_orderkey) ;
- データをロードする
copy orders from 's3://redshift-downloads/TPC-H/2.18/3TB/orders/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1'; copy lineitem from 's3://redshift-downloads/TPC-H/2.18/3TB/lineitem/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';
- 件数を確認する
=# select count(*) from lineitem; count ------------- 18000048306 =# select count(*) from orders; count ------------ 4500000000
ハッシュ結合にならないパターン
- データベースを作成する
create database tpch_3tb_other_key2 collate case_insensitive isolation level snapshot;
- テーブルを作成する
create table lineitem ( l_orderkey int8 not null , l_partkey int8 not null, l_suppkey int4 not null, l_linenumber int4 not null, l_quantity numeric(12,2) not null, l_extendedprice numeric(12,2) not null, l_discount numeric(12,2) not null, l_tax numeric(12,2) not null, l_returnflag char(1) not null, l_linestatus char(1) not null, l_shipdate date not null , l_commitdate date not null, l_receiptdate date not null, l_shipinstruct char(25) not null, l_shipmode char(10) not null, l_comment varchar(44) not null, Primary Key(L_ORDERKEY, L_LINENUMBER) ) distkey(l_orderkey) sortkey(l_shipdate) ; create table orders ( o_orderkey int8 not null, o_custkey int8 not null, o_orderstatus char(1) not null, o_totalprice numeric(12,2) not null, o_orderdate date not null, o_orderpriority char(15) not null, o_clerk char(15) not null, o_shippriority int4 not null, o_comment varchar(79) not null, Primary Key(O_ORDERKEY) ) distkey(o_orderkey) sortkey(o_orderdate) ;
- データをロードする
copy lineitem from 's3://redshift-downloads/TPC-H/2.18/100GB/lineitem/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1'; copy orders from 's3://redshift-downloads/TPC-H/2.18/100GB/orders/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';
- 件数を確認する
=# select count(*) from lineitem; count ------------- 18000048306 =# select count(*) from orders; count ------------ 4500000000