ablog

不器用で落着きのない技術者のメモ

Amazon Redshift のソートマージ結合とハッシュ結合を比較する

ソートマージ結合になる条件

  • 両テーブルの結合キーが分散キーとソートキーに指定されている
  • 結合対象表の未ソート率が20%未満


Merge Join

Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table.

Query plan - Amazon Redshift

検証結果

ソートマージ結合になるパターン
select sum(l_quantity) 
from lineitem
join orders 
    on o_orderkey = l_orderkey 
        and o_orderdate = l_shipdate 
        and l_quantity > 1;
  • クエリIDを取得する
=# select pg_last_query_id();
=# \gset
 pg_last_query_id 
------------------
            26539
  • SVL_QUERY_SUMMARY でボトルネックを確認
    • Segment 0 で64秒要しており、ここで最も時間を使っている。
    • ソートマージ結合で結合されている(mjoin tbl=1043)
=# select * from svl_query_summary where query = :pg_last_query_id order by stm, seg, step;

 userid | query | stm | seg | step | maxtime  | avgtime  |    rows     |    bytes     | rate_row  | rate_byte  |                   label                   | is_diskbased | workmem | is_rrscan | is_delayed_scan | rows_pre_filter 
--------+-------+-----+-----+------+----------+----------+-------------+--------------+-----------+------------+-------------------------------------------+--------------+---------+-----------+-----------------+-----------------
    100 | 26539 |   0 |   0 |    0 | 64652349 | 63199018 | 14034948390 | 392978554920 | 219296068 | 6140289920 | scan   tbl=107017 name=lineitem           | f            |       0 | t         | f               |     17543814080
    100 | 26539 |   0 |   0 |    1 | 64652349 | 63199018 | 14034948390 |            0 | 219296068 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    2 | 64652349 | 63199018 | 14034948390 |            0 | 219296068 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    3 | 64652349 | 63199018 | 14034948390 |            0 | 219296068 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    4 | 64652349 | 63199018 |           0 |            0 |         0 |          0 | mjoin★  tbl=1043                           | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    5 | 64652349 | 63199018 |           0 |            0 |         0 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    6 | 64652349 | 63199018 |           0 |            0 |         0 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    7 | 64652349 | 63199018 |          16 |          512 |         0 |          8 | aggr   tbl=1051                           | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |    8 | 64652349 | 63199018 |  4500000000 |  65988025760 |  70312500 | 1031062902 | scan   tbl=107025 name=orders             | f            |       0 | f         | f               |      4500000000
    100 | 26539 |   0 |   0 |    9 | 64652349 | 63199018 |  4500000000 |            0 |  70312500 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   0 |   0 |   10 | 64652349 | 63199018 |  4500000000 |            0 |  70312500 |          0 | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   1 |   1 |    0 |      572 |      399 |          16 |          512 |           |            | scan   tbl=1051 name=Internal Worktable   | f            |       0 | f         | f               |               0
    100 | 26539 |   1 |   1 |    1 |      572 |      399 |          16 |          512 |           |            | return                                    | f            |       0 | f         | f               |               0
    100 | 26539 |   1 |   2 |    0 |     1493 |     1493 |          16 |          512 |           |            | scan   tbl=102236 name=Internal Worktable | f            |       0 | f         | f               |               0
    100 | 26539 |   1 |   2 |    1 |     1493 |     1493 |           1 |           32 |           |            | aggr   tbl=1056                           | f            |       0 | f         | f               |               0
    100 | 26539 |   2 |   3 |    0 |      381 |      381 |           1 |           32 |           |            | scan   tbl=1056 name=Internal Worktable   | f            |       0 | f         | f               |               0
    100 | 26539 |   2 |   3 |    1 |      381 |      381 |           1 |            0 |           |            | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   2 |   3 |    2 |      381 |      381 |           1 |            0 |           |            | project                                   | f            |       0 | f         | f               |               0
    100 | 26539 |   2 |   3 |    3 |      381 |      381 |           1 |            6 |           |            | return                                    | f            |       0 | f         | f               |               0
  • SVL_QUERY_METRICS でクエリのメトリクスを確認する
    • segment 0 の segment_execution_time が 64秒
=# select * from svl_query_metrics where query = :pg_last_query_id order by dimension;
 userid | query | service_class | dimension | segment | step | step_label | query_cpu_time | query_blocks_read | query_execution_time | query_cpu_usage_percent | query_temp_blocks_to_disk | segment_execution_time | cpu_skew | io_skew | scan_row_count | join_row_count | nested_loop_join_row_count | return_row_count | spectrum_scan_row_count | spectrum_scan_size_mb | query_queue_time |                        service_class_name                        
--------+-------+---------------+-----------+---------+------+------------+----------------+-------------------+----------------------+-------------------------+---------------------------+------------------------+----------+---------+----------------+----------------+----------------------------+------------------+-------------------------+-----------------------+------------------+------------------------------------------------------------------
    100 | 26539 |           100 | query     |         |      |            |            253 |            305441 |                   65 |                   25.18 |                           |                        |          |         |                |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | segment   |       0 |      |            |                |                   |                   65 |                         |                           |                     64 |     1.02 |    1.01 |                |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |    9 | project    |                |                   |                   65 |                         |                           |                        |          |         |                |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |    3 | project    |                |                   |                   65 |                         |                           |                        |          |         |                |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |    2 | project    |                |                   |                   65 |                         |                           |                        |          |         |                |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |    0 | scan       |                |                   |                   65 |                         |                           |                        |          |         |    17420711680 |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |    1 | project    |                |                   |                   65 |                         |                           |                        |          |         |                |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |    8 | scan       |                |                   |                   65 |                         |                           |                        |          |         |     4470136512 |                |                            |                  |                         |                       |                  | Default queue                                                   
    100 | 26539 |           100 | step      |       0 |   10 | project    |                |                   |                   65 |                         |                           |                        |          |         |                |                |                            |                  |                         |                       |                  | Default queue                                                   
ハッシュ結合になるパターン

事前準備

データセットCloud Data Warehouse Benchmark Derived from TPC-H の 3TB を使用。

共通
  • ユーザーレベルで結果キャッシュを無効化
alter user awsuser set enable_result_cache_for_session = off;
ソートマージ結合になるパターン

  • テーブルを作成する
create table lineitem (
  l_orderkey int8 not null ,
  l_partkey int8 not null,
  l_suppkey int4 not null,
  l_linenumber int4 not null,
  l_quantity numeric(12,2) not null,
  l_extendedprice numeric(12,2) not null,
  l_discount numeric(12,2) not null,
  l_tax numeric(12,2) not null,
  l_returnflag char(1) not null,
  l_linestatus char(1) not null,
  l_shipdate date not null ,
  l_commitdate date not null,
  l_receiptdate date not null,
  l_shipinstruct char(25) not null,
  l_shipmode char(10) not null,
  l_comment varchar(44) not null,
  Primary Key(L_ORDERKEY, L_LINENUMBER)
) distkey(l_orderkey) sortkey(l_shipdate,l_orderkey)  ;

create table orders (
  o_orderkey int8 not null,
  o_custkey int8 not null,
  o_orderstatus char(1) not null,
  o_totalprice numeric(12,2) not null,
  o_orderdate date not null,
  o_orderpriority char(15) not null,
  o_clerk char(15) not null,
  o_shippriority int4 not null,
  o_comment varchar(79) not null,
  Primary Key(O_ORDERKEY)
) distkey(o_orderkey) sortkey(o_orderdate, o_orderkey) ;
  • データをロードする
copy orders from 's3://redshift-downloads/TPC-H/2.18/3TB/orders/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';
copy lineitem from 's3://redshift-downloads/TPC-H/2.18/3TB/lineitem/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';
  • 件数を確認する
=# select count(*) from lineitem;
    count    
-------------
 18000048306

=# select count(*) from orders;
   count    
------------
 4500000000
ハッシュ結合にならないパターン
  • データベースを作成する
create database tpch_3tb_other_key2
collate case_insensitive 
isolation level snapshot;
  • テーブルを作成する
create table lineitem (
  l_orderkey int8 not null ,
  l_partkey int8 not null,
  l_suppkey int4 not null,
  l_linenumber int4 not null,
  l_quantity numeric(12,2) not null,
  l_extendedprice numeric(12,2) not null,
  l_discount numeric(12,2) not null,
  l_tax numeric(12,2) not null,
  l_returnflag char(1) not null,
  l_linestatus char(1) not null,
  l_shipdate date not null ,
  l_commitdate date not null,
  l_receiptdate date not null,
  l_shipinstruct char(25) not null,
  l_shipmode char(10) not null,
  l_comment varchar(44) not null,
  Primary Key(L_ORDERKEY, L_LINENUMBER)
) distkey(l_orderkey) sortkey(l_shipdate)  ;

create table orders (
  o_orderkey int8 not null,
  o_custkey int8 not null,
  o_orderstatus char(1) not null,
  o_totalprice numeric(12,2) not null,
  o_orderdate date not null,
  o_orderpriority char(15) not null,
  o_clerk char(15) not null,
  o_shippriority int4 not null,
  o_comment varchar(79) not null,
  Primary Key(O_ORDERKEY)
) distkey(o_orderkey) sortkey(o_orderdate) ;
  • データをロードする
copy lineitem from 's3://redshift-downloads/TPC-H/2.18/100GB/lineitem/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';
copy orders from 's3://redshift-downloads/TPC-H/2.18/100GB/orders/' iam_role 'arn:aws:iam::123456789012:role/redshift-spectrum-s3-fullaccess' delimiter '|' region 'us-east-1';
  • 件数を確認する
=# select count(*) from lineitem;
    count    
-------------
 18000048306

=# select count(*) from orders;
   count    
------------
 4500000000