ablog

不器用で落着きのない技術者のメモ

Redshift の svv_table_info の skew_sortkey1

svv_table_info の skew_sortkey1 の値は小さいほどよい。仮に skew_sortkey1 が 5 の場合、where句の条件でフィルタする前に他のカラムのデータを5ブロック読込む。

Ratio of the size of the largest non-sort key column to the size of the first column of the sort key, if a sort key is defined. Use this value to evaluate the effectiveness of the sort key.

SVV_TABLE_INFO - Amazon Redshift

A large skew_sortkey1 value means that the ratio of the size of largest non-sort key column to the first column of sort key is large which means row offsets in one disk block for the sort key corresponds to more disk blocks in the data column.

For example lets say skew_sortkey1 value is 5 for a table. Now the row offsets in one disk block for the sort key corresponds to 5 disk blocks for other data columns. Zone map stores the min and max value for the sort key disk block, so when you query this table with a where clause on sort key redshift identifies the sort key block which contains this data (block min < where clause value < block_max) and fetches the row offsets for that column. Now since the skew_sortkey1 is 5, it has to fetch 5 blocks for the data columns before filtering the records to the desired ones.

So to conclude having a high skew_sortkey1 value is not desirable.

What does the column skew_sorkey1 in Amazon Redshift's svv_table_info imply? - Stack Overflow