事象
pgbench から Pgpool-II 経由で Aurora PostgreSQL に負荷をかけるとクエリが均等分散せず、特定の Reader に負荷が偏る(pgpool-IIに接続して show pool_nodes を実行、select_cnt でインスタンス毎のクエリ実行回数を確認)。
- /etc/pgpool-II/pgpool.conf
backend_hostname0 = 'aurora-postgres117.cluster-********.ap-northeast-1.rds.amazonaws.com' backend_port0 = 5432 backend_weight0 = 0 backend_flag0 = 'ALWAYS_MASTER' backend_application_name0 = 'server0' backend_hostname1 = 'aurora-postgres117.cluster-ro-********.ap-northeast-1.rds.amazonaws.com' backend_port1 = 5432 backend_weight1 = 1 backend_application_name1 = 'server1'
原因
解決策
- Reader エンドポイントではなくインスタンスエンドポイントをPgpool-IIで指定する。(フェイルオーバー時に動的に pgpool.conf を書換える必要あり)
- /etc/pgpool-II/pgpool.conf の設定例
backend_hostname0 = 'aurora-postgres117.cluster-********.ap-northeast-1.rds.amazonaws.com' backend_port0 = 5432 backend_weight0 = 0 backend_flag0 = 'ALWAYS_MASTER' backend_application_name0 = 'server0' backend_hostname1 = 'reader-a-1.********.ap-northeast-1.rds.amazonaws.com' backend_port1 = 5432 backend_weight1 = 1 backend_application_name1 = 'server1' backend_hostname2 = 'reader-a-2.********.ap-northeast-1.rds.amazonaws.com' backend_port2 = 5432 backend_weight2 = 1 backend_application_name2 = 'server2' backend_hostname3 = 'reader-c-1.********.ap-northeast-1.rds.amazonaws.com' backend_port3 = 5432 backend_weight3 = 1 backend_application_name3 = 'server3' backend_hostname4 = 'reader-c-2.********.ap-northeast-1.rds.amazonaws.com' backend_port4 = 5432 backend_weight4 = 1 backend_application_name4 = 'server4'
- pgbench で負荷をかけるとクエリが均等分散する
$ pgbench -Sn -c 50 -j 50 -t 10000 -U awsuser -h localhost -d writer -p 9999 > /dev/null 2>&1 & $ psql "host=localhost dbname=writer port=9999 user=awsuser" writer=> show pool_nodes; node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+--------------------------------------------------------------------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | aurora-postgres117.cluster-********.ap-northeast-1.rds.amazonaws.com | 5432 | up | 0.000000 | primary | 1 | false | 0 | | | 2020-11-20 11:40:23 1 | reader-a-1.********.ap-northeast-1.rds.amazonaws.com | 5432 | up | 0.250000 | standby | 2608★ | false | 0 | | | 2020-11-20 11:40:23 2 | reader-a-2.********.ap-northeast-1.rds.amazonaws.com | 5432 | up | 0.250000 | standby | 2608★ | true | 0 | | | 2020-11-20 11:40:23 3 | reader-c-1.********.ap-northeast-1.rds.amazonaws.com | 5432 | up | 0.250000 | standby | 2589★ | false | 0 | | | 2020-11-20 11:40:23 4 | reader-c-2.********.ap-northeast-1.rds.amazonaws.com | 5432 | up | 0.250000 | standby | 2598★ | false | 0 | | | 2020-11-20 11:40:23 (5 行)
前提
- Pgpool-II で参照クエリを全て Reader に振るよう設定している
- 検証時は pgbench から参照クエリのみ実行している
- 環境
考慮事項
補足
Reader エンドポイントの名前解決は1秒未満では特定のIPアドレスに偏る。1秒スリープを入れて名前すると各リードレプリカのIPアドレスに均等分散される
- スリープなしで100回名前解決
$ for i in {1..100} do dig +short aurora-postgresql-11.cluster-ro-********.ap-northeast-1.rds.amazonaws.com done|perl -nle '/^\d+\./ and print'|sort|uniq -c 100 10.1.2.39
- 1秒スリープして100回名前解決
$ for i in {1..100} do dig +short aurora-postgresql-11.cluster-ro-********.ap-northeast-1.rds.amazonaws.com sleep 1 done|perl -nle '/^\d+\./ and print'|sort|uniq -c 25 10.1.1.225 24 10.1.2.39 29 10.1.3.183 22 10.1.3.198