ablog

不器用で落着きのない技術者のメモ

AWS DMS タスク停止時に PostgreSQL の WAL が溜り続けないようにする

AWS DMS のソースが PostgreSQL で CDC を使う場合、DMS タスクを停止すると、PostgreSQL の WAL が溜まってストレージ使用量が増える。


レプリケーションスロットを削除すると、WAL が削除されるようになる。

SELECT * FROM pg_replication_slots;
SELECT pg_drop_replication_slot('<レプリケーションスロット名>')

ただし、DMS タスクを再開することはできなくなり、再同期したい場合はフルロード+CDCをやり直す必要がある。

参考

次のようなクエリを実行して、レプリケーションスロットの有無とサイズを確認します。

PostgreSQL v9

psql=> select slot_name, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),restart_lsn)) as 
replicationSlotLag, active from pg_replication_slots ;

PostgreSQL v10 および v11

psql=> select slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) as replicationSlotLag, 
active from pg_replication_slots ;

消費されていない (アクティブ状態が False である) レプリケーションスロットを特定したら、次のようなクエリを実行してレプリケーションスロットを削除できます。

psql=>select pg_drop_replication_slot('Your_slotname_name');
Amazon RDS for PostgreSQL で DiskFull のエラーを解決する

Enabling CDC using a self-managed PostgreSQL database as a AWS DMS source
AWS DMS supports change data capture (CDC) using logical replication. To enable logical replication of a self-managed PostgreSQL source database, set the following parameters and values in the postgresql.conf configuration file:

  • Set wal_level = logical.
  • Set max_replication_slots to a value greater than 1.
  • Set the max_replication_slots value according to the number of tasks that you want to run. For example, to run five tasks you set a minimum of five slots. Slots open automatically as soon as a task starts and remain open even when the task is no longer running. Make sure to manually delete open slots.
  • Set max_wal_senders to a value greater than 1.
    • The max_wal_senders parameter sets the number of concurrent tasks that can run.
  • Set wal_sender_timeout =0.
    • The wal_sender_timeout parameter ends replication connections that are inactive longer than the specified number of milliseconds. Although the default is 60 seconds, we recommend that you set this parameter to zero. Doing this turns off the timeout mechanism.

Some parameters are static, and you can only set them at server start. Any changes to their entries in the configuration file (for a self-managed database) or DB parameter group (for an RDS for PostgreSQL database) are ignored until the server is restarted. For more information, see the PostgreSQL documentation.

Using a PostgreSQL database as an AWS DMS source - AWS Database Migration Service
  1. Use the AWS master user account for the PostgreSQL DB instance as the user account for the PostgreSQL source endpoint. The master user account has the required roles that allow it to set up CDC.If you use an account other than the master user account, make sure to create several objects from the master account for the account that you use. For more information, see Migrating an Amazon RDS for PostgreSQL database without using the master user account.
  2. Set the rds.logical_replication parameter in your DB CLUSTER parameter group to 1. This static parameter requires a reboot of the DB instance to take effect. As part of applying this parameter, AWS DMS sets the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters. These parameter changes can increase write ahead log (WAL) generation, so only set rds.logical_replication when you use logical replication slots.
  3. Set the wal_sender_timeout parameter to 0, as a best practice. Setting this parameter to 0 prevents PostgreSQL from terminating replication connections that are inactive longer than the specified timeout. When AWS DMS migrates data, replication connections need to be able to last longer than the specified timeout.
  4. Ensure the value of the max_worker_processes parameter in your DB Cluster Parameter Group is equal to, or higher than the total combined values of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers. A high number of background worker processes might impact application workloads on small instances. So, monitor performance of your database if you set max_worker_processes higher than the default value.
Using a PostgreSQL database as an AWS DMS source - AWS Database Migration Service