ablog

Notes from a clumsy, restless engineer

ElastiCache (Redis) failover time

A note on measuring how long a manual failover of ElastiCache (Redis) takes.

Configuration

  • Cluster mode enabled
  • 1 shard x 2 replicas = 3 nodes
  • Node type: r5.large


Procedure

  • Run the test script
$ ruby redis-rb_sample.rb
  • Trigger a manual failover from the management console


  • Re-run the script while watching the events in the management console

Results

  • 67 seconds on average (over 3 runs)
    • Measured from "Test Failover API called for node group ..." to "Failover to replica node ... completed"
    • Once "Failover to replica node ... completed" appears, the client can read and write again
  • Messages output during failover (newest first)
Source ID                        Type               Date                             Event
redis-cluster-no-auth2-0001-003  cache-cluster      Mon, Sep 6, 2021 07:02:35 UTC+9  Finished recovery for cache nodes 0001
redis-cluster-no-auth2-0001-003  cache-cluster      Mon, Sep 6, 2021 06:57:16 UTC+9  Recovering cache nodes 0001
redis-cluster-no-auth2           replication-group  Mon, Sep 6, 2021 06:56:21 UTC+9  Failover to replica node redis-cluster-no-auth2-0001-002 completed
redis-cluster-no-auth2           replication-group  Mon, Sep 6, 2021 06:55:17 UTC+9  Test Failover API called for node group 0001
  • Events in the management console (screenshot)
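The failover time for a single run can be derived from the two replication-group event timestamps listed above. The following is a minimal sketch in Ruby using the timestamps from this run; the helper name `failover_seconds` is made up for illustration and is not part of any library.

```ruby
require "time"

# Hypothetical helper: seconds elapsed between the "Test Failover API called"
# event and the "Failover to replica node ... completed" event.
def failover_seconds(called_at, completed_at)
  (Time.iso8601(completed_at) - Time.iso8601(called_at)).round
end

# Timestamps copied from the event list (UTC+9), rewritten as ISO 8601.
called_at    = "2021-09-06T06:55:17+09:00"
completed_at = "2021-09-06T06:56:21+09:00"

puts failover_seconds(called_at, completed_at)
```

For this run the difference is 64 seconds, which is in line with the 67-second average over three runs.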

Redis client test program

  • redis-rb_sample.rb
require "redis"
require "date"

redis = Redis.new(cluster: ["redis://redis-cluster-no-auth2.******.clustercfg.apne1.cache.amazonaws.com:6379"]) # connect in cluster mode via the configuration endpoint

(1..1_000_000).each do |num|
  redis.set("key_#{num}", "value_#{num}") # write a key
  time = DateTime.now
  p "#{time} #{redis.get("key_#{num}")}" # read it back and print it with a timestamp
end
  • If a failover happens while the script is running, the script fails with an error; once "Failover to replica node ... completed" has been logged, reads and writes work normally again
$ ruby redis-rb_sample.rb
"2021-09-05T22:22:13+00:00 value_2599"
"2021-09-05T22:22:13+00:00 value_2600"
"2021-09-05T22:22:13+00:00 value_2601"
Traceback (most recent call last):
        29: from redis-rb_sample.rb:6:in `<main>'
        28: from redis-rb_sample.rb:6:in `each'
        27: from redis-rb_sample.rb:9:in `block in <main>'
        26: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis.rb:958:in `get'
        25: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis.rb:70:in `synchronize'
        24: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/2.7.0/monitor.rb:202:in `mon_synchronize'
        23: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/2.7.0/monitor.rb:202:in `synchronize'
        22: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis.rb:70:in `block in synchronize'
        21: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis.rb:959:in `block in get'
        20: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/cluster.rb:72:in `call'
        19: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/cluster.rb:154:in `send_command'
        18: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/cluster.rb:218:in `try_send'
        17: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/cluster.rb:218:in `public_send'
        16: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:148:in `call'
        15: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:254:in `process'
        14: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:342:in `logging'
        13: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:255:in `block in process'
        12: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:403:in `ensure_connected'
        11: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:116:in `connect'
        10: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:330:in `with_reconnect'
         9: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:117:in `block in connect'
         8: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/client.rb:371:in `establish_connection'
         7: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/connection/ruby.rb:304:in `connect'
         6: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/connection/ruby.rb:190:in `connect'
         5: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/connection/ruby.rb:190:in `each_with_index'
         4: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/connection/ruby.rb:190:in `each'
         3: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/connection/ruby.rb:192:in `block in connect'
         2: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/redis-4.4.0/lib/redis/connection/ruby.rb:154:in `connect_addrinfo'
         1: from /home/ec2-user/.rbenv/versions/2.7.4/lib/ruby/2.7.0/socket.rb:1214:in `connect_nonblock'
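
To measure the outage from the client side instead of aborting on the first error, the loop could rescue connection errors and retry until the cluster accepts commands again. The sketch below is not the script used for the measurements in this post. `with_retry` and `FlakyStore` are hypothetical names, and `FlakyStore` is a stand-in for a Redis client so that the example runs without a cluster. With redis-rb, one would presumably pass `Redis::BaseConnectionError` as the error class, but that is an assumption and has not been verified against a real failover.

```ruby
# Hypothetical helper: run the block, retrying on the given error classes
# until it succeeds or more than max_wait seconds have passed.
# Returns the block's result and the total seconds spent, including retries.
def with_retry(errors:, max_wait: 120, interval: 1)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    result = yield
  rescue *errors
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    raise if elapsed > max_wait # give up and re-raise the last error
    sleep interval
    retry
  end
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Stand-in for a Redis client: fails a fixed number of times, then succeeds.
class FlakyStore
  def initialize(failures)
    @failures = failures
    @data = {}
  end

  def set(key, value)
    if @failures > 0
      @failures -= 1
      raise IOError, "connection refused (simulated)"
    end
    @data[key] = value
    "OK"
  end
end

store = FlakyStore.new(3)
result, waited = with_retry(errors: [IOError], interval: 0.01) do
  store.set("key_1", "value_1")
end
puts "result=#{result} waited=#{waited.round(2)}s"
```

Wrapping each `set`/`get` this way would let the loop keep running through a failover and report how long the client was actually unable to talk to the cluster.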

Additional notes

Clusters with three or more shards fail over faster, according to the AWS Database Blog post quoted below.

2. Three-shard minimum Redis Cluster: Having a minimum of three shards provides improved availability by providing faster recovery during both planned and unplanned failovers. Amazon ElastiCache for Redis supports up to 500 total nodes in a cluster, inclusive of shards and replicas.

Configure Amazon ElastiCache for Redis for higher availability | AWS Database Blog

When I tried a failover on a 4-shard cluster, it completed in about 30 seconds.