ablog

不器用で落着きのない技術者のメモ

Tanel Poder の Linux Process Snapper の紹介

JPOUG Advent Calendar 2020 の21日目のエントリーです。20日目は@discus_hamburg さんのMac De Oracle: 標準はあるにはあるが癖の多いSQL 全部俺 #20 結果セットを単一列に連結するにも癖があるでした。

今回は Tanel PoderLinux Process Snapper を紹介します。Tanel は何度か来日したことがあり、写真は2012年の db tech showcase で来日した時のものです。
f:id:yohei-a:20201221231125j:plain
(盗撮:AZエージェンシー(c))

インストール

$ sudo yum -y install git gcc
$ git clone https://github.com/tanelpoder/0xtools
$ cd 0xtools/
$ make
$ sudo make install

使ってみる

  • pgbench を流しながらモニタリングしてみる
# pgbench で PostgeSQL にクエリ実行
$ pgbench -r -c 8 -j 8 -t 100000 -U postgres -h localhost -d postgres -p 5432 > /dev/null 2>&1 &
# pSnapper でサンプリングする
$ sudo psn -G syscall,wchan

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/syscall, stat, wchan for 5 seconds... finished.


=== Active Threads ========================================================================

 samples | avg_threads | comm       | state                  | syscall   | wchan
-------------------------------------------------------------------------------------------
      85 |        0.85 | (postgres) | Disk (Uninterruptible) | fdatasync | 0
      79 |        0.79 | (postgres) | Running (ON CPU)       | [running] | 0
      44 |        0.44 | (pgbench)  | Running (ON CPU)       | select    | core_sys_select
      43 |        0.43 | (pgbench)  | Running (ON CPU)       | [running] | 0
      40 |        0.40 | (pgbench)  | Running (ON CPU)       | [running] | core_sys_select
      39 |        0.39 | (postgres) | Running (ON CPU)       | [running] | wait_woken
      37 |        0.37 | (postgres) | Running (ON CPU)       | recvfrom  | 0
      20 |        0.20 | (postgres) | Disk (Uninterruptible) | fsync     | 0
      11 |        0.11 | (pgbench)  | Running (ON CPU)       | select    | 0
       9 |        0.09 | (postgres) | Running (ON CPU)       | [running] | do_syscall_64
       9 |        0.09 | (postgres) | Running (ON CPU)       | recvfrom  | wait_woken
       9 |        0.09 | (postgres) | Running (ON CPU)       | semop     | wait_woken
       5 |        0.05 | (postgres) | Running (ON CPU)       | fdatasync | wait_woken
       5 |        0.05 | (postgres) | Running (ON CPU)       | semop     | 0
       4 |        0.04 | (postgres) | Running (ON CPU)       | semop     | do_syscall_64
       3 |        0.03 | (postgres) | Running (ON CPU)       | recvfrom  | do_syscall_64
       2 |        0.02 | (pgbench)  | Running (ON CPU)       | futex     | 0
       2 |        0.02 | (pgbench)  | Running (ON CPU)       | futex     | core_sys_select
       1 |        0.01 | (postgres) | Disk (Uninterruptible) | [running] | 0
       1 |        0.01 | (postgres) | Disk (Uninterruptible) | read      | 0
       1 |        0.01 | (rngd)     | Running (ON CPU)       | poll      | SyS_poll


samples: 100 (expected: 100)
total processes: 165, threads: 216
runtime: 5.04, measure time: 4.04
  • dd を流しながらモニタリングしてみる
# dd を実行
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ sudo dd if=/dev/xvda1 of=/dev/null bs=1M count=10000 &
# pSnapper でサンプリングする
$ sudo psn -G syscall,wchan,kstack

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, wchan, syscall, stack for 5 seconds... finished.


=== Active Threads =========================================================================================================================

 samples | avg_threads | comm | state                  | syscall   | wchan | kstack
--------------------------------------------------------------------------------------------------------------------------------------------
      87 |        0.87 | (dd) | Disk (Uninterruptible) | read      | 0     | SyS_read()->vfs_read()->__vfs_read()->generic_file_read_iter()
       5 |        0.05 | (dd) | Running (ON CPU)       | read      | 0     | SyS_read()->vfs_read()->__vfs_read()->generic_file_read_iter()
       4 |        0.04 | (dd) | Disk (Uninterruptible) | [running] | 0     | -
       3 |        0.03 | (dd) | Disk (Uninterruptible) | read      | 0     | -
       1 |        0.01 | (dd) | Disk (Uninterruptible) | [running] | 0     | SyS_read()->vfs_read()->__vfs_read()->generic_file_read_iter()


samples: 100 (expected: 100)
total processes: 150, threads: 194
runtime: 5.02, measure time: 3.48

解説

f:id:yohei-a:20201221225838p:plain
f:id:yohei-a:20201221225922p:plain

  • 0xtools のツール群の1つ

f:id:yohei-a:20201221230055p:plain

  • Thread State Analysis を行うツール、Snapper、Oracle Database の ASH、Performance Insights なども同じアプローチ

f:id:yohei-a:20201221230154p:plain
f:id:yohei-a:20201221230327p:plain

  • Thread State は R(On-CPU or ランキュー待ち)、D(割込み不可のスリープ)、S(スリープ) など

f:id:yohei-a:20201221230235p:plain

  • B.Gregg のサイトや書籍でも TSA Method の解説がある

f:id:yohei-a:20201221230410p:plain

  • TSA と ASH は同じコンセプト、ASH は Wait Event を実装しているので、Thread State より細かくアプリケーションレベルで定義した状態別に計上することができる

f:id:yohei-a:20201221230459p:plain

  • ビジュアライズする機能もリリース予定?

f:id:yohei-a:20201221230800p:plain


詳しくは以下のスライドをご覧ください。