ablog

不器用で落着きのない技術者のメモ

xargs -P vs split -n r/ の性能比較

ちょっとだけ調べたのでメモ。

実行時間(/usr/bin/time -p)

split は xargs より 2.6倍速い(xargs: 4.30秒、split: 1.66秒)

  • xargs -P
$ seq 10e3 | /usr/bin/time -p xargs -I{} -P4 sh -c '/bin/true {}'                      
real 4.30 ★
user 4.37
sys 6.24
  • split -n r/
$ seq 10e3 | /usr/bin/time -p ./split -u -n r/4 --filter 'while read i; do /bin/true $i; done'
real 1.66 ★
user 0.26
sys 0.89

システムコール所要時間・発行回数(strace -c)

  • wait4 + clone システムコールで split は7.5倍速い(xargs: 1.8秒、split: 0.2秒)。
  • wait4、clone システムコールのコール回数は split は4回だが、xargs は1万回以上と大きな差がある。
  • このケースでは多くの引数をパイプで渡しているが、CPU数分程度*1の引数を渡して、xargs からコマンドが実行し、時間のかかる処理をするケースだと子プロセス生成回数が少なく、ここまでの性能差は出ないと思う。
  • xargs -P
$ seq 10e3 | strace -c xargs -I{} -P4 sh -c '/bin/true {}'                 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.25    1.790977          63     28561           wait4 ★
  0.75    0.013581           1     10703       703 clone ★
  0.00    0.000000           0        14           read
  0.00    0.000000           0         4           open
  0.00    0.000000           0         7           close
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0        14           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    1.804558                 39322       704 total
  • split -n r/
$ seq 10e3 | strace -c ./split -u -n r/4 --filter 'while read i; do /bin/true $i; done'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.94    0.248840       62210         4           wait4 ★
  0.04    0.000090           0     10001           write
  0.03    0.000070          18         4           clone ★
  0.00    0.000000           0         4           read
  0.00    0.000000           0         3           open
  0.00    0.000000           0        14           close
  0.00    0.000000           0         8           fstat
  0.00    0.000000           0         4         4 lseek
  0.00    0.000000           0        14           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         5           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         4           pipe
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         8           fcntl
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.249000                 10084         5 total

生 strace で雰囲気を見てみる(strace -Tttf)

seq 10e3 | strace -o strace.log -Tttf xargs -I{} -P4 sh -c '/bin/true {}'
$ less strace.log
22098 23:40:53.530545 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff9c100e9d0) = 22099 <0.000052>
22099 23:40:53.530662 close(0 <unfinished ...>
22098 23:40:53.530701 wait4(-1,  <unfinished ...>
22099 23:40:53.530746 <... close resumed> ) = 0 <0.000048>
22098 23:40:53.530784 <... wait4 resumed> 0x7fff8cb7ae9c, WNOHANG, NULL) = 0 <0.000042>
22099 23:40:53.530821 open("/dev/null", O_RDONLY <unfinished ...>
22098 23:40:53.530861 clone( <unfinished ...>
22099 23:40:53.530895 <... open resumed> ) = 0 <0.000037>
22099 23:40:53.530973 execve("/usr/lib64/qt-3.3/bin/sh", ["sh", "-c", "/bin/true 1"], [/* 86 vars */] <unfinished ...>
22098 23:40:53.531059 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff9c100e9d0) = 22100 <0.000166>
22100 23:40:53.531123 close(0 <unfinished ...>
22099 23:40:53.531168 <... execve resumed> ) = -1 ENOENT (No such file or directory) <0.000100>
22100 23:40:53.531213 <... close resumed> ) = 0 <0.000056>
22098 23:40:53.531266 wait4(-1,  <unfinished ...>
22100 23:40:53.531303 open("/dev/null", O_RDONLY <unfinished ...>
22099 23:40:53.531352 execve("/usr/local/bin/sh", ["sh", "-c", "/bin/true 1"], [/* 86 vars */] <unfinished ...>
22100 23:40:53.531428 <... open resumed> ) = 0 <0.000078>
22098 23:40:53.531466 <... wait4 resumed> 0x7fff8cb7ae9c, WNOHANG, NULL) = 0 <0.000167>
22100 23:40:53.531503 execve("/usr/lib64/qt-3.3/bin/sh", ["sh", "-c", "/bin/true 2"], [/* 86 vars */] <unfinished ...>
22099 23:40:53.531581 <... execve resumed> ) = -1 ENOENT (No such file or directory) <0.000157>
22100 23:40:53.531617 <... execve resumed> ) = -1 ENOENT (No such file or directory) <0.000037>
22098 23:40:53.531656 clone( <unfinished ...>
22100 23:40:53.531691 execve("/usr/local/bin/sh", ["sh", "-c", "/bin/true 2"], [/* 86 vars */] <unfinished ...>
22099 23:40:53.531808 execve("/usr/bin/sh", ["sh", "-c", "/bin/true 1"], [/* 86 vars */] <unfinished ...>
22100 23:40:53.531885 <... execve resumed> ) = -1 ENOENT (No such file or directory) <0.000079>
22099 23:40:53.531929 <... execve resumed> ) = -1 ENOENT (No such file or directory) <0.000048>
22100 23:40:53.531968 execve("/usr/bin/sh", ["sh", "-c", "/bin/true 2"], [/* 86 vars */] <unfinished ...>
22101 23:40:53.532048 close(0 <unfinished ...>
22099 23:40:53.532089 execve("/bin/sh", ["sh", "-c", "/bin/true 1"], [/* 86 vars */] <unfinished ...>
22098 23:40:53.532164 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff9c100e9d0) = 22101 <0.000477>
22101 23:40:53.532214 <... close resumed> ) = 0 <0.000127>
22100 23:40:53.532265 <... execve resumed> ) = -1 ENOENT (No such file or directory) <0.000221>
22101 23:40:53.532307 open("/dev/null", O_RDONLY <unfinished ...>
22100 23:40:53.532348 execve("/bin/sh", ["sh", "-c", "/bin/true 2"], [/* 86 vars */] <unfinished ...>
22098 23:40:53.532416 wait4(-1,  <unfinished ...>
22101 23:40:53.532450 <... open resumed> ) = 0 <0.000106>
22099 23:40:53.532479 <... execve resumed> ) = 0 <0.000317>

環境

  • Oracle Linux 6.6
  • 4 CPU(2 Core * 2 Thread) CPU: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
$ uname -r
2.6.39-400.17.1.el6uek.x86_64

$ cat /etc/issue
Oracle Linux Server release 6.6
Kernel \r on an \m

$ xargs --version
xargs (GNU findutils) 4.4.2
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Built using GNU gnulib version e5573b1bad88bfabcda181b9e0125fb0c52b7d3b

$ ./split --version
split (GNU coreutils) 8.23
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbj&#246;rn Granlund and Richard M. Stallman.

補足

split の 8.4 は --filter オプションが使えなかったので、ソースを入手してコンパイルしました。

$ wget http://ftp.gnu.org/gnu/coreutils/coreutils-8.23.tar.xz
$ tar xfvJ coreutils-8.23.tar.xz
$ cd coreutils-8.23
$ ./configure
$ make
$ cd src
$ ./split --version
split (GNU coreutils) 8.23
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbj&#246;rn Granlund and Richard M. Stallman.

*1:使用するマシンでCPUを使って並列処理できる程度の数。他にCPUバウンドな処理は実行されていない前提。