素敵なサイト
ただのメモ
- OpenJDK をダウンロードする
http://download.java.net/openjdk/jdk7/ から openjdk-7-fcs-src-b147-27_jun_2011.zip をダウンロードする。
- 解凍する
$ unzip openjdk-7-fcs-src-b147-27_jun_2011.zip
- 覗いてみる
$ cd openjdk $ tree -d -L 2 . ├── corba │   ├── make │   └── src ├── hotspot │   ├── agent │   ├── make │   ├── src │   └── test ├── jaxp │   ├── make │   ├── nbproject │   └── patches ├── jaxws │   ├── make │   ├── nbproject │   └── patches ├── jdk │   ├── make │   ├── src │   └── test ├── langtools │   ├── make │   ├── src │   └── test ├── make │   ├── scripts │   └── templates └── test 28 directories $ ack pthread_cond_wait * hotspot/src/os/linux/vm/os_linux.cpp 4852:// a timeout request to the chron thread and then blocking via pthread_cond_wait(). 4933: status = pthread_cond_wait(_cond, _mutex); 5194: status = pthread_cond_wait (_cond, _mutex) ; hotspot/src/os/linux/vm/jsig.c 71: pthread_cond_wait(&cond, &mutex); hotspot/src/os/solaris/vm/os_solaris.cpp 4810: os::Solaris::set_cond_wait(CAST_TO_FN_PTR(int_fnP_cond_tP_mutex_tP, resolve_symbol("pthread_cond_wait"))); $ ack pthread_cond_timedwait * hotspot/src/os/linux/vm/os_linux.cpp 4303: return pthread_cond_timedwait(_cond, _mutex, _abstime); 4306: // 6292965: LinuxThreads pthread_cond_timedwait() resets FPU control 4311: int status = pthread_cond_timedwait(_cond, _mutex, _abstime); 4823:// Beware -- Some versions of NPTL embody a flaw where pthread_cond_timedwait() can 4827:// Briefly, pthread_cond_timedwait() calls with an expiry time that's not in the future 4832:// calls to pthread_cond_timedwait(). Using LD_ASSUME_KERNEL to use an older version 4839:// between the call to compute_abstime() and pthread_cond_timedwait(), more 4850:// 3. Embargo pthread_cond_timedwait() and implement a native "chron" thread 4851:// that manages timeouts. We'd emulate pthread_cond_timedwait() by enqueuing 4854:// on certain platforms that don't handle lots of active pthread_cond_timedwait() 4858:// correctly from pthread_cond_timedwait(), but the condvar is left corrupt. 4861:// followed by cond_init() -- after all calls to pthread_cond_timedwait(). 4863:// value from pthread_cond_timedwait(). In addition to reinitializing the hotspot/src/os/solaris/vm/os_solaris.cpp 4809: os::Solaris::set_cond_timedwait(CAST_TO_FN_PTR(int_fnP_cond_tP_mutex_tP_timestruc_tP, resolve_symbol("pthread_cond_timedwait"))); hotspot/src/share/vm/runtime/globals.hpp 1073: " avoid NPTL-FUTEX hang pthread_cond_timedwait" ) \
参考
Linux 上での実装について
コメントによると, NPTL で pthread_cond_timedwait() がバグっていた時期があったらしい (永久にハングしてしまうんだとか).
以下の 4種類の解決策が考えられるが, 現在採用しているのは "4.". (WorkAroundNPTLTimedWaitHang オプションを参照).
(<= なおコメント中の WorkAroundNTPLTimedWaitHang は typo(TとPが逆))なお, 「使用しているカーネルや glibc のバージョンを確認して, バグっているバージョンの場合だけ対処する最適化も考えられる」とのこと.
((cite: hotspot/src/os/linux/vm/os_linux.cpp)) // Refer to the comments in os_solaris.cpp park-unpark. // // Beware -- Some versions of NPTL embody a flaw where pthread_cond_timedwait() can // hang indefinitely. For instance NPTL 0.60 on 2.4.21-4ELsmp is vulnerable. // For specifics regarding the bug see GLIBC BUGID 261237 : // http://www.mail-archive.com/debian-glibc@lists.debian.org/msg10837.html. // Briefly, pthread_cond_timedwait() calls with an expiry time that's not in the future // will either hang or corrupt the condvar, resulting in subsequent hangs if the condvar // is used. (The simple C test-case provided in the GLIBC bug report manifests the // hang). The JVM is vulernable via sleep(), Object.wait(timo), LockSupport.parkNanos() // and monitorenter when we're using 1-0 locking. All those operations may result in // calls to pthread_cond_timedwait(). Using LD_ASSUME_KERNEL to use an older version // of libpthread avoids the problem, but isn't practical. // // Possible remedies: // // 1. Establish a minimum relative wait time. 50 to 100 msecs seems to work. // This is palliative and probabilistic, however. If the thread is preempted // between the call to compute_abstime() and pthread_cond_timedwait(), more // than the minimum period may have passed, and the abstime may be stale (in the // past) resultin in a hang. Using this technique reduces the odds of a hang // but the JVM is still vulnerable, particularly on heavily loaded systems. // // 2. Modify park-unpark to use per-thread (per ParkEvent) pipe-pairs instead // of the usual flag-condvar-mutex idiom. The write side of the pipe is set // NDELAY. unpark() reduces to write(), park() reduces to read() and park(timo) // reduces to poll()+read(). This works well, but consumes 2 FDs per extant // thread. // // 3. Embargo pthread_cond_timedwait() and implement a native "chron" thread // that manages timeouts. We'd emulate pthread_cond_timedwait() by enqueuing // a timeout request to the chron thread and then blocking via pthread_cond_wait(). // This also works well. In fact it avoids kernel-level scalability impediments // on certain platforms that don't handle lots of active pthread_cond_timedwait() // timers in a graceful fashion. // // 4. When the abstime value is in the past it appears that control returns // correctly from pthread_cond_timedwait(), but the condvar is left corrupt. // Subsequent timedwait/wait calls may hang indefinitely. Given that, we // can avoid the problem by reinitializing the condvar -- by cond_destroy() // followed by cond_init() -- after all calls to pthread_cond_timedwait(). // It may be possible to avoid reinitialization by checking the return // value from pthread_cond_timedwait(). In addition to reinitializing the // condvar we must establish the invariant that cond_signal() is only called // within critical sections protected by the adjunct mutex. This prevents // cond_signal() from "seeing" a condvar that's in the midst of being // reinitialized or that is corrupt. Sadly, this invariant obviates the // desirable signal-after-unlock optimization that avoids futile context switching. // // I'm also concerned that some versions of NTPL might allocate an auxilliary // structure when a condvar is used or initialized. cond_destroy() would // release the helper structure. Our reinitialize-after-timedwait fix // put excessive stress on malloc/free and locks protecting the c-heap. // // We currently use (4). See the WorkAroundNTPLTimedWaitHang flag. // It may be possible to refine (4) by checking the kernel and NTPL verisons // and only enabling the work-around for vulnerable environments.Thread の待機処理の枠組み : ParkEvent (os::PlatformEvent) による処理
ほう。
Package: libc6
Version: 2.3.2.ds1-13
Severity: normalpthread_cond_timedwait with a timespec in the past sometimes destroys the pthread_cond_t variable. The following pthread_cond_(timed)wait calls ignore all signals on the condition variable and block.
There is no Workaround, because I cannot guarantee that the timespec is really in the future (because the thread may be interrupted between gettimeofday and pthread_cond_timedwait).
The problem only occurs with NPTL (i.e. I can't reproduce the problem with LD_ASSUME_KERNEL=2.4.19).
The following program deadlocks after some iterations (10-1000).
*** /tmp/current/bug.c // gcc -pthread -Wall -O2 -std=c99 bug.c -o bug #include <stdio.h> #include <pthread.h> pthread_mutex_t mutex; pthread_cond_t empty; pthread_cond_t nonempty; unsigned count = 0; void* run(void* env) { pthread_mutex_lock(&mutex); struct timespec ts = { 0, 0 }; for (;;) { count = 0; pthread_cond_signal(&empty); pthread_cond_timedwait(&nonempty, &mutex, &ts); count = 0; pthread_cond_signal(&empty); pthread_cond_wait(&nonempty, &mutex); } } int main() { pthread_mutex_init(&mutex, 0); pthread_cond_init(&empty, 0); pthread_cond_init(&nonempty, 0); pthread_t thread; pthread_create(&thread, 0, &run, 0); pthread_mutex_lock(&mutex); for (unsigned i = 0;; ++i) { while (count > 0) { pthread_cond_wait(&empty, &mutex); } ++count; pthread_cond_signal(&nonempty); printf("%u\n", i); } }
- System Information:
Debian Release: testing/unstable
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.7-1-686-smp
Locale: LANG=C, [EMAIL PROTECTED]Versions of packages libc6 depends on:
Bug#261237: NPTL: pthread_cond_timedwait with timespec in the past destroys cond variable
ii libdb1-compat 2.1.3-7 The Berkeley database routines [gl
むむむ。
- glibc_bug_261237.c
#include <stdio.h> #include <pthread.h> pthread_mutex_t mutex; pthread_cond_t empty; pthread_cond_t nonempty; unsigned count = 0; void* run(void* env) { pthread_mutex_lock(&mutex); struct timespec ts = { 0, 0 }; for (;;) { count = 0; pthread_cond_signal(&empty); pthread_cond_timedwait(&nonempty, &mutex, &ts); count = 0; pthread_cond_signal(&empty); pthread_cond_wait(&nonempty, &mutex); } } int main() { pthread_mutex_init(&mutex, 0); pthread_cond_init(&empty, 0); pthread_cond_init(&nonempty, 0); pthread_t thread; pthread_create(&thread, 0, &run, 0); pthread_mutex_lock(&mutex); for (unsigned i = 0;; ++i) { while (count > 0) { pthread_cond_wait(&empty, &mutex); } ++count; pthread_cond_signal(&nonempty); printf("%u\n", i); } }
glibc のバージョンを確認して
$ /lib/libc.so.6 GNU C Library stable release version 2.12, by Roland McGrath et al. Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 4.4.7 20120313 (Red Hat 4.4.7-4). Compiled on a Linux 2.6.32 system on 2013-11-20. Available extensions: The C stubs add-on version 2.1.2. crypt add-on version 2.1 by Michael Glad and others GNU Libidn by Simon Josefsson Native POSIX Threads Library by Ulrich Drepper et al BIND-8.2.3-T5B RT using linux kernel aio libc ABIs: UNIQUE IFUNC For bug reporting instructions, please see: <http://www.gnu.org/software/libc/bugs.html>.
コンパイルして
$ gcc -pthread -Wall -O2 -std=c99 glibc_bug_261237.c -o glibc_bug_261237
実行して、
$ ./glibc_bug_261237 > /dev/null & [1] 24235
まずは dstat で
$ dstat -tcf --top-cpu 5 ----system---- -------cpu0-usage--------------cpu1-usage--------------cpu2-usage--------------cpu3-usage------ -most-expensive- date/time |usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| cpu process 12-01 13:33:00| 18 21 60 0 0 0: 18 20 62 0 0 0: 15 27 58 0 0 0: 16 27 56 0 0 0|glibc_bug_2614.4 12-01 13:33:05| 5 12 83 0 0 0: 5 9 86 0 0 0: 10 50 39 0 0 1: 14 44 42 0 0 0|glibc_bug_261 30 12-01 13:33:10| 11 45 44 0 0 0: 13 39 48 0 0 0: 5 12 83 0 0 0: 4 12 84 0 0 0|glibc_bug_261 29 12-01 13:33:15| 12 52 36 0 0 1: 15 43 42 0 0 0: 4 9 86 0 0 0: 4 8 88 0 0 0|glibc_bug_261 29 12-01 13:33:20| 8 35 56 0 0 0: 10 29 61 0 0 0: 7 20 72 0 0 0: 6 22 72 0 0 0|glibc_bug_261 29
top で眺めてみる
$ top -Hc -b -d 5 -p 24235 top - 13:14:50 up 3:05, 3 users, load average: 1.42, 1.16, 1.06 Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 19.6%us, 21.7%sy, 0.3%ni, 57.8%id, 0.4%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 16158544k total, 15997144k used, 161400k free, 225588k buffers Swap: 8085500k total, 0k used, 8085500k free, 5224448k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24235 yazekats 20 0 14464 316 248 R 63.9 0.0 7:24.87 ./glibc_bug_261237 24236 yazekats 20 0 14464 316 248 S 59.9 0.0 7:12.00 ./glibc_bug_261237 [...] PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24235 yazekats 20 0 14464 316 248 R 60.6 0.0 7:27.90 ./glibc_bug_261237 24236 yazekats 20 0 14464 316 248 S 59.0 0.0 7:14.95 ./glibc_bug_261237 [...] PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24235 yazekats 20 0 14464 316 248 R 61.9 0.0 7:31.00 ./glibc_bug_261237 24236 yazekats 20 0 14464 316 248 R 59.7 0.0 7:17.94 ./glibc_bug_261237
strace -ttT したり
$ strace -ttT -p 24235 13:17:59.132997 futex(0x600d24, FUTEX_WAIT_PRIVATE, 257305127, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000021> 13:17:59.133086 futex(0x600d64, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x600d60, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000025> 13:17:59.133180 futex(0x600ce0, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000025> 13:17:59.133270 futex(0x600d20, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000031> 13:17:59.133370 futex(0x600d24, FUTEX_WAIT_PRIVATE, 257305129, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000021> 13:17:59.133459 futex(0x600d64, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x600d60, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000024> 13:17:59.133552 futex(0x600ce0, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000024> 13:17:59.133642 futex(0x600d20, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000031> 13:17:59.133743 futex(0x600d24, FUTEX_WAIT_PRIVATE, 257305131, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000021> 13:17:59.133841 futex(0x600d64, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x600d60, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000025> 13:17:59.133936 futex(0x600ce0, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000026> 13:17:59.134027 futex(0x600d20, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000031> 13:17:59.134127 futex(0x600d24, FUTEX_WAIT_PRIVATE, 257305133, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000020> ^CProcess 24235 detached
strace -c したり
$ strace -c -p 24235 Process 24235 attached - interrupt to quit ^CProcess 24235 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.010504 0 49197 12280 futex 0.00 0.000000 0 30 write ------ ----------- ----------- --------- --------- ---------------- 100.00 0.010504 49227 12280 total
ltrace してみたり
$ ltrace -ttTSC -n 4 -p 4454 [pid 4455] 16:06:08.972344 pthread_cond_wait(0x600d60, 0x600ce0, 1, -1, 0x600d20 <unfinished ...> [pid 4454] 16:06:08.972523 <... SYS_futex resumed> ) = -512 <0.000157> [pid 4455] 16:06:08.972573 SYS_futex(0x600ce0, 129, 1, 1, 0x600d20 <unfinished ...> [pid 4454] 16:06:08.972642 SYS_futex(0x600ce0, 128, 2, 0, 0x600ce0 <unfinished ...> [pid 4455] 16:06:08.972738 <... SYS_futex resumed> ) = 0 <0.000077> [pid 4454] 16:06:08.972794 <... SYS_futex resumed> ) = -11 <0.000063> [pid 4455] 16:06:08.972829 SYS_futex(0x600d64, 128, 0x3dbc1ef, 0, 0x600c00 <unfinished ...> [pid 4454] 16:06:08.972931 <... pthread_cond_wait resumed> ) = 0 <0.001707> [pid 4454] 16:06:08.972969 pthread_cond_signal(0x600d60, 128, 0, -1, 0x600ce0 <unfinished ...> [pid 4455] 16:06:08.973149 <... SYS_futex resumed> ) = -512 <0.000226> [pid 4455] 16:06:08.973199 SYS_futex(0x600d64, 128, 0x3dbc1ef, 0, 0x600c00 <unfinished ...> [pid 4454] 16:06:08.973267 SYS_futex(0x600d64, 133, 1, 1, 0x600d60 <unfinished ...> [pid 4455] 16:06:08.973355 <... SYS_futex resumed> ) = -11 <0.000070> [pid 4454] 16:06:08.973412 <... SYS_futex resumed> ) = 0 <0.000063> [pid 4455] 16:06:08.973445 SYS_futex(0x600ce0, 128, 2, 0, 0x600ce0 <unfinished ...> [pid 4454] 16:06:08.973541 <... pthread_cond_signal resumed> ) = 0 <0.000471> [pid 4454] 16:06:08.973579 printf("%u\n", 32350362 <unfinished ...> [pid 4455] 16:06:08.973748 <... SYS_futex resumed> ) = ^C-512 <0.000214> $ ltrace -c -p 24235 ^C% time seconds usecs/call calls function ------ ----------- ----------- --------- -------------------- 50.53 6.287200 2779 2262 pthread_cond_wait 22.54 2.803753 929 3017 pthread_cond_signal 19.67 2.447685 1623 1508 printf 7.26 0.902921 1197 754 pthread_cond_timedwait ------ ----------- ----------- --------- -------------------- 100.00 12.441559 7541 total
perf してみたり
$ sudo echo 0 > /proc/sys/kernel/perf_event_paranoid $ perf record -F 99 -p 24235 ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.021 MB perf.data (~922 samples) ] $ perf report Samples: 305 of event 'cycles:HG', Event count (approx.): 2873553649 10.73% glibc_bug_26123 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore ★これを選択 8.22% glibc_bug_26123 [kernel.kallsyms] [k] try_to_wake_up 7.83% glibc_bug_26123 libc-2.12.so [.] vfprintf 3.89% glibc_bug_26123 [kernel.kallsyms] [k] native_write_msr_safe 3.88% glibc_bug_26123 [kernel.kallsyms] [k] __ticket_spin_lock 3.08% glibc_bug_26123 libpthread-2.12.so [.] pthread_cond_wait@@GLIBC_2.3.2 2.97% glibc_bug_26123 [kernel.kallsyms] [k] futex_wake_op 2.83% glibc_bug_26123 libc-2.12.so [.] free 2.79% glibc_bug_26123 [kernel.kallsyms] [k] system_call 2.68% glibc_bug_26123 [kernel.kallsyms] [k] system_call_after_swapgs 2.57% glibc_bug_26123 [kernel.kallsyms] [k] get_futex_key 2.37% glibc_bug_26123 [kernel.kallsyms] [k] __schedule 2.31% glibc_bug_26123 libc-2.12.so [.] _itoa_word 2.12% glibc_bug_26123 libpthread-2.12.so [.] pthread_cond_signal@@GLIBC_2.3.2 1.88% glibc_bug_26123 [kernel.kallsyms] [k] __x2apic_send_IPI_mask 1.84% glibc_bug_26123 [kernel.kallsyms] [k] futex_wait 1.77% glibc_bug_26123 libpthread-2.12.so [.] __lll_unlock_wake 1.76% glibc_bug_26123 [kernel.kallsyms] [k] select_task_rq_fair 1.58% glibc_bug_26123 libc-2.12.so [.] _IO_file_xsputn@@GLIBC_2.2.5 1.57% glibc_bug_26123 [kernel.kallsyms] [k] hash_futex 1.47% glibc_bug_26123 [kernel.kallsyms] [k] native_read_tsc 1.39% glibc_bug_26123 libc-2.12.so [.] __strchrnul 1.38% glibc_bug_26123 [kernel.kallsyms] [k] find_next_bit 1.28% glibc_bug_26123 [kernel.kallsyms] [k] wake_futex 1.25% glibc_bug_26123 [kernel.kallsyms] [k] perf_event_context_sched_out 1.25% glibc_bug_26123 [kernel.kallsyms] [k] dequeue_task 1.19% glibc_bug_26123 [kernel.kallsyms] [k] effective_load 1.15% glibc_bug_26123 [kernel.kallsyms] [k] wake_affine 1.11% glibc_bug_26123 [kernel.kallsyms] [k] dequeue_entity 1.03% glibc_bug_26123 [kernel.kallsyms] [k] cpumask_next_and 0.92% glibc_bug_26123 [kernel.kallsyms] [k] intel_pmu_enable_all 0.90% glibc_bug_26123 [kernel.kallsyms] [k] sys_futex 0.78% glibc_bug_26123 [kernel.kallsyms] [k] pick_next_task_idle 0.76% glibc_bug_26123 [kernel.kallsyms] [k] copy_user_generic_string 0.75% glibc_bug_26123 [kernel.kallsyms] [k] pick_next_task_fair Press '?' for help on key bindings
脱線