Multithreading, Multiprocessing with STOP and Continue Signals

I am working on a project where I need to get the native stack of the Java application. I am able to achieve this partially thanks to ptrace, multiprocessing, and signals.

On Linux, a normal Java application has, at a minimum, 14 threads. Out of these 14, I am interested in only the main thread of which I have to get the native stack. Considering this objective, I have started a separate process using fork() which is monitoring the native stack of the main thread. In short, I have 2 separate processes: one is being monitored and the other does the monitoring using ptrace and signal handling.

Steps in the monitoring process:

  1. Get the main thread ID out of the 14 threads from the monitored process.

  2. ptrace_attach on the main ID.

  3. ptrace_cont on the main ID.

  4. Start a continuous loop:

     1. kill(main_ID, SIGSTOP)

     2. nanosleep and check the status from the /proc/[pid]/stat file.

     3. ptrace_peekdata to read the stack and navigate.

     4. ptrace_cont on the main ID.

     5. nanosleep and check the status from the /proc/[pid]/stat file.

  5. After the loop ends, ptrace_detach on the main ID.

This perfectly gives the native stack information continuously. However, sometimes I encounter an issue:

When I call kill(main_ID, SIGSTOP) on the main thread, the other threads of the process sometimes go into a finished or stopped state (T) and the entire process blocks. The behavior is not consistent: sometimes the entire process executes correctly. I cannot understand this, as I am only signaling the main thread. Why are the other threads affected?

Can someone help me analyze this problem?

I also tried sending SIGCONT and SIGSTOP to all of the threads of the process but the issue still occurs sometimes.

Thanks, Sandeep


Assuming you are using Linux, you should be using tkill(2) or tgkill(2) instead of kill(2). On FreeBSD, you should use the SYS_thr_kill2 syscall. Per the tkill(2) manpage:

tgkill() sends the signal sig to the thread with the thread ID tid in the thread group tgid. (By contrast, kill(2) can only be used to send a signal to a process (i.e., thread group) as a whole, and the signal will be delivered to an arbitrary thread within that process.)

Ignore the remarks in the manpage about tkill(2) and friends being intended for internal thread library use; these calls are commonly used by debuggers and tracers to send signals to specific threads.
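As a rough illustration of the point above, here is a minimal sketch of signaling one specific thread from C. The helper name send_signal_to_thread is hypothetical (it is not part of any library); it goes through syscall(2) on the assumption that the installed glibc may be too old to provide a tgkill() wrapper. In the monitoring process, tgid would be the PID of the Java process and tid the main thread's ID.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `sig` to the single thread `tid` inside the
 * thread group (process) `tgid`. Unlike kill(2), the signal is directed at
 * that thread rather than at the process as a whole.
 * Returns 0 on success, -1 on failure with errno set. */
static int send_signal_to_thread(pid_t tgid, pid_t tid, int sig)
{
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

With this helper, the kill(main_ID, SIGSTOP) step would become something like send_signal_to_thread(java_pid, main_ID, SIGSTOP), where java_pid is an assumed variable holding the monitored process ID.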

Also, you should use waitpid(2) (or some variation of it) to wait for the thread to receive the SIGSTOP instead of polling on /proc/[pid]/stat. This approach will be more efficient and more responsive.
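A minimal sketch of that waiting step might look like the following. The helper name wait_for_stop and its options parameter are assumptions made for illustration. For a thread attached with ptrace, the tracer would typically pass __WALL so that waitpid(2) also reports non-leader threads; for an ordinary, untraced child, WUNTRACED is needed to see the stop.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: block until `tid` changes state, retrying if the
 * wait is interrupted by a signal in the caller.
 * Returns the signal that stopped the thread, or -1 if waitpid failed or
 * the thread exited or was killed instead of stopping. */
static int wait_for_stop(pid_t tid, int options)
{
    int status;
    pid_t r;

    do {
        r = waitpid(tid, &status, options);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;
    if (WIFSTOPPED(status))
        return WSTOPSIG(status);
    return -1;
}
```

In the sampling loop, a call such as wait_for_stop(main_ID, __WALL) would replace the nanosleep and /proc/[pid]/stat check after sending SIGSTOP; once it returns SIGSTOP, the thread is stopped and the stack can be read.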

Finally, it appears that you are doing some sort of stack sampling. You may want to check out Google PerfTools, which includes a CPU sampler that uses stack sampling to estimate which functions consume the most CPU time. You may be able to reuse the work these tools have already done, as stack sampling can be tricky to make robust.
