Profiled app with YourKit, still can't identify the CPU hog
I've got a Java application that is consuming 100% of the CPU most of开发者_如何转开发 the time (as indicated by cacti and top monitoring). We fired up YourKit (which confirms the CPU resource issue) and it identifies java.net.SocketInputStream.read(byte[], int, int) as the biggest hot spot at 15% of time. I believe they aren't accurately measuring CPU time for methods that perform blocking IO like SocketInputStream.read would.
There are 6 other identified hot spots, but they account for less than 20% of accounted for CPU time combined. all in the 5%-1% range.
So I know I have a problem, I can see the problem, YourKit does too, but I am no closer to identifying the actual problem.
I am pretty new to using a profiler, and am most likely missing something. Any ideas?
EDIT: Sean makes a good point about using tools built into the system. If I use top and shift+h to view threads, it displays anywhere from 7-15 threads, and the CPU utilization jumps around. I don't believe it's any one thread that is causing the problem, rather it is a piece of code each thread executes at some time.
I would recommend running this on a Solaris box if you can. If you don't have a Solaris box consider setting a Virtual Machine up with Open Solaris running on it.
Solaris offers a tool called prstat
Prstat works much like top which most people are familiar with. The important difference is prstat can break the processes up for you and show each thread within a process.
For your case the usage would be prstat -L 0 1
Paired with a thread dump (doing this in a script is preferred) you can match the LWPID together to find exactly which thread is the CPU hog.
Here is a functional example (I created a small app going in a big loop for poc)
Standard Top will show you something like the following
PID USERNAME NLWP PRI NICE SIZE RES STATE TIME CPU COMMAND
924 username 10 59 0 31M 11M run 0:53 36.02% java
Then using prstat The following command was used
prstat -L 0 1 | grep java > /export/home/username/Desktop/output.txt
And the output from prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
924 username 31M 10M run 30 0 0:00:09 35% java/10
924 username 31M 10M sleep 59 0 0:00:00 0.8% java/3
924 username 31M 10M sleep 59 0 0:00:00 0.6% java/2
924 username 31M 10M sleep 59 0 0:00:00 0.3% java/1
This may not look much different then top, but if you notice to the right side of the data, the PROCESS/LWPID is telling you the exact thread within the java process which is consuming the CPU. the thread running with the light weight process id (lwpid) 10 is consuming 35% of the CPU. As I mentioned before, if you pair this with a thread dump, you can find the exact thread. For my case, this is the relevant portion of the thread dump
"Thread-0" prio=3 tid=0x08173800 nid=0xa runnable [0xc60fc000..0xc60fcae0]
java.lang.Thread.State: RUNNABLE
at java.util.Random.next(Random.java:139)
at java.util.Random.nextInt(Random.java:189)
at ConsumerThread.run(ConsumerThread.java:13)
On the top line of the thread, the nid can be matched to the LWPID. nid=0xa (which is 10 in dec when converted from Hex)
If you can put the prstat and thread dump commands in a script and run it 4-5 times during high CPU usages you will begin to see patterns and able to determine the cause of your high CPU that way.
In my time, I have seen this result from long running gc times to a misconfiguration of an LDAP connection. Have fun :)
You might have uniformly slow code?
The jvisualvm profiler is a convenient alternative for comparison; it's included with the JDK.
I would turn on memory tracing AND cpu profiling and look at the cpu profiler again. This will show up different areas to optimise.
When you say its using 100% of a CPU can you look at whether it is in the user or system/kernel space. e.g. top. The profiler won't show you cpu used in the kernel space.
How many threads do you have? If you have enough idle threads, you can have more than 100% CPU just switching between them. (You would have to have many thousands)
Like similar answers, it is quite likely you application has so much overhead e.g. reading socket, swapping between threads, performing GCs that its not doing much real work. The profilers aren't so good at picking up overhead.
One possibility is that your JVM doesn't have enough memory, so it's constantly doing GC.
精彩评论