Delphi SampleProfiler: How is this code calling into ntdll.dll?
i profiled a portion of my application using the Delphi Sampling Profiler. Like most people, i see a majority of the time spent inside ntdll.dll
.
Note: i turned on the options to ignore
Application.Idle
time, and calls fromSystem.pas
. So it isn't insidentdll
because the application is idle:
After multiple runs, multiple times, the majority of the time seems to be spent inside ntdll.dll
, but the odd thing is who the caller is:
The caller is from the Virtual Treeview's:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
Note: The application is not inside
ntdll.dll
because the application is idle, because the caller isn'tApplication.Idle
.
What confuses me is that it's this line itself (i.e. not something inside PrepareCell) is the caller into ntdll
. Even more confusing is that:
- not only is it not something inside
PrepareCell()
- it's not even the setup of
PrepareCell
(e.g. popping stack variables, setting up implicit exception frames, etc) that is the caller. Those things would show up in the profiler as a hotspot on thebegin
inside PrepareCell.
VirtualTrees.pas:
procedure TBaseVirtualTree.PrepareCell(var PaintInfo: TVTPaintInfo; WindowOrgX, MaxWidth: Integer);
begin
...
end;
So i'm trying to figure out how this line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
is calling ntdll.dll
.
The only other ways in are the three parameters:
PaintInfo
Window.Left
NodeBitmap.Width
Maybe one of those is a function, or a property getter, that would call into ntdll
. So i put a breakpoint on the line, and look at the CPU window at runtime:
There is a line in there that might be the culprit:
call dword ptr [edx+$2c]
But when i follow that jump, it doesn't end up in ntdll.dll
, but TBitmap.GetWidth
:
Which, as you can see, doesn't call anywhere; and certainly not into ntdll.dll
.
So how is the line:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
calling into ntdll.dll
?
Note: i know full well it isn't really calling into ntdll.dll. So any valid answer will have to include the words "Sampling Profiler is misleading; that line is not calling into ntdll.dll." The answer will also have to either say that the majority of the time is not spent in ntdll.dll, or that the highlighted line is not the caller. Finally any answer will have to explain why Sampling Profiler is wrong, and how it can be fixed.
Update 2
What is ntdll.dll? Ntdll is Windows NT's native API set. The Win32 API is a wrapper around ntdll.dll
that looks like the Windows API that existed in Windows 1/2/3/9x. In order to actually get into ntdll you have to call a function that uses ntdll directly or indirectly.
For example, when my Delphi application goes idle, it waits for a message by calling the user32.dll function:
WaitMessage;
When when you actually look at it is:
USER32.WaitMessage
mov eax,$00001226
mov edx,$7ffe0300
call dword ptr [edx]
ret
Calling the function specified at $7ffe0300
is the way Windows transitions into Ring0, calling the FunctionID specified in EAX. In this case, the System Function being called is 0x1226. On my operating system, Windows Vista, 0x1226 corresponds to the system function NtUserWaitMessage
.
This is how to you get into ntdll.dll: you call it.
i was desperately trying to avoid a hand-waving non-answer when i worded the original question. By being very specific, carefully pointing out the reality of what i'm seeing, i was trying to prevent people from ignoring the facts, and trying to use a hand-waving argument.
Update Three
i converted the two parameters:
PrepareCell(PaintInfo, Window.Left, NodeBitmap.Width);
into stack variables:
_profiler_WindowLeft := Window.Left;
_profiler_NodeBitmapWidth := NodeBitmap.Width;
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
To confirm that the bottleneck is not is the call to
Windows.Left
, or- Nodebitmap.Width
Profiler still indicates that the line
PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);
itself is the bottleneck; not anything inside PrepareCell. This must mean that it's something inside the setup of the call to prepare cell, or at the start of PrepareCell:
VirtualTrees.pas.15746: PrepareCell(PaintInfo, _profiler_WindowLeft, _profiler_NodeBitmapWidth);开发者_StackOverflow社区
mov eax,[ebp-$54]
push eax
mov edx,esi
mov ecx,[ebp-$50]
mov eax,[ebp-$04]
call TBasevirtualTree.PrepareCell
Nothing in that calls into ntdll. Now the pre-amble in PrepareCell itself:
VirtualTrees.pas.15746: begin
push ebp
mov ebp,esp
add esp,-$44
push ebx
push esi
push edi
mov [ebp-$14],ecx
mov [ebp-$18],edx
mov [ebp-$1c],eax
lea esi,[ebp-$1c]
mov edi,[ebp-$18]
Nothing in there calls into ntdll.dll
.
The questions still remain:
- why is pushing of one variable onto the stack, and two others into registers the bottleneck?
- why isn't anything inside PrepareCell itself the bottleneck?
Well, this problem was actually my main reason to make my own sampling profiler:
http://code.google.com/p/asmprofiler/wiki/AsmProfilerSamplingMode
Maybe not perfect, but you could give it a try. Let me know what you think about it.
Btw, I think it has to do with the fact that almost all calls ends into calls to the kernel (memory requests, paint events, etc). Only calculations do not need to call the kernel. Most calls ends in waiting for kernel results:
ntdll.dll!KiFastSystemCallRet
You can see this in Process Explorer with thread stack view, or in Delphi, or using StackWalk64 API in my "Live view" of AsmProfiler:
http://code.google.com/p/asmprofiler/wiki/ProcessStackViewer
There are probably two things happening there.
The first is that SamplingProfiler identifies the caller by walking up the stack, until it encounters what looks like a valid call point into Delphi from Delphi code.
The thing is, some procedures may reserve a large amount of stack at once, without reinitializing it. This could result in a false positive. The only clue then would be that your false positive was recently invoked.
The second thing is the ntdll
localization, that is known for certain, however, ntdll is your wait point in user-space, and as user197220, ntdll is where you'll end up waiting most of the time you're calling system stuff and waiting for the result.
In your case, unless you reduced the sampling rate, you're looking at 247ms of CPU work time, which could probably pass as idle if those 247 samples were collected over many seconds of real time. Since the false positive points to VirtualTree paint preparations, my bet would be that the ntdll time is actually paint time (driver or OS software). You can try commenting out the code that actually does the painting to be sure.
精彩评论