How to debug: w3wp.exe process was terminated due to a stack overflow (works on one machine but not another)
The problem
I have an ASP.NET 4.0 application that crashes with a stack overflow on one computer, but not another. It runs fine in my development environment. When I move the site to the production server, it throws a stack overflow exception (seen in the event log) and the w3wp.exe worker process dies and is replaced with another.
What I've tried so far
For reference, I used the Debug Diagnostic Tool (DebugDiag) to try to determine which piece of code is causing the overflow, but I'm not sure how to interpret its output. The output is included below.
How might an ASP.NET website cause a stack overflow on one machine but not on another?
Experienced leads are appreciated. I'll post the resulting solution below the answer that leads me to it.
Debug Output
Application: w3wp.exe Framework Version: v4.0.30319 Description: The process was terminated due to stack overflow.
In w3wp__PID__5112__Date__02_18_2011__Time_09_07_31PM__671__First Chance Stack Overflow.dmp the assembly instruction at nlssorting!SortGetSortKey+25 in C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\nlssorting.dll from Microsoft Corporation has caused a stack overflow exception (0xC00000FD) when trying to write to memory location 0x01d12fc0 on thread 16
Please follow up with the vendor Microsoft Corporation for C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\nlssorting.dll
Information:DebugDiag determined that this dump file (w3wp__PID__5112__Date__02_18_2011__Time_09_07_31PM__671__First Chance Stack Overflow.dmp) is a crash dump and did not perform any hang analysis. If you wish to enable combined crash and hang analysis for crash dumps, edit the IISAnalysis.asp script (located in the DebugDiag\Scripts folder) and set the g_DoCombinedAnalysis constant to True.
Entry point clr!ThreadpoolMgr::intermediateThreadProc
Create time 2/18/2011 9:07:10 PM
Function Arg 1 Arg 2 Arg 3 Source
nlssorting!SortGetSortKey+25 01115a98 00000001 0651a88c
clr!SortVersioning::SortDllGetSortKey+3b 01115a98 08000001 0651a88c
clr!COMNlsInfo::InternalGetGlobalizedHashCode+f0 01115a98 05e90268 0651a88c
mscorlib_ni+2becff 08000001 0000000f 0651a884
mscorlib_ni+255c10 00000001 09ed57bc 01d14348
mscorlib_ni+255bc4 79b29e90 01d14350 79b39ab0
mscorlib_ni+2a9eb8 01d14364 79b39a53 000dbb78
mscorlib_ni+2b9ab0 000dbb78 09ed57bc 01ff39f4
mscorlib_ni+2b9a53 01d14398 01d1439c 00000011
mscorlib_ni+2b9948 0651a884 01d143ec 7a97bf5d
System_ni+15bd65 6785b114 00000000 09ed5748
System_ni+15bf5d 1c5ab292 1b3c01dc 05ebc494
System_Web_ni+6fb165
***These lines below are repeated many times in the log, so I just posted one block of them
1c5a928c 00000000 0627e880 000192ba
1c5a9dce 00000000 0627e7c4 00000000
1c5a93ce 1b3c01dc 05ebc494 1b3c01dc
1c5a92e2
.....(repeated sequence from above)
System_Web_ni+16779c 1b338528 00000003 0629b7a0
System_Web_ni+1677fb 00000000 00000017 0629ac3c
System_Web_ni+167843 00000000 00000003 0629ab78
System_Web_ni+167843 00000000 00000005 0629963c
System_Web_ni+167843 00000000 00000001 0627e290
System_Web_ni+167843 00000000 0627e290 1a813508
System_Web_ni+167843 01d4f21c 79141c49 79141c5c
System_Web_ni+1651c0 00000001 0627e290 00000000
System_Web_ni+16478d 00000001 01ea7730 01ea76dc
System_Web_ni+1646af 0627e290 01d4f4c0 672c43f2
System_Web_ni+164646 00000000 06273aa8 0627e290
System_Web_ni+1643f2 672d1b65 06273aa8 00000000
1c5a41b5 00000000 01d4f520 06273aa8
System_Web_ni+18610c 01d4f55c 0df2a42c 06273f14
System_Web_ni+19c0fe 01d4fa08 0df2a42c 06273e5c
System_Web_ni+152ccd 06273aa8 05e9f214 06273aa8
System_Web_ni+19a8e2 05e973b4 062736cc 01d4f65c
System_Web_ni+19a62d 06a21c6c 79145d80 01d4f7fc
System_Web_ni+199c2d 00000002 672695e8 00000000
System_Web_ni+7b65cc 01d4fa28 00000002 01c52c0c
clr!COMToCLRDispatchHelper+28 679165b0 672695e8 09ee2038
clr!BaseWrapper<Stub *,FunctionBase<Stub *,&DoNothing<Stub *>,&StubRelease<Stub>,2>,0,&CompareDefault<Stub *>,2>::~BaseWrapper<Stub *,FunctionBase<Stub *,&DoNothing<Stub *>,&StubRelease<Stub>,2>,0,&CompareDefault<Stub *>,2>+fa 672695e8 09ee2038 00000001
clr!COMToCLRWorkerBody+b4 000dbb78 01d4f9f8 1a78ffe0
clr!COMToCLRWorkerDebuggerWrapper+34 000dbb78 01d4f9f8 1a78ffe0
clr!COMToCLRWorker+614 000dbb78 01d4f9f8 06a21c6c
1dda1aa 00000001 01b6c7a8 00000000
webengine4!HttpCompletion::ProcessRequestInManagedCode+1cd 01b6c7a8 69f1aa72 01d4fd6c
webengine4!HttpCompletion::ProcessCompletion+4a 01b6c7a8 00000000 00000000
webengine4!CorThreadPoolWorkitemCallback+1c 01b6c7a8 0636a718 0000ffff
clr!UnManagedPerAppDomainTPCount::DispatchWorkItem+195 01d4fe1f 01d4fe1e 0636a488
clr!ThreadpoolMgr::NewWorkerThreadStart+20b 00000000 0636a430 00000000
clr!ThreadpoolMgr::WorkerThreadStart+3d1 00000000 00000000 00000000
clr!ThreadpoolMgr::intermediateThreadProc+4b 000c3470 00000000 00000000
kernel32!BaseThreadStart+34 792b0b2b 000c3470 00000000
This question is a bit old, but I just found a nice way of getting the stack trace of my application just before it overflows, and I would like to share it with other googlers out there:
When your ASP.NET app crashes, a set of debugging files are dumped in a "crash folder" inside this main folder:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue
These files can be analysed using WinDbg, which you can download from one of the links below:
- Windows WinDbg x86 installer
- Windows WinDbg x64 installer
After installing it on the same machine where your app crashed, click File > Open Crash Dump and select the largest .tmp file in your "crash folder" (mine was 180 MB), for example:
AppCrash_w3wp.exe_3d6ded0d29abf2144c567e08f6b23316ff3a7_cab_849897b9\WER688D.tmp
Then, run the following commands in the command window that just opened:
.loadby sos clr
!clrstack
Finally, the generated output will contain your app stack trace just before overflowing, and you can easily track down what caused the overflow. In my case it was a buggy logging method:
000000dea63aed30 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
000000dea63aedd0 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
000000dea63aee70 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
000000dea63aef10 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
000000dea63aefb0 000007fd88de9d00 Library.Logging.RepositoryLogger.Error(System.Object, System.Exception)
000000dea63af040 000007fd88de9ba0 Library.WebServices.ErrorLogger.ProvideFault(System.Exception, System.ServiceModel.Channels.MessageVersion, System.ServiceModel.Channels.Message ByRef)
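The trace shows ExceptionInfo..ctor(System.Exception) returning to the same address again and again, which means the constructor is calling itself. The real Library.Logging.ExceptionInfo code isn't shown, so the following is only a guess at how that can happen and how to avoid it. Everything here (the ExceptionInfo fields, MaxDepth, Count) is a hypothetical reconstruction: the buggy shape (in the comment) builds a new ExceptionInfo for the same exception forever, while the sketch follows InnerException, which ends in null, and caps the depth as an extra safety net.

```csharp
using System;

// Hypothetical reconstruction; NOT the real Library.Logging.ExceptionInfo.
public class ExceptionInfo
{
    // Arbitrary cap on nesting: at most MaxDepth + 1 objects are created.
    private const int MaxDepth = 20;

    public string Message;
    public string StackTrace;
    public ExceptionInfo Inner;

    // Buggy shape that would produce the repeating frames in the trace:
    //   public ExceptionInfo(Exception ex) { Inner = new ExceptionInfo(ex); }
    // Each call wraps the SAME exception again, so it never terminates.

    public ExceptionInfo(Exception ex) : this(ex, 0) { }

    private ExceptionInfo(Exception ex, int depth)
    {
        if (ex == null) throw new ArgumentNullException("ex");
        Message = ex.Message;
        StackTrace = ex.StackTrace;

        // Recurse on the INNER exception, which moves toward the end of the
        // chain, and stop at the cap in case the chain is unexpectedly deep.
        if (ex.InnerException != null && depth < MaxDepth)
        {
            Inner = new ExceptionInfo(ex.InnerException, depth + 1);
        }
    }

    // Number of ExceptionInfo objects in this chain, including this one.
    public int Count()
    {
        int n = 0;
        for (ExceptionInfo e = this; e != null; e = e.Inner) n++;
        return n;
    }
}
```

The depth cap is not strictly needed for ordinary exception chains, but a logger is the last place you want an unbounded recursion, so a cheap guard is worth having.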
Thanks to Paul White and his blog post: Debugging Faulting Application w3wp.exe Crashes
The default stack limit for w3wp.exe is a joke. I always raise it with editbin /stack:9000000 w3wp.exe, which should be sufficient. Get rid of your stack overflow first, and then debug whatever you want.
Get a crash dump, run it against Microsoft's Debug Diagnostic Tool and show us the result.
Also take a look at http://support.microsoft.com/kb/919789/en-us, which explains all the necessary steps in detail.
Two things I would try before analysing any memory dumps:
- Install the remote debugging tool on the web server and try debugging that way. You can find this tool on the Visual Studio install DVD.
- Install Elmah. Elmah can be added to a running ASP.NET application for logging and debugging. I would probably go with this option first, since it's the least painful approach. http://code.google.com/p/elmah/
One possible reason for your application behaving differently in production than in development is preprocessor directives like #if DEBUG in the code. If you deploy a release build to production, it contains different code segments than your debug build.
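A hypothetical illustration of how the two builds diverge: code between #if DEBUG and #else is compiled only when the DEBUG symbol is defined (Visual Studio's Debug configuration defines it by default, Release does not), so the release build simply contains the other branch. BuildDifferences and ActiveBranch are made-up names.

```csharp
using System;

public static class BuildDifferences
{
    // Hypothetical helper: reports which branch was compiled into this build.
    public static string ActiveBranch()
    {
#if DEBUG
        // Compiled only when DEBUG is defined (typically the Debug configuration).
        return "debug";
#else
        // Compiled in Release builds. A bug that lives only here never runs
        // on a development box that only ever runs the Debug build.
        return "release";
#endif
    }
}
```

The same applies to calls like System.Diagnostics.Debug.WriteLine, which are removed from builds where DEBUG is not defined.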
Another possibility is that your application is throwing an unrelated exception in production, and the error-handling code somehow ends up in an endless chain of function calls. Look for a function that calls itself, or that calls another function which calls it back, without a condition that stops the recursion. A for or while loop alone will not overflow the stack; it is the calls that never return, each adding a stack frame, that eventually exhaust it.
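One way to break such a cycle, sketched under the assumption that your error handler and your logger end up calling each other (ErrorHandler, Handle, Log and SuppressedCount are hypothetical names, not from the question): a [ThreadStatic] flag makes the handler refuse to re-enter itself on the same thread. The guard has to stop the recursion before the stack runs out, because since .NET 2.0 a StackOverflowException cannot be caught with try/catch; the runtime terminates the process.

```csharp
using System;

public static class ErrorHandler
{
    // One flag per thread, so concurrent requests don't block each other.
    // [ThreadStatic] fields must not have initializers; they default to false/0.
    [ThreadStatic]
    private static bool _handling;

    // Counts nested calls that were suppressed on this thread (useful for diagnostics).
    [ThreadStatic]
    public static int SuppressedCount;

    // Hypothetical logging hook; real code might write to a file or database.
    public static Action<Exception> Log = delegate { };

    public static void Handle(Exception ex)
    {
        if (_handling)
        {
            // Already inside Handle on this thread: the logger (or something it
            // called) is trying to report an error. Bail out instead of recursing.
            SuppressedCount++;
            return;
        }

        _handling = true;
        try
        {
            Log(ex);
        }
        catch (Exception logFailure)
        {
            // Reporting the logging failure re-enters Handle,
            // which the flag above turns into a no-op.
            Handle(logFailure);
        }
        finally
        {
            _handling = false;
        }
    }
}
```

Without the flag, a logger that throws (or that calls Handle itself) would bounce between the two methods until the worker process dies.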
It has also happened to me when I accidentally created a property whose getter returned the property itself, like this:
public string SomeProperty { get { return SomeProperty; } }
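The getter above calls itself on every read, so the first access recurses until the stack overflows. A sketch of the usual fix with a private backing field (the class name Widget, the field name and the constructor are placeholders):

```csharp
public class Widget
{
    // Backing field that actually stores the value.
    private readonly string _someProperty;

    public Widget(string someProperty)
    {
        _someProperty = someProperty;
    }

    public string SomeProperty
    {
        // Return the field. Writing "return SomeProperty;" here would call
        // this same getter again and recurse until the stack overflows.
        get { return _someProperty; }
    }
}
```

If nothing else needs the field, an auto-implemented property such as public string SomeProperty { get; private set; } avoids the mistake altogether.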
Also, if possible, you could handle the exception specially in the Application_Error method of your Global.asax. Use Server.GetLastError() to get the exception and log or display its stack trace. You may want to do the same for its InnerException, the InnerException of that, and so on.
You may already be doing the above mentioned things but I wanted to mention them just in case.
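Also, from your trace it looks like the overflow is happening in GetSortKey. DebugDiag attributes that function to nlssorting.dll, which ships with the .NET Framework rather than being part of your code, so it is probably just the call that happened to hit the stack limit. The block of frames that repeats many times below it is the more likely place where your infinite self-calling starts.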
Hope this helps.