Processing big strings: is this Large Object Heap fragmentation?
I have a .NET 3.5 application:
- A function runs a million times.
- It does search/replace and regex operations on 1MB+ strings (of varying sizes).
When I profile the application, I can confirm these strings are stored in the LOH and are reclaimed by the GC later on, so at any given time at most 10 of them are in the LOH (10 threads are running).
My understanding is that these big strings are allocated on the LOH and then reclaimed by the GC, but because of where they are allocated (and because the LOH is not compacted) this causes fragmentation. This happens despite there being no memory leak in the operation.
It doesn't cause a problem over the first ~100K iterations, but when it reaches 1M+ it throws out-of-memory exceptions.
I'm using ANTS Memory Profiler, and this is the result I got early in the run:
- .NET is using 70MB of the 210MB total private bytes allocated to the application
- Number of Fragments: 59
- Number of Large Fragments: 48 (99.6% of free memory)
- Largest Fragment: 9MB
- Free Space: 52% of total memory (37MB)
- Unmanaged Memory: 66% of total private memory (160MB)
- Do you think my diagnosis is correct based on the data at hand?
- If so, how can I solve this LOH fragmentation problem? I have to process those strings, and they are big. Should I find a way to split them up and process them that way? In that case, running regex etc. on the split strings will be really challenging.
Yes. That sounds correct. The LOH is getting fragmented, which leads to the runtime being unable to allocate enough contiguous space for the large strings.
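As a side check of the diagnosis, the snippet below (an illustrative, assumption-checking sketch, not part of any fix) shows why strings of this size land on the LOH: objects larger than roughly 85,000 bytes are allocated there, and the runtime reports them as generation 2 from the moment they are created.

    // Quick check of which strings land on the LOH (illustrative only).
    using System;

    class LohCheck
    {
        static void Main()
        {
            string small = new string('x', 1000);         // ~2KB of char data: small object heap
            string large = new string('x', 1024 * 1024);  // ~2MB of char data: large object heap

            Console.WriteLine(GC.GetGeneration(small));   // typically 0 (gen 0)
            Console.WriteLine(GC.GetGeneration(large));   // 2, i.e. treated as LOH/gen 2
        }
    }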
You have a few options; I suppose whichever is easiest and still effective is the one you should choose, and that depends entirely on how your application is written.

1. Break your strings into chunks small enough that they don't go on the LOH (less than 85K; note that the logic for when an object is put on the LOH isn't quite that cut-and-dried). This lets the GC reclaim the space. It is by no means guaranteed to fix fragmentation, which can definitely still happen otherwise: if you make the strings smaller but they still end up on the LOH, you're only putting the problem off, and for how long depends on how far past 1 million strings you need to go. The other downside is that you still have to load the string into memory to split it, so it ends up on the LOH anyway; you'd have to shrink the strings before your application even loads them. Kind of a catch-22. EDIT: Gabe points out in the comments that if you can load your string into a StringBuilder first, it makes a good effort under the covers to keep things out of the LOH (until you call ToString on it). A chunked-processing sketch is shown after this list.

2. Break the processing of the strings out into a separate process - a process, not a thread. Have each process handle, say, 10K strings, then kill it and start another, so each process starts with a clean slate. The advantage is that it doesn't change your string-processing logic (in case you can't make your strings smaller for processing) and it avoids the catch-22 in #1. The downside is that it probably requires a bigger change to your application, plus coordinating the work between the master process and the worker processes. The trick is that the master can only tell a worker where the large string is; it can't hand the string over directly, otherwise you're back to the catch-22. A minimal master-side sketch follows the chunking example below.
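For option 1, here is a minimal sketch of chunked processing, assuming the input can be streamed from a file. The file path, chunk size, overlap length, and regex pattern are hypothetical placeholders, not anything from the question; the only real constraint is that each chunk stays well under the ~85K LOH threshold, so the temporary strings live on the normal, compacted heap. The overlap carried between chunks must be at least as long as the longest match your pattern can produce, otherwise matches that span a chunk boundary can be missed or truncated.

    // Hypothetical chunked processing sketch - not the original poster's code.
    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    class ChunkedProcessor
    {
        // 32K chars = 64KB of UTF-16, comfortably below the ~85K-byte LOH threshold.
        const int ChunkChars = 32 * 1024;
        // Assumed upper bound on the length of any single match.
        const int Overlap = 256;

        static readonly Regex Pattern = new Regex(@"\bfoo\d+\b", RegexOptions.Compiled);

        static void ProcessFile(string path)
        {
            char[] buffer = new char[ChunkChars];
            string carry = string.Empty;

            using (StreamReader reader = new StreamReader(path))
            {
                int read;
                while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Prepend the tail of the previous chunk so matches that
                    // straddle a chunk boundary can still be found.
                    string chunk = carry + new string(buffer, 0, read);
                    bool moreToRead = reader.Peek() >= 0;

                    foreach (Match m in Pattern.Matches(chunk))
                    {
                        // Matches starting inside the carried tail will be seen
                        // again with the next chunk, so skip them here.
                        if (moreToRead && m.Index >= chunk.Length - Overlap)
                            continue;
                        Console.WriteLine(m.Value);   // replace with real handling
                    }

                    // Carry the last Overlap chars into the next iteration.
                    carry = chunk.Length > Overlap
                        ? chunk.Substring(chunk.Length - Overlap)
                        : chunk;
                }
            }
        }

        static void Main(string[] args)
        {
            ProcessFile(args[0]);
        }
    }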
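For option 2, this is a sketch of the master side only, assuming a hypothetical StringWorker.exe that takes the path of a manifest file on its command line, reads the listed file paths, and runs the existing regex logic on each one. The file names (inputs.txt, StringWorker.exe) are assumptions. The master never loads the big strings itself; it only passes paths around, and each worker's fragmented LOH disappears when the worker process exits.

    // Hypothetical master process: launches short-lived workers so each one
    // starts (and dies) with a clean, unfragmented LOH.
    using System;
    using System.Diagnostics;
    using System.IO;

    class Master
    {
        const int BatchSize = 10000;   // strings per worker, per the suggestion above

        static void Main(string[] args)
        {
            // Assumed layout: inputs.txt lists one large-string file path per line.
            string[] inputs = File.ReadAllLines("inputs.txt");

            for (int start = 0; start < inputs.Length; start += BatchSize)
            {
                // Hand the worker a manifest of paths, never the strings themselves.
                int count = Math.Min(BatchSize, inputs.Length - start);
                string[] batch = new string[count];
                Array.Copy(inputs, start, batch, 0, count);

                string manifest = Path.GetTempFileName();
                File.WriteAllLines(manifest, batch);

                ProcessStartInfo psi = new ProcessStartInfo("StringWorker.exe", "\"" + manifest + "\"");
                psi.UseShellExecute = false;

                using (Process worker = Process.Start(psi))
                {
                    worker.WaitForExit();   // when the worker exits, its fragmented LOH is gone
                    if (worker.ExitCode != 0)
                        Console.Error.WriteLine("Worker failed on batch starting at " + start);
                }

                File.Delete(manifest);
            }
        }
    }

Batching roughly 10K strings per worker, as suggested above, amortizes the process start-up cost while still recycling the address space often enough to keep fragmentation from accumulating.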