Access Violation in Multithreaded Application on Windows Server 2003
I have an application (written using Delphi 2009) that allows a user to run a query on selected systems, then consolidates the results into a single report.
Brief Description of the app:
The user selects a query and a group of systems to run the query over. The query is run on all systems concurrently by creating a new thread to run the query in, and using TADOQuery to actually run the query from within the thread. When the query has finished running, TADOQuery.SaveToFile is called, passing pfXML as a parameter to save the results to an XML document. Once all queries have finished running, the application parses all of the XML documents and consolidates them into a single XML document. The user can then load the report, which calls TADOQuery.LoadFromFile to load the report and display it in a TListView.
In order to ensure that the user doesn't overload the PC by submitting too many queries (thereby launching too many threads), I have implemented a queue using an array of records. Each record holds information such as query name, system, status (running, finished or pending) etc. Another reason for implementing a queue is that the user can submit multiple queries at the same time (i.e. they don't have to wait for the first one to finish before submitting another). The array of records probably isn't the most efficient way of implementing the queue, but it works. I keep the number of concurrently running threads to 100 (this can be altered by the user) and launch new threads when running ones end by synchronising the completion of the query run within the thread with a procedure that manages the queue. At no point does memory usage rise above 25-30K.
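For readers following along, the queue described above might look roughly like the sketch below. This is not the original code: the unit, type and procedure names (JobQueue, TJob, StartPendingJobs, LaunchJob) are all hypothetical, and it assumes the procedure is only ever called from the main thread, for example via Synchronize when a worker finishes.

```delphi
unit JobQueue;

interface

type
  TJobStatus = (jsPending, jsRunning, jsFinished);

  // One queue entry; the field names are illustrative only.
  TJob = record
    QueryName: string;
    SystemName: string;
    Status: TJobStatus;
  end;

  // Hypothetical callback that creates and starts the worker thread for a job.
  TLaunchJob = procedure(Index: Integer);

var
  Jobs: array of TJob;
  MaxThreads: Integer = 100; // user-configurable limit from the description

procedure StartPendingJobs(LaunchJob: TLaunchJob);

implementation

// Assumption: called from the main thread only, so the array needs no lock.
procedure StartPendingJobs(LaunchJob: TLaunchJob);
var
  i, Running: Integer;
begin
  // Count the jobs that are currently running.
  Running := 0;
  for i := 0 to High(Jobs) do
    if Jobs[i].Status = jsRunning then
      Inc(Running);

  // Start pending jobs until the limit is reached.
  for i := 0 to High(Jobs) do
  begin
    if Running >= MaxThreads then
      Break;
    if Jobs[i].Status = jsPending then
    begin
      Jobs[i].Status := jsRunning;
      LaunchJob(i);
      Inc(Running);
    end;
  end;
end;

end.
```

Keeping every read and write of the array on the main thread is what makes the unsynchronised array safe in this sketch; if worker threads touched it directly, it would need a critical section.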
The final piece of pertinent information is that the application also contains a job scheduler, which allows the user to specify when they want queries to run. This is used by customers that want to leave the software running unattended on a server and create reports on a daily basis at a certain time each day.
The problem:
The application runs fine on Windows XP, no matter how many queries are submitted. However, after a random amount of time running on Windows Server 2003, the application stops running. Attempting to interact with the application (or close it down) causes an Access Violation to be reported. I can't for the life of me work out where it's coming from or what is causing it.
My first thought was that it might have something to do with the implementation of the memory management in the operating system, but I can't see what could be causing the problem. I've compiled a version of the application using FastMM4 in full debug mode, but it isn't reporting any issues with freeing memory that shouldn't have been freed or anything along those lines, and there are no memory leaks when running under normal circumstances. So although I'm still sure that there's an issue with memory management, I can't see what that would be.
I notice that after the access violations occur, there are a lot of report files in the temp folder of my application (meaning that some of the queries have run and returned results, but not all of the threads have finished and the reports haven't yet been consolidated). The job scheduler also reports that it has been running fine and submitting jobs into the queue, which haven't run because the maximum number of threads (100 by default) are running (I've been testing by submitting jobs via the scheduler every 20 minutes and leaving the application overnight).
This leads me to believe that there is probably an issue with the processing of the queue (the array of records). If a thread finishes and synchronises with the procedure that manages the queue, but there is a problem with the queue, then the next thread won't be started, and as not all of the queries have finished, the reports in the temp folder won't be consolidated. This seems to be the point at which the application is stuck.
I therefore have two problems:
1. What is causing the access violation? Is it something to do with the queue, or is it probably something else?
2. Why does the application work fine on Windows XP but fall over on Windows Server 2003?
UPDATE
I've managed to batter the application to such an extent during testing that I can now also produce the error on Windows XP, so it doesn't look like it's restricted to Windows Server 2003. It just seems to appear on Windows Server 2003 a lot faster than on XP.
If I run a group of queries over a group of systems, wait until all reports have been created, and then repeat the process, eventually queries just stop being submitted, and double clicking anywhere in the application (or trying to close it down) results in an Access Violation (always a write, although the memory address varies, and it isn't always a write of address 0).
I've traced the callstack using MadExcept and it doesn't show anything unusual - only the line of code where the double click event is triggered.
Something is stopping the queries being submitted and also (I'd guess) causing the access violations, but I can't see what it might be.
More of a troubleshooting tip than an actual answer... For #2, is the hardware identical? If not, this may really be an issue of single-core vs multi-core (or processor). Since it's a multi-threaded app, it's not unreasonable to expect it to behave differently with multiple processors/cores. So make sure that you're not clouding the issue with too many variables (hardware vs OS).
I've finally managed to track down the issue, and I suppose you could call it a schoolboy error!
Within each thread, I create a TADOQuery component that is used to run the query. I set the owner as the main form, but then before the thread terminates, I clear the memory:
adoqry := TADOQuery.Create(frmMain);
try
  <code>
finally
  FreeAndNil(adoqry);
end;
It seems that the problem is setting the owner of the TADOQuery component to the application's main form. The main form then also tries to free the component when the application closes, and because the application stays up while the user runs thousands of queries, these references seem to stack up until the application eventually gives up and starts throwing access violations.
I've changed the owner of the TADOQuery component to nil and now the application works fine, even after several thousand query runs (it was falling over after a hundred or so initially).
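A minimal sketch of the corrected pattern follows. It is an illustration rather than the original code: the unit, class and field names (QueryThread, TQueryThread, FConnectionString, FSQL, FOutputFile) are hypothetical. A plausible mechanism, not confirmed in this thread, is that creating the component with an owner makes every worker thread add it to and remove it from the form's component list, which the VCL does not protect against concurrent access. Passing nil avoids touching that shared list at all.

```delphi
unit QueryThread;

interface

uses
  Classes, ActiveX, ADODB;

type
  // Hypothetical worker thread; names are illustrative only.
  TQueryThread = class(TThread)
  private
    FConnectionString: string;
    FSQL: string;
    FOutputFile: string;
  protected
    procedure Execute; override;
  public
    constructor Create(const AConnectionString, ASQL, AOutputFile: string);
  end;

implementation

constructor TQueryThread.Create(const AConnectionString, ASQL,
  AOutputFile: string);
begin
  inherited Create(True); // created suspended; the caller starts it
  FreeOnTerminate := True;
  FConnectionString := AConnectionString;
  FSQL := ASQL;
  FOutputFile := AOutputFile;
end;

procedure TQueryThread.Execute;
var
  Qry: TADOQuery;
begin
  CoInitialize(nil); // ADO is COM-based, so each thread initialises COM itself
  try
    Qry := TADOQuery.Create(nil); // nil owner: the main form is never involved
    try
      Qry.ConnectionString := FConnectionString;
      Qry.SQL.Text := FSQL;
      Qry.Open;
      Qry.SaveToFile(FOutputFile, pfXML);
    finally
      Qry.Free; // the thread alone is responsible for freeing the query
    end;
  finally
    CoUninitialize;
  end;
end;

end.
```

With a nil owner, the try/finally block is the only thing that frees the query, so ownership stays entirely inside the thread that created it.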