Multithreading a large number of web requests in C#
I have a program where I need to create a large number of folders on an external SharePoint site (external meaning I can't use the SharePoint object model). Web requests work well for this, but simply doing them one at a time (send request, wait for response, repeat) is rather slow. I decided to multithread the requests to try to speed it up. The program has sped up considerably, but after some amount of time (1-2 minutes or so), concurrency exceptions start getting thrown.
Code is below. Is this the best way to go about this?
// Allow at most 10 requests in flight at once.
Semaphore Lock = new Semaphore(10, 10);
List<string> folderPathList = new List<string>();
//folderPathList populated
foreach (string folderPath in folderPathList)
{
    Lock.WaitOne();
    new Thread(delegate()
    {
        WebRequest request = WebRequest.Create(folderPath);
        request.Credentials = CredentialCache.DefaultCredentials;
        request.Method = "MKCOL";
        WebResponse response = request.GetResponse();
        response.Close();
        Lock.Release();
    }).Start();
}
// Wait for the remaining requests by taking all 10 slots.
for (int i = 1; i <= 10; i++)
{
    Lock.WaitOne();
}
The exception is something along the lines of
Unhandled Exception: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: Only one usage of each socket address is normally permitted 192.0.0.1:81
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.InternalConnect(EndPoint remoteEP)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)
You might be creating too many connections, thus using up all the local ports you can use. There's a timeout period before a port can be reused after you close it. WebRequest hides all the low-level socket handling for you, but I am guessing it eventually runs out of ports, or tries to (re)bind to a socket already in a TIME_WAIT state.
You should make sure you read the response stream, even if you don't care about the response. This should help avoid leaving too many lingering connections.
WebResponse response = request.GetResponse();
new StreamReader(response.GetResponseStream()).ReadToEnd();
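To make the same idea a bit more robust, the read can be wrapped in using blocks so the response is released even when something throws. This is only a sketch: ResponseDrain, Drain and CreateFolder are names made up for illustration, and whether draining actually helps in your case depends on how the server handles the connection.

```csharp
using System.IO;
using System.Net;

static class ResponseDrain
{
    // Reads the stream to the end and returns the number of bytes discarded.
    public static long Drain(Stream body)
    {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = body.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }

    // Hypothetical helper: sends one MKCOL request, drains the body,
    // and disposes the response even if reading throws.
    public static void CreateFolder(string folderPath)
    {
        WebRequest request = WebRequest.Create(folderPath);
        request.Credentials = CredentialCache.DefaultCredentials;
        request.Method = "MKCOL";
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            Drain(body);
        }
    }
}
```

Reading into a fixed buffer avoids building a string you are going to throw away, which is what ReadToEnd would do.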
I'll paste some relevant info from here:
When a connection is closed, on the side that is closing the connection the 5 tuple { Protocol, Local IP, Local Port, Remote IP, Remote Port} goes into a TIME_WAIT state for 240 seconds by default. In this case, the protocol is fixed (TCP), and the local IP, remote IP and remote port are also typically fixed. So the variable is the local port. What happens is that when you don't bind, a port in the range 1024-5000 is used. So roughly you have 4000 ports. If you use all of them in 4 minutes, meaning roughly you make 16 web service calls per second for 4 minutes, you will exhaust all the ports. That is the cause of this exception.
OK now how can this be fixed?
One of the ways is to increase the dynamic port range. The max by default is 5000. You can set this up to 65534. HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort is the key to use.
The second thing you can do is, once the connection does get into a TIME_WAIT state, reduce the time it is in that state. The default is 4 minutes, but you can set this to 30 seconds. HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\TCPTimedWaitDelay is the key to use.
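The arithmetic in the quote can be sketched as a small calculation. PortBudget and MaxConnectionsPerSecond are hypothetical names, and the model is the simplified one from the quote: every request opens a fresh connection, and each closed connection holds its local port for the full TIME_WAIT period.

```csharp
using System;

static class PortBudget
{
    // Highest sustained rate of new connections per second before every
    // local port in the range is stuck in TIME_WAIT.
    public static double MaxConnectionsPerSecond(int lowestPort, int highestPort, int timeWaitSeconds)
    {
        int availablePorts = highestPort - lowestPort + 1;
        return (double)availablePorts / timeWaitSeconds;
    }
}
```

With the defaults from the quote, MaxConnectionsPerSecond(1024, 5000, 240) comes out at about 16.6 connections per second. With both registry changes applied, MaxConnectionsPerSecond(1024, 65534, 30) is roughly 2,150 per second.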
You're not reliably closing the web response, which might keep the connection open unnecessarily long. This sounds like a perfect job for the Task Parallel Library's Parallel.ForEach; just be sure to indicate how many threads you want it running on:
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 10;
Parallel.ForEach(folderPathList, parallelOptions, folderPath =>
{
    // WebRequest is not IDisposable, so only the response goes in a using block.
    WebRequest request = WebRequest.Create(folderPath);
    request.Credentials = CredentialCache.DefaultCredentials;
    request.Method = "MKCOL";
    // Disposing the response closes it even if an exception is thrown.
    using (WebResponse response = request.GetResponse())
    {
    }
});
Another thing to keep in mind is maxconnection; be sure to set it in your app.config:
<configuration>
<system.net>
<connectionManagement>
<add address = "*" maxconnection = "100" />
</connectionManagement>
</system.net>
</configuration>
Of course, in a real-world scenario you would have to add try-catch and retry connections that time out, which leads to more complicated code.
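As a rough idea of what that extra code might look like, here is a minimal retry wrapper. Retry.Run and its parameters are invented for this sketch; the attempt count, the delay and which exceptions are worth retrying are all assumptions you would need to tune.

```csharp
using System;
using System.Threading;

static class Retry
{
    // Runs the action, retrying while attempts remain and shouldRetry
    // accepts the exception. The final failure propagates to the caller.
    public static void Run(Action action, int maxAttempts, TimeSpan delay, Func<Exception, bool> shouldRetry)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && shouldRetry(ex))
            {
                // Back off briefly before the next attempt.
                Thread.Sleep(delay);
            }
        }
    }
}
```

Inside the Parallel.ForEach body this could be used as Retry.Run(() => CreateFolder(folderPath), 3, TimeSpan.FromSeconds(1), ex => ex is WebException), where CreateFolder stands for a hypothetical method holding the request code.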
For this kind of IO-intensive task, the asynchronous programming model is very useful. However, it is a little hard to use in C#.
C# also has language-level support for async now; you can try the CTP release.
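For what it's worth, here is a sketch of how this could look with async/await. It assumes HttpClient (a newer API than the WebRequest used elsewhere in this thread) and a SemaphoreSlim to cap concurrency at a chosen number; AsyncFolderCreator and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class AsyncFolderCreator
{
    // Runs work for every item with at most maxConcurrency operations in flight.
    public static async Task ForEachThrottledAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            List<Task> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try { await work(item); }
                finally { gate.Release(); }
            }).ToList();
            await Task.WhenAll(tasks);
        }
    }

    // Sends a single MKCOL request and throws on a non-success status code.
    public static async Task CreateFolderAsync(HttpClient client, string folderPath)
    {
        using (var request = new HttpRequestMessage(new HttpMethod("MKCOL"), folderPath))
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
        }
    }
}
```

A call might look like await AsyncFolderCreator.ForEachThrottledAsync(folderPathList, 10, p => AsyncFolderCreator.CreateFolderAsync(client, p)), with client created once from new HttpClient(new HttpClientHandler { UseDefaultCredentials = true }). No threads sit blocked while the requests are waiting on the network.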
Try this:
folderPathList.ToList().ForEach(p =>
{
    // Queue each request on the thread pool instead of creating a thread per folder.
    ThreadPool.QueueUserWorkItem((o) =>
    {
        WebRequest request = WebRequest.Create(p);
        request.Credentials = CredentialCache.DefaultCredentials;
        request.Method = "MKCOL";
        WebResponse response = request.GetResponse();
        response.Close();
    });
});
EDIT - different webrequest approach
folderPathList.ToList().ForEach(p =>
{
    ThreadPool.QueueUserWorkItem((o) =>
    {
        // WebClient is disposable, so using releases it when the request finishes.
        using (WebClient client = new WebClient())
        {
            client.Credentials = CredentialCache.DefaultCredentials;
            client.UploadString(p, "MKCOL", "");
        }
    });
});