How do you best offload a database insert, so a web response is returned quicker?
Setup
I have web service that takes its inputs through a REST interface. The REST call does not return any meaningful data, so whatever is passed in to the web service is just recorded in the database and that is it. It is an analytics service which my company is using internally to do some special processing on web requests that are received on their web page. So it is very important the response take as little time to return as possible.
I ha开发者_运维百科ve pretty much optimized the code down as much as possible, to make the response as fast as possible. However, the time the database stays open still keeps the connection open for longer than I want before a response is sent back to the web client.
The code looks basically like this, by the way it is ASP.NET MVC, using Entity Framework, running on IIS 7, if that matters.
public ActionResult Add(/*..bunch of parameters..*/) {
using (var db = new Entities()) {
var log = new Log {
// populate Log from parameters
}
db.AddToLogs(log);
db.SaveChanges();
}
return File(pixelImage, "image/gif");
}
Question
Is there a way to off load the database insert in to another process, so the response to the client is returned almost instantly?
I was thinking about wrapping everything in the using
block in another thread, to make the database insert asynchronous, but didn't know if that was the best way to free up the response back to the client.
What would you recommend if you were trying to accomplish this goal?
If the request has to be reliable then you need to write it into the database. Eg. if your return means 'I have paid the merchant' then you can't return before you actually commit in the database. If the processing is long then there are database based asynchronous patterns, using a table as a queue or using built-in queuing like Asynchronous procedure execution. But these apply when heavy and lengthy processing is needed, not for a simple log insert.
When you want just to insert a log record (visitor/url tracking stuff) then the simplest solution is to use CLR's thread pools and just queue the work, something like:
...
var log = new Log {// populate Log from parameters}
ThreadPool.QueueUserWorkItem(stateInfo=>{
var queueLog = stateInfo as Log;
using (var db = new Entities())
{
db.AddToLogs(queuedLog);
db.SaveChanges();
}
}, log);
...
This is quick and easy and it frees the ASP handler thread to return the response as soon as possible. But it has some drawbacks:
- If the incomming rate of requests exceeds the thread pool processing rate then the in memory queue will grow until it will trigger an app pool 'recycle', thus loosing all items 'in progress' (as well as warm caches and other goodies).
- The order of requests is not preserved (may or may not be important)
- It consumes a CLR pool thread on doing nothing but waiting for a response from the DB
The last concern can be addressed by using a true asynchronous database call, via SqlCommand.BeginExecuteXXX and setting the AsynchronousProcessing on the connection to true. Unfortunately AFAIK EF doesn't yet have true asynchronous execution, so you would have to resort to the SqlClient layer (SqlConnection, SqlCommand). But this solution would not address the first concern, when the rate of page hits is so high that this logging (= writes on every page hit) becomes a critical bottle neck.
If the first concern is real then and no threading and/or producer/consumer wizardry can aleviate it. If you trully have an incomming rate vs. write rate scalability concern ('pending' queue grows in memory) you have to either make the writes faster in the DB layer (faster IO, special log flush IO) and/or you have to aggregate the writes. Instead of logging every requests, just increment in memory counters and write them periodically as aggregates.
I've been working on multi-tier solutions mostly for the last year or so that require this sort of functionality, and that's exactly how I've been doing it.
I have a singleton that takes care of running tasks in the background based on an ITask
interface. Then I just register a new ITask
with my singleton and pass control from my main thread back to the client.
Create a separate thread that monitors a global, in memory queue. Have your request put it's information on the queue and return, the thread then takes the item off the queue and posts it to the DB.
Under heavy load, if the thread lags the requests, your queue will grow.
Also, if you lose the machine, you will lose any unprocessed queue entries.
Whether these limitations are acceptable to you, you'd need to decide that.
A more formal mechanism is using some actual middleware messaging system (JMS in Java land, dunno the equivalent in .NET, but there's certainly something).
It depends: When you return to the client do you need to be 100% sure that the data is stored in the database?
Take this scenario:
- Request comes in
- A thread is started to save to the database
- Response is sent to the client
- Server crashes
- Data was not saved to the database
You also need to check how many milliseconds you save by starting a new thread instead of saving to the database.
The added complexity and maintainence cost is probably too high compared with the savings in response time. And the savings in response time are probably so low that they will not be noticed.
Before I spent a lot of time on the optimization I'd be sure of where the time is going. Connections like these have significant latency overhead (check this out). Just for grins, make your service a NOP and see how it performs.
It seems to me that the 'async-ness' needs to be on the client - it should fire off the call to your service and move on, especially since it doesn't care about the result?
I also suspect that if the NOP performance is good-to-tolerable on your LAN it will be a different story in the wild.
精彩评论