Comparing 2 LINQ applications: unexpected result
I drafted 2 ASP.NET applications using LINQ. One connects to MS SQL Server, the other to a proprietary in-memory structure. Both applications work with tables of 3 int fields containing 500,000 records (the memory structure is identical to the SQL Server table). The controls used are standard: GridView and ObjectDataSource. In both applications I measure the average time needed to process each paging click.
- The LINQ + MS SQL application takes 0.1 sec per page change.
- The LINQ + memory structure application takes 0.8 sec per page change.
This is a shocking result. Why does the application handling data in memory work 8 times slower than the application using the hard drive? Can anybody tell me why that happens?
The primary factor will probably be algorithmic efficiency. LINQ-to-Objects works with IEnumerable<T>
inputs and outputs, which are generally processed sequentially, whereas the database may have indexes that induce substantial speed-ups.
I can think of at least three reasons:
- indexes
- caching
- special optimizations (e.g. TOP N SORT)
Indexes
There are many types of queries that run very fast on a correctly indexed database but very slow if you iterate through a list in memory. For example, a lookup by ID (primary key) is almost instant in a database because the results are stored in a B-tree of very small height. Finding the same element in an in-memory list requires scanning the entire list.
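To make the contrast concrete, here is a minimal sketch (the Record type and field names are made up, not the asker's schema): a by-key lookup that an index answers almost instantly forces plain LINQ to Objects to scan the whole list, while a dictionary built once plays roughly the role of the B-tree.
using System;
using System.Collections.Generic;
using System.Linq;

class Record { public int Id; public int A; public int B; }

class IndexSketch
{
    static void Main()
    {
        List<Record> rows = Enumerable.Range(0, 500000)
            .Select(i => new Record { Id = i, A = i % 7, B = i % 11 })
            .ToList();

        // LINQ to Objects: scans up to 500,000 rows to find one Id.
        Record byScan = rows.FirstOrDefault(r => r.Id == 499999);

        // Built once, looked up in O(1); roughly the role an index plays.
        Dictionary<int, Record> byId = rows.ToDictionary(r => r.Id);
        Record byIndex = byId[499999];

        Console.WriteLine("{0} {1}", byScan.Id, byIndex.Id);
    }
}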
Caching
Your assumption is that the database always hits the disk. This is not always true. The database will try to hold as much data in memory as it can, so when you ask it for data it already has the answer ready for you. In particular it will hold commonly used indexes in memory and only hit the disk when necessary. The way the data is stored on disk and in memory is also carefully optimized to reduce disk seeks and page misses.
Optimizations
Even without indexes the database still knows many tricks that can speed things up. For example, if you run the following LINQ query against SQL Server:
list.OrderBy(x => x.Value).Take(1)
it will be almost instant if there is an index on the Value column, but even without the index it will use a special optimization called TOP N SORT that runs in linear time. Check the execution plan for your query to see whether this optimization is being used. Note that this optimization is not implemented in LINQ to Objects. We can see this by running this code:
// Foo is assumed to be a simple class with an int Id property.
Random random = new Random();
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10000000; ++i)
{
    list.Add(new Foo { Id = random.Next() });
}

// Sorts all 10 million items just to read the first one.
DateTime now = DateTime.UtcNow;
Foo smallest = list.OrderBy(foo => foo.Id).First();
Console.WriteLine(DateTime.UtcNow - now);
This code takes about 30 seconds to execute, and the execution time grows worse than linearly as more items are added. Replacing the query with the following makes it take less than one second:
int smallestId = list.Min(foo => foo.Id);
This is because in LINQ to Objects, OrderBy is implemented using an O(n log(n)) algorithm, but Min uses an O(n) algorithm. However, when executed against SQL Server both of these queries produce the same SQL, and both run in linear time, O(n).
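To see why the Min version can be linear, here is a rough sketch of an O(n) scan equivalent to what Min(foo => foo.Id) computes (assuming the list is non-empty; this is an illustration, not the framework's actual implementation):
// Single pass over the list; nothing is sorted.
int smallestId = int.MaxValue;
foreach (Foo foo in list)
{
    if (foo.Id < smallestId)
    {
        smallestId = foo.Id;
    }
}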
So running a paging query like OrderBy(x => x.Something).Skip(50).Take(10)
is faster in a database because a lot more effort has gone into making sure that it is faster. After all, the speed of this sort of query is a major selling point for databases.
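For the in-memory application specifically, one plausible contributor (an assumption, since the proprietary structure and the ObjectDataSource wiring are not shown) is that every paging click re-evaluates the whole query, so all 500,000 rows are re-sorted on each click; sorting once and paging over the cached result avoids that. A minimal sketch with a hypothetical Row type:
using System;
using System.Collections.Generic;
using System.Linq;

class Row { public int A; public int B; public int C; }

class PagingSketch
{
    // Naive per-click paging: re-sorts the whole table on every call,
    // which is roughly what re-running OrderBy/Skip/Take per page does.
    public static List<Row> GetPageNaive(List<Row> table, int pageIndex, int pageSize)
    {
        return table.OrderBy(r => r.A)
                    .Skip(pageIndex * pageSize)
                    .Take(pageSize)
                    .ToList();
    }

    // Sort once and cache; each later page is just a cheap slice.
    private static List<Row> _sorted;

    public static List<Row> GetPageCached(List<Row> table, int pageIndex, int pageSize)
    {
        if (_sorted == null)
        {
            _sorted = table.OrderBy(r => r.A).ToList();
        }
        return _sorted.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}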