开发者

best way to do reporting for a million rows

I have two database tables, subscription and transaction, where one subscription can have many transactions. The status of the subscription depends mainly on the transactions that belong to it. So if I want to calculate next process date I would look at the period field of the subscription object and then analyze the subscription's transactions to determine its status. This all works fine.

The problem that I am facing is that the table contains over 400,000 subscription objects and millions of transaction records, so it's getting kinda tricky to build a report summary of the subscriptions (like how many of each from about ten possible statuses that are calculated dynamically)

Since all of the logic to calculate the status of each subscription is in the c# code, I have to load an entire graph of subscription objects w开发者_开发百科ith all of their child transaction objects using linq-to-sql. This is taking quite a long time, maybe two minutes or so. I'm looking at caching, but won't give real-time results. I'm just wondering if there is a strategy in place that could solve this, or maybe a index on my database that may speed the linq to sql query up. Or if I just designed it horribly from the beginning.

Thanks.


Since all of the logic to calculate the status of each subscription is in the c# code, I have to load an entire graph of subscription objects with all of their child transaction

Perhaps you shouldn't load all this data in the client and do all the calculations row by row. This is what databases are actually good at. Do the calculation on server side, better still have the calculation stored in the tables and just look it up in your reports. If you have 400k subscriptions and +M transactions then the corner stone of your design is the database, not the client. You need to invest your time and design in the data model, and the client comes after that.


Do you have to redo all the calculations each time you access the subscription object? I some cases it's possible to store the last calculated result in the object (or in a new table) and start the calculations from there. You would probably need to save the id of the last transaction included with the calculation with the result. If this is feasible in your situation, you would be able to load only unprocessed transactions to memory.


The standard answer for database problems is, it depends. You need to analyse your problem to determine where the bottleneck lies. Is the main cost reading millions of rows from disk? Or is it sending millions of rows across the network? Or is it stashing millions of rows in memory?

Because there are utterly different solutions to each problem. For instance, if it transpires that your problem is due to swapping data between real memory and virtual memory then building additional indexes is not going to help (unless there is an index which can be used to pre-filter the results to reduce the number of rows returned).


Create the appropriate indexes (not enough info in your question to know what these are). A million rows is not that large a set to run a joined query on, provided you have good indexes.

Can you create a View which contains the logic to calculate the status you require? This would presumably reduce I/O, by not having to return as much data to the client.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜