Holding a large collection in memory, for querying
Would it be OK, for example, to hold an IEnumerable in memory in my ASP.NET app indefinitely?
For example:
Every morning, my ASP.NET MVC app needs to load data from a few CSV files. Using LINQ joins etc., the data is then merged into a single, de-normalized collection of around 500,000 "Things".
The app's sole purpose is to query this data. Methods like:
- GetThingsByName
- GetThingsByPrice
etc...
My idea was to just have a static IEnumerable that the Controller could call upon..?
It would be running on a dedicated server...
Basically, I'm trying to avoid using a database (of any kind, NoSQL or otherwise), as I don't think it's needed, since the data is fairly volatile.
The querying would be done using LINQ.
I agree with Pavel. It is also highly dependent on the types of queries you're going to be running. If you're doing a lot of aggregations, you'll probably want an in-memory database like SQLite or maybe even a full-fledged database like MySQL or SQL Server. If you're just doing lookups by PK, you might get away with storing the data in a hash map (a Dictionary<TKey, TValue> in .NET) or similar.
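As a minimal sketch of the lookup-by-PK case: the Thing class, its Id/Name/Price properties, and the ThingStore holder below are hypothetical names (the question only says "Things"), and the sketch assumes each record has a unique integer key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the question does not show the real shape of a "Thing".
public sealed class Thing
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static class ThingStore
{
    // Rebuilt after each morning load, keyed by the assumed primary key.
    private static Dictionary<int, Thing> _byId = new Dictionary<int, Thing>();

    public static void Load(IEnumerable<Thing> things)
    {
        // ToDictionary throws ArgumentException if two things share an Id.
        // The finished dictionary is swapped in with a single reference assignment.
        _byId = things.ToDictionary(t => t.Id);
    }

    public static Thing GetThingById(int id)
    {
        // Hash lookup instead of scanning all 500,000 items; null when the key is absent.
        Thing thing;
        return _byId.TryGetValue(id, out thing) ? thing : null;
    }
}
```

A controller would call ThingStore.GetThingById(id) directly. This only helps for exact key lookups; anything else still needs a scan or a separate index.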
IEnumerable and LINQ-to-Objects aren't magical. They just provide a common interface for querying and aggregation. If your actual implementing class of IEnumerable is a List, guess what? When you say:
    var query = from item in items // items is a List<T>
                where item.Name.StartsWith("Foo")
                   && item.CreationDate > new DateTime(2010,1,1)
                select item;
    var allFoos = query.ToList();
Then LINQ-to-Objects is going to iterate through all 500,000 objects in memory checking whether the where clause is satisfied. There will be no indices or other query optimizations happening. You'll be doing a linear search through memory!
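To illustrate what "indices" would mean if you built them by hand, here is a rough sketch under some assumptions: the Thing class and ThingIndex are hypothetical names, the data is read-only between loads, names are matched exactly (case-insensitively), and prices are queried as an inclusive range. It does not cover prefix searches like the StartsWith query above, which would need a sorted name array handled in a similar way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the real "Thing" is not shown in the question.
public sealed class Thing
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public sealed class ThingIndex
{
    private readonly ILookup<string, Thing> _byName; // hash index on Name
    private readonly Thing[] _byPrice;               // items sorted ascending by Price
    private readonly decimal[] _prices;              // parallel key array for binary search

    public ThingIndex(IEnumerable<Thing> things)
    {
        var list = things.ToList();
        _byName = list.ToLookup(t => t.Name, StringComparer.OrdinalIgnoreCase);
        _byPrice = list.OrderBy(t => t.Price).ToArray();
        _prices = _byPrice.Select(t => t.Price).ToArray();
    }

    public IEnumerable<Thing> GetThingsByName(string name)
    {
        // The ILookup indexer returns an empty sequence for an unknown key.
        return _byName[name];
    }

    public IEnumerable<Thing> GetThingsByPrice(decimal min, decimal max)
    {
        // Binary search to the first price >= min, then walk forward while price <= max.
        for (int i = LowerBound(min); i < _byPrice.Length && _prices[i] <= max; i++)
            yield return _byPrice[i];
    }

    // Returns the first index whose price is >= value (or Length if there is none).
    private int LowerBound(decimal value)
    {
        int lo = 0, hi = _prices.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_prices[mid] < value) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Each index costs extra memory and has to be rebuilt on every reload, and combining several conditions (name AND date, say) is still up to you. That is the work a database's query planner would otherwise do.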
What kind of structure are you holding the records in? Just seeing the words "IEnumerable", "500,000" and "querying" in the same sentence gives me shivers (take a look at how LINQ really works and you'll understand).
Have you considered other options, like an in-memory database? SQLite, for example.
If you have the memory, losing the data is not an issue, and syncing it with the "master" source of the data is easy, then I don't see a problem with this approach. It's hard to say any more without knowing the structure and original source of the data.
It all depends on how much memory you have to play with and how large these data structures are. Are we talking about Booleans and integers, or larger complex types that take up many bytes of memory?
How many times would these records be accessed and how much time would it take if accessed from a database?
A few more statistics would be nice.
It is feasible. I work on a similar system that keeps around 2.x million items in (large) memory. Access is by primary key (only). There are some other elements (related items), but I get the PKs for those rare evaluations from the database.
The problem on my end is that those elements are changing all the time. This means taking in a number (sometimes in the hundreds of thousands) of changes PER SECOND.
It is a rare case, and in this rare case keeping things in memory is pretty much "it" (as in: the only way). The server is restarted once per week (GC is useless here: if an item were retired, it would be in the last GC "slice" anyway) to give things a chance to start fresh. Memory used? Large (64-bit needed), but it is doable. Only way here. Changes are also logged and then processed into the database for later querying.
If you CAN, stay away from an approach like this.
The app's sole purpose is to query this data. Methods like:
- GetThingsByName
- GetThingsByPrice
And here you are off. GetThingsByPrice will NOT work without an index, and indexing in memory is HARD (I do not do it: get by symbol, which is a "name", is the ONLY search method I support in memory) and most likely NOT worth the effort. If you need querying, push it to a real database. James Kovacs pretty much nails it in his answer. A simple IEnumerable will NOT work; you would have to implement a full LINQ query provider, including advanced search evaluation (which order etc.), which is NASTY. Even lookup by name is bad... I use a special API here (no LINQ) where you pass in the name and it does a reference lookup in a hashtable.