Database vs flat file: which is the faster structure for "regex" matching with many simultaneous requests?
Which structure returns results faster and/or is less taxing on the host server: a flat file or a database (MySQL)?
Assume many users (100) query the file/db simultaneously. Searches involve pattern matching against a static file/db. The file has 50,000 unique lines (same data type). There could be many matches. There is no writing to the file/db, just reads.
Is it possible to keep a duplicate of the file/db and write a logic switch that uses the backup file/db if the main one is in use?
Which language is best for each type of structure? Perl for the flat file and PHP for the db?
Additional info:
If I want to find all the cities that have the pattern "cis" in their names, which is better/faster: regex or string functions?
Please recommend a strategy
TIA
I am a huge fan of simple solutions, and thus prefer -- for simple tasks -- flat file storage. A relational DB with its indexing capabilities won't help you much with arbitrary regex patterns at all, and the filesystem's caching ensures that this rather small file is in memory anyway. I would go the flat file + perl route.
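A minimal sketch of the flat-file route, assuming the data sits one record per line in a file called cities.txt (a hypothetical name, with made-up sample data); grep does the regex scan here, and the equivalent perl one-liner is shown as a comment:

```shell
# Hypothetical stand-in for the 50,000-line file.
printf 'San Francisco\nCisco\nLondon\nFrancistown\n' > cities.txt

# Regex scan over the whole file; prints every line matching the pattern.
grep -E 'cis' cities.txt

# The equivalent perl one-liner:
# perl -ne 'print if /cis/' cities.txt
```

On this sample it prints "San Francisco" and "Francistown" ("Cisco" is skipped because the match is case-sensitive).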
Edit: (taking your new information into account)
If it's really just about finding a substring in one known attribute, then a fulltext index (which a DB provides) will help you a bit (depending on the type of index applied) and might provide an easy and reasonably fast solution that fits your requirements. Of course, you could implement an index yourself on the file system, e.g. using a variation of a Suffix Tree, which is hard to beat speed-wise.
Still, I would go the flat file route (and if it fits your purpose, have a look at awk), because if you had started implementing it, you'd be finished already ;) Further, I suspect that the number of users you talk about won't make the system feel the difference (your CPU will be bored most of the time anyway).
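Since the question also asks about regex versus string functions, here is a small awk sketch of both, again against the hypothetical cities.txt; for a plain substring like "cis", the regex match and awk's index() string function select exactly the same lines:

```shell
# Hypothetical sample data, one city per line.
printf 'San Francisco\nCisco\nLondon\nFrancistown\n' > cities.txt

# Regex version: print lines matching the pattern.
awk '/cis/' cities.txt

# String-function version: index() returns the 1-based position
# of the substring, or 0 if it is absent.
awk 'index($0, "cis") > 0' cities.txt
```

For a fixed substring the string-function form can be slightly cheaper, since no pattern needs to be interpreted, but at 50,000 lines either will be fast.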
If you are uncertain, just try it! Implement the regex + perl solution (it takes a few minutes if you know perl), loop 100 times, and measure with time. If it is sufficiently fast, use it; if not, consider another solution. Keep in mind that your 50,000 unique lines are a really low number in terms of modern computing. (Compare with this: Optimizing Mysql Table Indexing for Substring Queries.)
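The measurement loop described above can be sketched like this, reusing the hypothetical cities.txt and pattern from earlier; time reports the total cost of 100 back-to-back scans, roughly simulating the 100 simultaneous users:

```shell
# Hypothetical sample data.
printf 'San Francisco\nCisco\nLondon\nFrancistown\n' > cities.txt

# Scan the file 100 times and let `time` report the total cost.
time (
  for i in $(seq 1 100); do
    grep -E 'cis' cities.txt > /dev/null
  done
)
```

With a real 50,000-line file the OS caches it after the first read, so the remaining 99 iterations measure pure in-memory matching.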
HTH,
alexander
Depending on what your queries and your data look like, a full-text search engine like Lucene or Sphinx could be a good idea.