Database vs flat file: which is the faster structure for "regex" matching with many simultaneous requests?
Which structure returns results faster and/or is less taxing on the host server: a flat file or a database (MySQL)?
Assume many users (100) query the file/db simultaneously. Searches involve pattern matching against a static file/db. The file has 50,000 unique lines (same data type). There could be many matches. There is no writing to the file/db, just reads.
Is it possible to keep a duplicate of the file/db and write a logic switch that uses the backup file/db if the main one is in use?
Which language is best for each type of structure? Perl for the flat file and PHP for the db?
Additional info:
If I want to find all the cities that have the pattern "cis" in their names, which is better/faster: regex or string functions?
Please recommend a strategy
TIA
I am a huge fan of simple solutions, and thus prefer -- for simple tasks -- flat file storage. A relational DB with its indexing capabilities won't help you much with arbitrary regex patterns at all, and the filesystem's caching ensures that this rather small file is in memory anyway. I would go the flat file + perl route.
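A minimal sketch of the flat-file route, assuming the data sits one record per line in a file called cities.txt (a hypothetical name, with made-up sample data); grep does the regex scan here, and the equivalent perl one-liner is shown as a comment:

```shell
# Hypothetical stand-in for the 50,000-line file.
printf 'San Francisco\nCisco\nLondon\nFrancistown\n' > cities.txt

# Regex scan over the whole file; prints every line matching the pattern.
grep -E 'cis' cities.txt

# The equivalent perl one-liner:
# perl -ne 'print if /cis/' cities.txt
```

On this sample it prints "San Francisco" and "Francistown" ("Cisco" is skipped because the match is case-sensitive).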
Edit: (taking your new information into account)
If it's really just about finding a substring in one known attribute, then a fulltext index (which a DB provides) will help you a bit (depending on the type of index applied) and might provide an easy and reasonably fast solution that fits your requirements. Of course, you could implement an index yourself on the file system, e.g. using a variation of a Suffix Tree, which is hard to beat speed-wise.
Still, I would go the flat file route (and if it fits your purpose, have a look at awk), because if you had started implementing it, you'd be finished already ;) Further, I suspect that the number of users you talk about won't make the system feel the difference (your CPU will be bored most of the time anyway).
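Since the question also asks about regex versus string functions, here is a small awk sketch of both, again against the hypothetical cities.txt; for a plain substring like "cis", the regex match and awk's index() string function select exactly the same lines:

```shell
# Hypothetical sample data, one city per line.
printf 'San Francisco\nCisco\nLondon\nFrancistown\n' > cities.txt

# Regex version: print lines matching the pattern.
awk '/cis/' cities.txt

# String-function version: index() returns the 1-based position
# of the substring, or 0 if it is absent.
awk 'index($0, "cis") > 0' cities.txt
```

For a fixed substring the string-function form can be slightly cheaper, since no pattern needs to be interpreted, but at 50,000 lines either will be fast.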
If you are uncertain, just try it! Implement the regex + perl solution (it takes a few minutes if you know perl), loop 100 times, and measure with time. If it is sufficiently fast, use it; if not, consider another solution. Keep in mind that your 50,000 unique lines are a really low number in terms of modern computing. (Compare with this: Optimizing Mysql Table Indexing for Substring Queries.)
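The measurement loop described above can be sketched like this, reusing the hypothetical cities.txt and pattern from earlier; time reports the total cost of 100 back-to-back scans, roughly simulating the 100 simultaneous users:

```shell
# Hypothetical sample data.
printf 'San Francisco\nCisco\nLondon\nFrancistown\n' > cities.txt

# Scan the file 100 times and let `time` report the total cost.
time (
  for i in $(seq 1 100); do
    grep -E 'cis' cities.txt > /dev/null
  done
)
```

With a real 50,000-line file the OS caches it after the first read, so the remaining 99 iterations measure pure in-memory matching.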
HTH,
alexander
Depending on what your queries and your data look like, a full-text search engine like Lucene or Sphinx could be a good idea.