
Good database for large table with simple key access

I have a few large databases, greater than 100 million records. They consist of the following:

  1. A unique key.
  2. An integer value, not unique, but used to sort the query results.
  3. A VARCHAR(200).

I have them in a MySQL MyISAM table now. My thought was, hey, I'll just set up a covering index on the data, and it should pull out reasonably fast. Queries are of the form...

select valstr,account 
    from datatable 
    where account in (12349809, 987987223,...[etc]) 
    order by orderPriority;
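For concreteness, the covering index is along these lines (a sketch; the index name and exact column order here are just illustrative):

-- the index holds every column the query touches, so MyISAM can answer
-- it from the index alone ("Using index" in EXPLAIN):
alter table datatable
    add index idx_account_cover (account, orderPriority, valstr);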

This seemed OK in some tests, but on our newer installation, it's terribly slow. It seems faster to have no index at all, which seems odd.


In any case, I'm thinking, maybe a different database? We use a data warehousing DB for other parts of the system, but it's not well suited for anything in text. Any free, or fairly cheap, DBs are an option, as long as they have reasonably useful API access. SQL optional.

Thanks in advance.

-Kevin


CouchDB and MongoDB and Riak are all going to be good at finding the key (account) relatively quickly.

The problems you're going to have (with any solution) are tied to the "order by" and "account in" clauses.

Problem #1: account in

120M records likely means gigabytes of data. You probably have an index over a gig. The reason this is a problem is that your "in" clause can easily span the whole index. If you search for accounts "0000001" and "9999581" you probably need to load a lot of index.

So just to find the records, your DB first has to load potentially a gig of index into memory. Then to actually load the data you have to go back to the disk again. If the "accounts" in the in clause are not "close together" then you're going back multiple times to fetch various blocks. At some point it may be quicker to just do a table scan than to load the index and the table.
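A quick way to check how big the data and the index actually are, and whether the index has any realistic chance of fitting in RAM (table name taken from the question):

-- Data_length and Index_length in the output are sizes in bytes; for MyISAM,
-- key_buffer_size controls how much of the index can be cached in memory.
show table status like 'datatable';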

Then you get to problem #2...

Problem #2: order by

If you have a lot of data coming back from the "in" clause, then order by is just another layer of slowness. With an "order by" the server can't stream you the data. Instead it has to load all of the records into memory, sort them, and then stream them.
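You can see both effects with EXPLAIN (query copied from the question, with a shortened in list):

explain select valstr, account
    from datatable
    where account in (12349809, 987987223)
    order by orderPriority;
-- type shows range (index lookups) vs. ALL (table scan), rows estimates how
-- much gets read, and "Using filesort" in Extra is the collect-then-sort step
-- described above.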

Solutions:

  1. Have lots of RAM. If the RAM can't fit the entire index, then the loads will be slow.
  2. Try limiting the number of "in" items. Even 20 or 30 items in this clause can make the query really slow (see the sketch after this list).
  3. Try a Key-Value database?
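One way to apply #2, if the application can merge results itself, is to batch the keys into several smaller queries instead of one query with hundreds of items in the in list (the account values here are just placeholders):

select valstr, account from datatable
    where account in (12349809, 987987223)
    order by orderPriority;
select valstr, account from datatable
    where account in (12349810, 987987224)
    order by orderPriority;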

I'm a big fan of K/V databases, but you have to look at point #1. If you don't have a lot of RAM and you have lots of data, then the system is going to run slowly no matter what DB you use. That RAM / DB size ratio is really important if you want good performance in these scenarios (small look-ups in big datasets).


Here's a reasonably sized example of a MySQL database using the InnoDB engine, taking advantage of clustered indexes, on a table with approx. 125 million rows and a query runtime of 0.021 seconds, which seems fairly reasonable.

Rewriting mysql select to reduce time and writing tmp to disk

http://pastie.org/1105206
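If the pastie is no longer reachable, a minimal sketch of the kind of schema being described, with column names borrowed from the question (this is an assumption, not the actual pastie contents):

create table datatable (
    account        bigint       not null,
    orderPriority  int          not null,
    valstr         varchar(200) not null,
    primary key (account)   -- InnoDB clusters row data on the primary key, so the
                            -- IN lookups land directly on the rows with no second
                            -- trip back to the table
) engine = InnoDB;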

Other useful links:

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://dev.mysql.com/doc/refman/5.0/en/innodb-adaptive-hash.html

Hope it proves of interest.


CouchDB will give you storage by key, and you can create views to do the query/sorting. A second option could be Cassandra, but there's a pretty big learning curve.
