Sorting 1 billion rows by one varchar column in MySQL quickly

I have 1 billion rows stored in MySQL, and I need to output them sorted alphabetically by a varchar column. What is the most efficient way to go about it? Using other Linux utilities such as sort and awk is allowed.


MySQL can deal with a billion rows. Efficiency depends on three main factors: buffers, indexes, and joins.

Some suggestions:

Try to fit the data set you’re working with in memory

Processing in memory is much faster, and a whole bunch of problems go away just by doing so. Use multiple servers to host portions of the data set. Store the portion of data you’re going to work with in a temporary table, and so on.

Prefer full table scans to index accesses

For large data sets, full table scans are often faster than range scans and other types of index lookups. Even if you look at 1% of rows or less, a full table scan may be faster.

Avoid joins to large tables

Joining large data sets using nested loops is very expensive. Try to avoid it. Joins to smaller tables are OK, but you might want to preload them into memory before the join so that no random IO is needed to populate the caches.

Be aware of MySQL limitations which require you to be extra careful when working with large data sets. In MySQL, a query runs as a single thread (with the exception of MySQL Cluster) and MySQL issues IO requests one by one for query execution, which means that if single-query execution time is your concern, many hard drives and a large number of CPUs will not help.

Sometimes it is a good idea to manually split a query into several, run them in parallel, and aggregate the result sets.
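As a rough illustration of that split-and-aggregate idea, here is a minimal Python sketch. It assumes the table has a numeric id column that can be cut into ranges; fetch_sorted_chunk is a hypothetical stand-in for one sub-query and works on an in-memory list here rather than a real MySQL connection. Each chunk comes back already sorted, and heapq.merge combines the sorted chunks lazily. Note that Python compares strings by code point, which may not match the collation of your varchar column.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def fetch_sorted_chunk(table, lo, hi):
    """Hypothetical stand-in for one sub-query, roughly:
    SELECT the_varchar_column FROM big_table
    WHERE id >= lo AND id < hi ORDER BY the_varchar_column
    Here 'table' is just a list of (id, value) tuples (an assumption)."""
    return sorted(value for row_id, value in table if lo <= row_id < hi)


def parallel_sorted(table, id_ranges, workers=4):
    """Run one sorted sub-query per id range in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(
            pool.map(lambda r: fetch_sorted_chunk(table, r[0], r[1]), id_ranges)
        )
    # heapq.merge does a k-way merge of inputs that are already sorted.
    return list(heapq.merge(*chunks))


# Tiny made-up data set standing in for the real table.
table = [(1, "pear"), (2, "apple"), (3, "fig"),
         (4, "banana"), (5, "cherry"), (6, "date")]
result = parallel_sorted(table, [(1, 3), (3, 5), (5, 7)])
```

With a real server, each sub-query would need its own connection, and for a billion rows you would write the merged stream to a file instead of building a list.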

You did not give much info on your setup or your dataset, but this should give you a couple of clues on what to watch out for. In my opinion having the (properly tuned) database sort this for you would be faster than doing it programmatically unless you have very specific needs not mentioned in your post.


Have you tried just indexing the column and dumping the rows out? I'd try that first and see whether the performance is inadequate before going exotic.


It depends on how you define efficient: CPU, memory, IO, time, or coding effort. What is important in this case?

"select * from big_table order by the_varchar_column" That is probably the most efficient use of developer resources. Adding an index might make it run a lot faster.
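A minimal sketch of the index-then-dump approach, in Python. To keep it self-contained it uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption, the two engines behave differently at scale), and the index name idx_varchar is made up for illustration. The point it shows is streaming rows from the cursor one at a time instead of loading the whole result into memory.

```python
import io
import sqlite3


def dump_sorted(conn, out):
    """Index the column, then stream rows in sorted order to 'out'."""
    # Hypothetical index name; table and column names follow the query above.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_varchar "
        "ON big_table (the_varchar_column)"
    )
    cursor = conn.execute(
        "SELECT the_varchar_column FROM big_table "
        "ORDER BY the_varchar_column"
    )
    count = 0
    for (value,) in cursor:  # iterate row by row, no fetchall()
        out.write(value + "\n")
        count += 1
    return count


# Tiny in-memory table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table "
    "(id INTEGER PRIMARY KEY, the_varchar_column TEXT)"
)
conn.executemany(
    "INSERT INTO big_table (the_varchar_column) VALUES (?)",
    [("pear",), ("apple",), ("fig",)],
)
buffer = io.StringIO()
written = dump_sorted(conn, buffer)
```

Against MySQL you would swap in a MySQL client library and make sure it uses an unbuffered (streaming) cursor, since a buffered one would try to hold the entire result set in client memory.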
