Pros and cons of sorting data in DB?

2022-12-30 19:12

Let's assume I have a table with a field of type VARCHAR, and I need to get data from that table sorted alphabetically by that field.

What is the best way (for performance): add an ORDER BY on that field to the SQL query, or sort the data after it has been fetched?

I'm using Java (with Hibernate), but I can't say anything about the DB engine. It could be any popular relational database (MySQL, MS SQL Server, Oracle, HSQLDB, or any other).

The number of records in the table can vary greatly, but let's assume there are 5k records.

UPD: How well does Hibernate's second-level cache (EHCache, for example) support sorted data?

If this field is indexed, then the average DB would be much more efficient at this task than Java. Also note that you normally wouldn't retrieve all those rows at once if it's purely for display, but rather retrieve a subset so that it can be shown with pagination. You can do this at the DB level as well. Sorting the data in Java would require hauling the entire table into Java's memory, which you don't want to do.

In Hibernate you can order the results using Criteria#addOrder() and paginate using Criteria#setFirstResult() and Criteria#setMaxResults(). E.g.

List users = session.createCriteria(User.class)
    .addOrder(Order.asc("username"))
    .setFirstResult(0) // Index of first row to be retrieved.
    .setMaxResults(10) // Amount of rows to be retrieved.
    .list();
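
The value passed to setFirstResult() is a zero-based row offset, so for pagination it has to be derived from the page number. A minimal sketch of that arithmetic, assuming 1-based page numbers; PageOffsets and firstResult are hypothetical helper names, not part of Hibernate:

```java
class PageOffsets {
    // Zero-based index of the first row on the given 1-based page.
    // (Hypothetical helper for illustration; not a Hibernate API.)
    static int firstResult(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    public static void main(String[] args) {
        // Offset of the first row on page 3 when each page holds 10 rows.
        System.out.println(firstResult(3, 10));
    }
}
```

The result would go to setFirstResult(), and the page size to setMaxResults().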

Sort the data in the database - that's (part of) what it's there for. The database engine is probably better at sorting this data than you are.

Pros of sorting in the database:

Speed. If you have an index on the ORDER BY column, the database may not have to sort at all, and for maximum performance you could use a clustered index.
Ease of use. An ORDER BY in the SQL query is easier to write and maintain than a Java Comparator.

Pros of sorting in the application:

Customizability. Maybe you want to sort by more elaborate criteria; a custom sort in Java will then be more flexible.
Reproducibility. If you code for different databases, their collation rules will probably differ. Maybe that's a problem, and you want one particular ordering. In Java, you can use a custom Collator to make sure the output from all databases is ordered the same way.
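
As a rough illustration of the reproducibility point, here is a minimal sketch using java.text.Collator with a fixed locale, so the ordering does not depend on the database's collation. The class and method names are made up for the example, and Locale.ENGLISH with SECONDARY strength is only an assumption about the ordering you might want:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FixedCollationSort {
    // Returns a sorted copy of the values using one fixed set of collation rules.
    static List<String> sortWithFixedCollation(List<String> values) {
        // Assumption: English rules are the desired ordering.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // SECONDARY strength treats upper and lower case as equal when comparing.
        collator.setStrength(Collator.SECONDARY);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> fetched = List.of("banana", "apple", "Cherry");
        System.out.println(sortWithFixedCollation(fetched));
    }
}
```

Plain String.compareTo() would put "Cherry" first here because uppercase letters sort before lowercase ones; the Collator orders the words alphabetically regardless of case.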

My solution would be to create an index on the sort column and write the query with an ORDER BY clause.

What is the best way (for performance): add sort by field to the SQL-query or sort the data when it's already fetched?

It's ORDER BY, not sort by.

It's a tradeoff: sorting on the client side is distributed, which means less impact on the server. However, it can require more client resources.

If the field is not indexed, then to return the whole recordset sorted, the server will need to do the following:

1. Fetch the whole recordset
2. Sort it
3. Send it over the network to the client

Sorting on the client side requires only steps 1 and 3 on the server (which are the least resource-intensive).

If your server needs to serve hundreds of clients simultaneously and your clients need the whole recordsets, then sorting on the client side will most probably be more efficient.

If the field is indexed, the database can return the data already sorted from that index. However, this will require additional table lookups to get the other fields.

Also, if you don't want the whole recordset but only the top few rows (as in ORDER BY … LIMIT or SELECT TOP … ORDER BY), the whole recordset will not need to be fetched and transmitted over the network. In this case, ordering on the database side will likely be more efficient.
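
For contrast, a client-side "top N" looks roughly like the sketch below. It assumes the rows are plain strings and that a case-insensitive order is wanted; TopRows and firstSorted are hypothetical names. The point is that the whole list has to be fetched into memory before the first N sorted rows can be picked:

```java
import java.util.List;
import java.util.stream.Collectors;

class TopRows {
    // In-memory counterpart of an ORDER BY with a row limit:
    // allRows must already hold every row of the recordset.
    static List<String> firstSorted(List<String> allRows, int n) {
        return allRows.stream()
                .sorted(String.CASE_INSENSITIVE_ORDER) // assumed ordering for the example
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fetched = List.of("delta", "alpha", "Charlie", "bravo");
        System.out.println(firstSorted(fetched, 2));
    }
}
```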

For only 5 thousand records, it doesn't really make much difference, but I'd sort it in the database; even if there's no index on the field, it's probably at least as fast as doing it afterwards.

Do you usually extract only a subset of that data? Then a good back-end design (indexing and/or partitioning) helps you extract that subset, already ordered, faster, and an ORDER BY in the DB is a matter of instants.
Do the tables always contain only a few rows of data? Then an ORDER BY in the DB is a matter of instants.

And even if you don't (or can't) optimize your database, you should (almost) always prefer to leave that kind of operation to the back end.

If you are willing to pull all of your data into memory and work with it there, here is a library that will work really well for your use case:

http://casperdatasets.googlecode.com

It operates effectively like an in-memory table and allows you to perform searching, filtering, and sorting on data, all in memory (and in Java). It performs very fast for the number of records you are trying to work with, and you don't need to integrate with a heavy ORM framework.
