
Alphabetically ordering records with "The", "A", "An", etc. at the beginning of a varchar field

I'm looking for both MySQL and PostgreSQL solutions for this kind of problem.

Say I have a number of records with a title field. The titles are book or movie titles, like "The Cat in the Hat" and "Robin Hood". But while the titles must be displayed in their original form, they ought to be sorted in the way that libraries sort them, which is by moving any article, like "The" or "An" to the end of the title.

So "The Cat in the Hat" is sorted as if it were "Cat in the Hat, The".


What's the best way either to design the schema or write the query so that these records are sorted by title in the same way that libraries sort the title? (I also wish I knew the technical term for this type of ordering by title.) Also, what performance considerations should I be aware of and what indexes should I create?


Why don't you just add a "title_prefix" field to the table and move all these "the" and "a" strings there? When you're ordering you would use the "title" field, and when you are presenting the title you could do the concatenation in any way you wish.


Create a custom function (sortableTitle, perhaps?) that modifies strings starting with your unwanted words. Finish your query with ORDER BY sortableTitle(title). This incurs an extra CPU cost on every sort, though you'll have to benchmark to know how much.
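
A minimal sketch of such a function, written for PostgreSQL. The `books` table, the `sortable_title` name, and the regular expression (which covers only the English articles "the", "an", and "a") are assumptions for illustration, not something from the question:

```sql
-- Hypothetical helper: drops one leading English article so that
-- 'The Cat in the Hat' sorts under 'Cat in the Hat'.
CREATE FUNCTION sortable_title(t text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
  SELECT regexp_replace(t, '^(the|an|a)\s+', '', 'i');
$$;

-- Titles are displayed unchanged; only the sort key is modified.
SELECT title
FROM books
ORDER BY sortable_title(title);
```

This is PostgreSQL syntax; MySQL would need its own stored function with equivalent logic.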

You could create an extra column (sortTitle) that is populated by a trigger. This will take up some space, but then your server will be able to sort rows by an index.
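
One way the trigger-maintained column might look in PostgreSQL, as a sketch. The `books` table, the `sort_title` column, and the trigger and function names are hypothetical, and the article list is limited to English "the", "an", and "a":

```sql
-- Extra column holding the sort form of the title.
ALTER TABLE books ADD COLUMN sort_title text;

-- Trigger function: recompute sort_title whenever title is written.
CREATE FUNCTION books_set_sort_title() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  NEW.sort_title := regexp_replace(NEW.title, '^(the|an|a)\s+', '', 'i');
  RETURN NEW;
END;
$$;

-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER books_sort_title
BEFORE INSERT OR UPDATE OF title ON books
FOR EACH ROW EXECUTE FUNCTION books_set_sort_title();

-- Backfill rows that existed before the trigger was created.
UPDATE books
SET sort_title = regexp_replace(title, '^(the|an|a)\s+', '', 'i');

-- Plain index on the stored sort key.
CREATE INDEX books_sort_title_idx ON books (sort_title);

SELECT title FROM books ORDER BY sort_title;
```

The string manipulation then happens once per insert or update rather than once per row on every sorted query.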

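Beyond these options, whether you can index directly on the computed sort order depends on the server. PostgreSQL supports indexes on expressions, provided the expression uses only immutable functions. MySQL gained functional key parts in 8.0.13; on older MySQL versions the extra column is the way to get an index in the order you want.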
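
For PostgreSQL specifically, an index on an expression is worth knowing about, since it gives the sorted order without an extra column. A sketch, assuming a hypothetical `books` table and the same simplified article pattern:

```sql
-- Index on the computed sort key; regexp_replace is immutable,
-- so PostgreSQL accepts it in an index expression.
CREATE INDEX books_title_sort_idx
  ON books (regexp_replace(title, '^(the|an|a)\s+', '', 'i'));

-- The ORDER BY must repeat the same expression for the planner
-- to consider the index.
SELECT title
FROM books
ORDER BY regexp_replace(title, '^(the|an|a)\s+', '', 'i');
```

Whether the planner actually uses the index depends on table size and the rest of the query, so check with EXPLAIN.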


iTunes achieves this by having a second field in which the title is stored in the desired sorting format and sorting on this instead of title. It does sound like the cheap way out, but when you consider the performance implications of doing string manipulations on every title every time you do a select statement that orders by title, against doing string manipulations each time you insert or update the title, it does make sense.


-- Works in both MySQL and PostgreSQL: strip a leading article in the sort key only.
SELECT *
FROM TitleTable
ORDER BY
  CASE
    WHEN title LIKE 'The %' THEN SUBSTRING(title FROM 5)  -- skip 'The '
    WHEN title LIKE 'An %'  THEN SUBSTRING(title FROM 4)  -- skip 'An '
    WHEN title LIKE 'A %'   THEN SUBSTRING(title FROM 3)  -- skip 'A '
    ELSE title
  END;


I would suggest you split the title field in two fields: mainTitle and pre.

When a title is added, check if it starts with "A", "The" or other prefixes and split it (perhaps with a trigger) into the two fields. Your table would look like this:

| pre |   mainTitle    |
|-----|----------------|
| The | Cat in the Hat |
| A   | Space Odyssey  |
|     | Eyes Wide Shut |

So, you can have an index on the mainTitle field and use it for sorting.

When you want to show the full title, concatenate the two fields in either form: "The Cat in the Hat" or "Cat in the Hat, The".
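
A sketch of the index and the display query for this layout. The table name `titles` is an assumption; `pre` and `mainTitle` are the columns from the table above, assumed to be varchar. It uses only CASE and CONCAT, which exist in both MySQL and PostgreSQL:

```sql
-- Index the column used for sorting.
CREATE INDEX titles_maintitle_idx ON titles (mainTitle);

SELECT
  -- Original form: 'The Cat in the Hat'
  CASE WHEN pre IS NULL OR pre = '' THEN mainTitle
       ELSE CONCAT(pre, ' ', mainTitle) END AS display_title,
  -- Library form: 'Cat in the Hat, The'
  CASE WHEN pre IS NULL OR pre = '' THEN mainTitle
       ELSE CONCAT(mainTitle, ', ', pre) END AS library_title
FROM titles
ORDER BY mainTitle;
```

The CASE guards keep titles without a prefix, such as "Eyes Wide Shut", from picking up a stray leading space or trailing comma.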


  • If you go this way, you'll also have to change the code that handles title searches. A title entered by a user must be split the same way before it is matched against the mainTitle field.

  • You'll have to be very, very careful with the code (trigger or other) that does the splitting, so that special cases are handled correctly. You wouldn't want books titled "A = B" or "A B C: learn the alphabet" to be shown and sorted as "= B, A" and "B C: learn the alphabet, A".
