Matching First Alphanumeric Character skipping (The |An? )

I have a list of artists, albums and tracks that I want to sort using the first letter of their respective names. The issue arises when I want to ignore "The ", "A ", "An " and various leading non-alphanumeric characters (talking to you, "Weird Al" Yankovic and [dialog]). Django has a nice start with '^(An?|The) +', but I want to ignore those and a few others of my choice.

I am doing this in Django, using a MySQL db with utf8_bin collation.

EDIT

Well, my fault for not mentioning this, but the database I am accessing is pretty much read-only. It's created and maintained by Amarok, and I can't alter it without a whole mess of issues. That being said, the artist table has The Chemical Brothers listed as The Chemical Brothers, so I think I am stuck here. It will probably be slow, but that's not much of a concern for me as it's a personal project.


What you are asking for probably isn't what you need. You probably don't want to sort by just the first letter. If the first letter is the same then you would normally also want to look at the second letter, etc. This will cause all songs by the same artist to be grouped together when you sort by artist.
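If doing the sorting in application code is acceptable (the question mentions Django, so Python is assumed here), the idea above can be sketched roughly as follows. The regular expression, the helper names `sort_key` and `first_letter`, and the sample artist list are all illustrative assumptions, not part of Django or Amarok. The key strips leading punctuation and one leading article, then compares the whole remaining name rather than only its first letter.

```python
import re
from itertools import groupby

# Assumed pattern: optional leading punctuation, then an optional article
# ("The", "A", "An") followed by whitespace, then more optional punctuation.
_LEADING_NOISE = re.compile(r"^\W*(?:(?:the|an?)\s+)?\W*", re.IGNORECASE)


def sort_key(name):
    """Return the name without leading noise, case-folded for comparison."""
    return _LEADING_NOISE.sub("", name).casefold()


def first_letter(name):
    """Return the first significant character, or '#' if nothing is left."""
    key = sort_key(name)
    return key[0].upper() if key else "#"


# Hypothetical sample data, partly taken from the question.
artists = [
    "The Chemical Brothers",
    '"Weird Al" Yankovic',
    "[dialog]",
    "Anthrax",
    "A Tribe Called Quest",
    "Daft Punk",
]

# Sort by the full stripped name, then bucket by first significant letter.
ordered = sorted(artists, key=sort_key)
index = {
    letter: list(group)
    for letter, group in groupby(ordered, key=first_letter)
}
```

Because the list is sorted on the full key before grouping, each letter bucket in `index` is itself in order, and names such as "Anthrax" are left alone since "An" is not followed by whitespace.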

Updated answer

Since you said you can't change the database, you can use TRIM(LEADING ... FROM ...) to strip off the uninteresting words. Note that this will be slow, as the query will not be able to use an index on the column.

-- Strip a leading 'A ' and then a leading 'The ' before filtering and sorting.
SELECT *
FROM song
WHERE SUBSTRING(TRIM(LEADING 'The ' FROM TRIM(LEADING 'A ' FROM title)), 1, 1) = 'B'
ORDER BY TRIM(LEADING 'The ' FROM TRIM(LEADING 'A ' FROM title));

Result:

'The Bar'   -- "The" is ignored when sorting.
'Baz A'    

Test data:

CREATE TABLE song (title NVARCHAR(100) NOT NULL);
INSERT INTO song (title) VALUES
('The Bar'),
('Baz A'),
('Foo'),
('Qux'),
('A Quux');

Original Answer

Also note that if you ORDER BY a function of a column, it will be really slow when you have a lot of records, as the index on that column can't be used. Instead, you should store another column in which you remove all uninteresting words (the, an, etc.) and order by that column. You can either populate that column from your application when you insert the row, or use a trigger in the database.
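A minimal sketch of the extra-column approach, filled in from the application side, might look like this. It uses Python's built-in sqlite3 module with an in-memory database purely so the example is self-contained; the question's database is MySQL, and the column name `title_sort`, the index name, and the helper `strip_article` are hypothetical.

```python
import re
import sqlite3

# Assumed pattern: one leading article followed by whitespace.
ARTICLE = re.compile(r"^(?:the|an?)\s+", re.IGNORECASE)


def strip_article(title):
    """Return the title without a single leading 'The ', 'A ' or 'An '."""
    return ARTICLE.sub("", title, count=1)


conn = sqlite3.connect(":memory:")
# title_sort is the hypothetical precomputed column; the index is what lets
# ORDER BY avoid evaluating a function for every row.
conn.execute(
    "CREATE TABLE song (title TEXT NOT NULL, title_sort TEXT NOT NULL)"
)
conn.execute("CREATE INDEX song_title_sort ON song (title_sort)")

# Same test data as in the updated answer.
titles = ["The Bar", "Baz A", "Foo", "Qux", "A Quux"]
conn.executemany(
    "INSERT INTO song (title, title_sort) VALUES (?, ?)",
    [(t, strip_article(t)) for t in titles],
)

# Order by the stored key, but return the original titles.
rows = conn.execute("SELECT title FROM song ORDER BY title_sort").fetchall()
sorted_titles = [row[0] for row in rows]
```

The query itself stays trivial because the stripping happens once at insert time. In a real setup the same value could be written by a database trigger instead of the application.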


In PostgreSQL, I found this to be a nice way to get started with that kind of sorting:

SELECT title
FROM  albums
ORDER BY    
  CASE 
    WHEN title ~* '^The ' THEN substring(title from 5)
    WHEN title ~* '^An '  THEN substring(title from 4)
    WHEN title ~* '^A '   THEN substring(title from 3)
    ELSE title
  END asc;

I would guess that MySQL has similar beasties.
