
Best way to store "tags" for speed in enormous table

I'm developing a big content site with a table "contents" that has more than 50 million records. Here's the table structure:

id (INT(11), INDEX),
name (VARCHAR(150), FULLTEXT),
description (TEXT, FULLTEXT),
date (INT(11), INDEX)

I want to add "tags" to these contents.

I'm considering 2 methods:

  1. Make a VARCHAR(255) FULLTEXT "tags" column in the contents table. Store all tags separated by commas, and search row by row (which I think will be slow) using MATCH ... AGAINST.

  2. Make 2 tables. The first, "tags", has the columns id and tag (VARCHAR(30), INDEX or FULLTEXT?); the second, "contents_tags", has id, tag_id (INT(11), INDEX) and content_id (INT(11), INDEX). Then search contents with a JOIN of 3 tables (contents - contents_tags - tags) to retrieve all contents with the tag(s).

I think this will be slow and a memory killer because of an ENORMOUS JOIN of the 50M-row table * contents_tags * tags.

What is the best way to store tags so that it is as efficient as possible? And what is the fastest way to search by text (for example "movie 3d 2011") plus a simple tag (for example "video") to locate contents?

The table is approx. 5 GB now, without tags. It is MyISAM because I need FULLTEXT indexes on the name and description columns of contents for string search (users can already search by these fields), and I need the best possible speed when searching by tags.

Anyone with experience in this?

Thanks!


FULLTEXT indexes are really not as fast as you may think they are.

Use a separate table to store your tags:

Table tags
----------
id integer PK
tag varchar(20)

Table tag_link
--------------
tag_id integer foreign key references tags(id)
content_id integer foreign key references content(id)
/* this table has a PK consisting of tag_id + content_id */

Table content
--------------
id integer PK
......
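A minimal sketch of this schema, using Python's built-in sqlite3 module as a stand-in for MySQL (assumption: SQLite syntax; in MySQL you would declare the same tables with InnoDB so the foreign keys are enforced). The function name `create_schema` and the extra index `idx_tag_link_content` are illustrative, not part of the answer; the extra index covers lookups that start from content_id, which the (tag_id, content_id) primary key cannot serve.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the content / tags / tag_link tables (SQLite dialect, illustrative)."""
    conn.executescript(
        """
        CREATE TABLE content (
            id   INTEGER PRIMARY KEY,            -- auto-incrementing rowid in SQLite
            name TEXT NOT NULL
        );
        CREATE TABLE tags (
            id  INTEGER PRIMARY KEY,
            tag VARCHAR(20) NOT NULL UNIQUE      -- UNIQUE gives an index for WHERE tag = ?
        );
        CREATE TABLE tag_link (
            tag_id     INTEGER NOT NULL REFERENCES tags(id),
            content_id INTEGER NOT NULL REFERENCES content(id),
            PRIMARY KEY (tag_id, content_id)     -- serves "all content with tag x"
        );
        -- The composite key only helps lookups that start with tag_id,
        -- so index content_id separately for "all tags of content y".
        CREATE INDEX idx_tag_link_content ON tag_link (content_id);
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print([r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
    # ['content', 'tag_link', 'tags']
```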

You SELECT all content with tag x by using:

SELECT c.* FROM tags t
INNER JOIN tag_link tl ON (t.id = tl.tag_id)
INNER JOIN content c ON (c.id = tl.content_id)
WHERE tag = 'test'
ORDER BY tl.content_id DESC /*latest content first*/
LIMIT 10;
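The query can be tried end to end on a small in-memory SQLite database. This is a sketch: the sample rows and the helpers `build_demo_db` and `latest_with_tag` are hypothetical, and it selects `c.id, c.name` instead of `c.*` so the result is easy to check.

```python
import sqlite3


def latest_with_tag(conn, tag, limit=10):
    """Return (id, name) of the newest `limit` content rows carrying `tag`."""
    sql = """
        SELECT c.id, c.name
        FROM tags t
        INNER JOIN tag_link tl ON t.id = tl.tag_id
        INNER JOIN content c   ON c.id = tl.content_id
        WHERE t.tag = ?
        ORDER BY tl.content_id DESC   -- higher id = newer content
        LIMIT ?
    """
    return conn.execute(sql, (tag, limit)).fetchall()


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE content (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
        CREATE TABLE tag_link (
            tag_id     INTEGER NOT NULL REFERENCES tags(id),
            content_id INTEGER NOT NULL REFERENCES content(id),
            PRIMARY KEY (tag_id, content_id)
        );
    """)
    conn.executemany("INSERT INTO tags (id, tag) VALUES (?, ?)",
                     [(1, "test"), (2, "video")])
    # Hypothetical data: 30 articles, all tagged 'video', every third one also 'test'.
    for i in range(1, 31):
        conn.execute("INSERT INTO content (id, name) VALUES (?, ?)", (i, f"article {i}"))
        conn.execute("INSERT INTO tag_link (tag_id, content_id) VALUES (2, ?)", (i,))
        if i % 3 == 0:
            conn.execute("INSERT INTO tag_link (tag_id, content_id) VALUES (1, ?)", (i,))
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print([row[0] for row in latest_with_tag(conn, "test", limit=3)])  # [30, 27, 24]
```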

Because of the foreign keys, both columns in tag_link are indexed (InnoDB creates an index for each foreign key column if none exists; MyISAM ignores foreign key definitions, so there you must add the indexes yourself).
The `WHERE tag = 'test'` clause selects 1 (!) record.
That record is equi-joined with, say, 10,000 tag_link rows.
Each of those is equi-joined with exactly 1 content record (each tag_link row only ever points to 1 content row).
Because of the LIMIT 10, MySQL stops looking as soon as it has 10 rows, so it really only reads about 10 tag_link records.
content.id is auto-incrementing, so higher ids are a very fast proxy for newer articles.
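The walkthrough above can be modelled in plain Python to make the early stop visible. This is a simplified model under stated assumptions, not MySQL's executor: a dict stands in for the unique index on `tags.tag`, a sorted list of content ids per tag stands in for the (tag_id, content_id) primary key, and all names are hypothetical. In a real server the early stop only happens when the optimizer can read that index in order instead of sorting the rows first (a filesort).

```python
from typing import Dict, List, Tuple


def latest_with_tag(
    tag_ids: Dict[str, int],
    tag_link_index: Dict[int, List[int]],
    content: Dict[int, str],
    tag: str,
    limit: int = 10,
) -> Tuple[List[str], int]:
    """Model of the plan: one tag lookup, then newest-first links until `limit`.

    Returns the matching content names and how many tag_link entries were read.
    """
    # Step 1: equality lookup on tags.tag finds exactly one tag id (or none).
    tag_id = tag_ids.get(tag)
    if tag_id is None or limit <= 0:
        return [], 0
    results: List[str] = []
    links_read = 0
    # Step 2: the per-tag list plays the role of the (tag_id, content_id) index;
    # ids are stored ascending, so walking it backwards gives newest content first.
    for content_id in reversed(tag_link_index.get(tag_id, [])):
        links_read += 1
        # Step 3: one primary-key lookup per link (each link points to one content row).
        results.append(content[content_id])
        # Step 4: LIMIT reached, stop without touching the remaining links.
        if len(results) == limit:
            break
    return results, links_read


if __name__ == "__main__":
    tag_ids = {"test": 1}
    tag_link_index = {1: list(range(1, 10_001))}  # 10,000 links for tag "test"
    content = {i: f"article {i}" for i in range(1, 10_001)}
    names, read = latest_with_tag(tag_ids, tag_link_index, content, "test")
    print(read, names[0])  # 10 article 10000
```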

In this case you never need to look for anything other than equality, and you start out with 1 tag that you equi-join using integer keys (the fastest kind of join).

There are no ifs, ands, or buts about it: this is the fastest way.

Note that because there are at most a few thousand tags, any search will be much faster than delving into the full contents table.

Finally, CSV fields are a very bad idea; never use them in a database.
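To see why comma-separated tag columns cause trouble, here is a small SQLite sketch (hypothetical table `content_csv` and sample rows) that searches such a column with `LIKE`. A plain substring match also finds "videogames", and padding with commas to match whole tags misses a row stored with a space after the comma. Neither pattern starts with a fixed prefix, so neither can use a B-tree index and both scan every row. MySQL FULLTEXT (method 1 in the question) tokenizes differently but has its own pitfalls: with MyISAM's default `ft_min_word_len` of 4, a short tag such as "3d" is not indexed at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_csv (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO content_csv (id, tags) VALUES (?, ?)",
    [(1, "video,3d"), (2, "videogames,2011"), (3, "movie, video")],
)


def naive_csv_search(tag):
    # Substring match: full scan, and it matches partial words ("videogames").
    return [r[0] for r in conn.execute(
        "SELECT id FROM content_csv WHERE tags LIKE ? ORDER BY id", (f"%{tag}%",))]


def exact_csv_search(tag):
    # Pad with commas so only whole tags match: still a full scan, and it
    # misses "movie, video" because of the space after the comma.
    return [r[0] for r in conn.execute(
        "SELECT id FROM content_csv WHERE ',' || tags || ',' LIKE ? ORDER BY id",
        (f"%,{tag},%",))]


if __name__ == "__main__":
    print(naive_csv_search("video"))  # [1, 2, 3]
    print(exact_csv_search("video"))  # [1]
```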
