开发者

Which has better performance?

I am thinking of database schema for post and its comments, in context of a social networking application and im wandering which开发者_开发百科 of these two would give better performance:

I am storing comments of a post in "Comments" Table and posts in the "Posts" Table. Now my schema for the comments table looks like this:

postId commentId postedBy Date CommentBody

Since in order to retrieve the comments of a post I would be required to search all posts whose postId matches postId of this specific post and even my postId could not become primary key since the postId would be non unique within the column(since several comments for a single post), therefore I was thinking if I could merge postId and commentId into one single commentId (this becomes primary key) using which postId could also be retrieved. This is how I am thinking:

CommentId would be generated as postId*100+i (where i is the ith comment on the post)

thus in order to retrieve comments for a post(say with postId=8452 ) I would search all posts with commentId(that would be primary key), lying between 845200 & 845299.. instead of searching all comments with postId=8452.. (of course this limits the maximum no of comments to 100). But will this lead to any performance gains?


Here's what you do. Load up a database with representative data at (for example) twice the size you ever expect it to get.

Then run your queries and test them against both versions of the schema.

Then, and this is the good bit, retest this every X weeks with new up-to-date data to ensure the situation hasn't changed.

That's what being a DBA is all about. Unless your data will never change, database optimisation is not a set-and-forget operation. And the only way to be sure is to test under representative conditions.

Everything else is guesswork. Educated guesswork, don't get me wrong, but I'd rather have a deterministic answer in preference to anyone's guess, especially since the former will adapt to changes.

My favorite optimisation mantra is "Measure, don't guess!"


I'd recommend:

  • Use two-table structure with composite key in comments for best uniquness in index.

  • 100 comments per article is a bad limition that may hit you in the back.

  • Dont use different tables for comments regarding video/pictures etc.

  • If huge amounts of comments, add an comment-archive table and move old comments there. Most requested comments (newest) will have a smaller and more efficient table.

  • Do save blobs (pictures and videos) on different partition and not in db. Db will be smaller and less fragmented at file level.

regards, /t


If you gonna get big volume you should make a table Post and a table Comments in order to have smaller table :). And don't forget to use index and partitions on them.


Use a composite key. Or, if you're using some framework that only allows single-column keys, a secondary index on postId


If CommendId is not unique, you can create a composite PRIMARY KEY on (postId, CommentID):

CREATE TABLE Comment
        (
        postId INT NOT NULL,
        commentId INT NOT NULL,
        …,
        PRIMARY KEY (postId, commentId)
        )

If your table is MyISAM, you can mark commentId as AUTO_INCREMENT, which will assign it with a per-post UNIQUE incrementing value.

If it is unique, you can create a PRIMARY KEY on CommentId and create a secondary index on (PostId, CommentId):

CREATE TABLE Comment
        (
        commentId INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        postId INT NOT NULL,
        …,
        KEY (postId, commentId)
        )


CommentId would be generated as postId*100+i (where i is the ith comment on the post)

thus inorder to retrieve comments for a post(say with postId=8452 ) I would search all posts with commentId(that would be primary key), lying between 845200 & 845299.. instead of searching all comments with postId=8452.. (ofcourse this limits the maximum no of comments to 100). But will this lead to any performance gains ??

This will likely give much worse performance than a query based on a postId foreign key column, but the only way to be sure is to try both techniques (as suggested by paxdiablo) and measure the performance.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜