Database schema for forum thread votes/views, and strategy for incrementing and displaying # of views
If it matters as of now I'm using MySQL/MyISAM but I'm open to using PostgreSQL. I'm also open to using memcached.
Consider a table that's used to store forum threads:
id forum_name post_date
1 Hey! 2009-01-01 12:00:00
- What's the best practice of storing thread-related entities such as votes, views, and counters?
Should I..
Create a separate table such as
id thread_id views
1 1 532
Or keep it as a column in the initial threads table?
id foru开发者_运维问答m_name post_date views
1 Hey! 2009-01-01 12:00:00 532
Another related tidbit, what's the practical solution for displaying and incrementing page views? I read this thread and it seems like I could just cache a value for a certain time, I wasn't exactly clear on the incrementing part - perhaps something like storing the values in flat files somewhere, then periodically with cronjobs update the database forum views every hour or so?
EDIT: To clarify, voting IS like SO with one vote per thread and there can be reversals. Nevermind what I meant about counters.
Voting
I suggest defining two tables in addition to the thread table - VOTE_CODES
and THREAD_VOTES
. At a glance, it might appear over normalized but the format will allow you to change vote value without needing major DML changes.
VOTE_CODES
table
vote_code
, primary key, IE: up, downvote_cast_value
--value attributed to up/down votevote_caster_value
--optional, if you want to keep SO's style of the negative vote impacting the caster.
THREAD_VOTES
table
thread_id
user_id
vote_code
All of the columns in THREAD_VOTES
are the primary key - that will ensure that there can only be as many rows for a given user and thread as there are vote codes. Assuming only two codes, this would support the ability to reverse a vote because there could only be two records - one with either code.
Views
I would suggest storing the:
- thread id
- ip address
- user_agent --rough browser capture
- timestamp
All of the above are the primary key. Your table will populate fast, but it will give you the ability to create a computed column in a view for more accurate reporting.
flat files are obviously a bad idea, because you'd need to implement locking (the db already does that, and there's fewer bugs in that code).
relational database design is more of an art rather than a science: you can have
CREATE TABLE threads (
tid THREADID
, title THREADTITLE
, views COUNTER
, PRIMARY KEY (tid)
);
and it'll be no more and no less "correct" than
CREATE TABLE threads (
tid THREADID
, title THREADTITLE
, PRIMARY KEY (tid)
);
CREATE TABLE views (
tid THREADID
, views COUNTER
, PRIMARY KEY (tid)
, FOREIGN KEY (tid)
REFERENCES threads
);
so it's really up to you.
i'd say: go with the simplest thing first, make it more complicated if you find it's necessary (e. g. for performance reasons). IOW: put the views COUNTER
attribute in threads
. if it turns out the trafic is hurting performance (too many updates on the threads.views
attribute mean the dbms must shuffle around otherwise immutable data in the other attributes), you can always split the table up in two, and replace it with a view that joins them. voila, immutable (or rarely changing) data separated from the volatile data, interface remains the same.
of course, go with PostgreSQL. the above shown code is valid in that dbms, just add these:
CREATE DOMAIN threadid
AS INT NOT NULL;
CREATE DOMAIN threadtitle
AS TEXT NOT NULL
CHECK (LENGTH(VALUE) > 0);
CREATE DOMAIN counter
AS INT NOT NULL
CHECK (VALUE > 0);
edit to refute the comment by OMG Ponies: of course it's safe.
UPDATE threads SET
views = views + 1
WHERE tid = X
either succeeds or bails out.
edit 2 to add consideration for the voting aspect
let's say the spec is: a user may vote a thread up (+1) or down (-1), the sum of his or her votes on a given thread may not exceed |1|, and the history is irrelevant. iow a user may vote a thread up, then down to reset their vote to "no vote", then down again to "voted down", etc.
CREATE DOMAIN vote
AS INT NOT NULL
CHECK (VALUE BETWEEN -1 AND 1);
CREATE TABLE votes (
tid THREADID
, uid USERID
, vote VOTE
, PRIMARY KEY (tid, uid)
);
in MySQL, you could
INSERT INTO votes (
tid
, uid
, vote
) VALUES (
X
, Y
, Z -- +1 or -1
)
ON DUPLICATE KEY UPDATE
vote = vote + Z
alas, PostgreSQL doesn't (yet) have such functionality built in, so you'd need to use the idiomatic user-level implementation
精彩评论