Redis Data Structure to Store All Clicks for All Links

I'm trying to set up a system in which ALL links posted by users and clicked by their followers are stored in Redis in such a way that the following requirements are met:

  1. Able to get the most clicked links (for example, the top 10%) within a time frame (today, this week, all time, or a custom range).

  2. Able to query all users who posted the same link.

  3. Since we already use many keys, ideally all of this is stored in a single Redis key.

  4. Values can be JSON-encoded if needed.

Here is what I have come up with so far:

- I use a single Redis hash in which each field is a single hour, so each day adds 24 fields to the hash.

- In each field, I store a JSON string encoded from an array with this format:

array("timestamp1" => array($url1, $url2, ...)
    , "timestamp2" => array($url3, $url4, ...)
    , ..., ...);

- The complete structure is this hash:

[01/01/2010 00:00] => JSON(...),
[01/01/2010 01:00] => JSON(...),
....

This way, I can get all the clicks on any URL within any time-frame.

However, I can't seem to reuse this hash for getting all the users who posted the URL.

The question is: Is there a better way to do this?

Updated 07/30/2011: I'm currently storing the minutes, the hours, the days, weeks, months, and years in the same hash.

So, one click is stored in many fields at once:

- in the field for the minute (format YmdHi)
- in the field for the hour (format YmdH)
- in the field for the day (format Ymd)
- in the field for the week (format YW)
- in the field for the month (format Ym)
- in the field for the year (format Y)
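To make the field layout concrete, here is a minimal Python sketch of which fields a single click would touch. The function name `click_fields` is made up for illustration, and the mapping from PHP `date()` formats to `strftime` is my assumption about what the formats above mean (PHP's `W` is the ISO-8601 week number).

```python
from datetime import datetime


def click_fields(ts: datetime) -> list[str]:
    """Return the hash fields that one click at `ts` would be written to."""
    iso_week = ts.isocalendar()[1]  # ISO-8601 week number, like PHP's "W"
    return [
        ts.strftime("%Y%m%d%H%M"),   # minute, PHP "YmdHi"
        ts.strftime("%Y%m%d%H"),     # hour,   PHP "YmdH"
        ts.strftime("%Y%m%d"),       # day,    PHP "Ymd"
        f"{ts.year}{iso_week:02d}",  # week,   PHP "YW"
        ts.strftime("%Y%m"),         # month,  PHP "Ym"
        ts.strftime("%Y"),           # year,   PHP "Y"
    ]
```

One thing the sketch makes visible: the week and month fields are both six digits, so week 07 of 2011 and July 2011 would both produce `201107`. If that matters for your data, a prefix per granularity (for example `w:` and `m:`) would keep them apart.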

That way, when fetching a specific time frame, I only need to access the necessary fields, without looping through every hour.

For example, if I need clicks from 07/26/2011 20:00 to 07/28/2011 02:00, I only need to query 7 fields: 4 fields for the hours from 20:00 to 23:00 on 07/26, 1 field for the full day of 07/27/2011, and 2 more fields for the hours from 00:00 to 01:00 on 07/28.
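The decomposition in that example can be sketched roughly as follows. This is only an illustration under two assumptions: both bounds fall exactly on an hour, and only the day and hour granularities are used (the week, month, and year fields are ignored here). The name `fields_for_range` is hypothetical.

```python
from datetime import datetime, timedelta


def fields_for_range(start: datetime, end: datetime) -> list[str]:
    """Cover [start, end) with day fields where a whole day fits, else hour fields."""
    fields = []
    cur = start
    while cur < end:
        next_day = cur + timedelta(days=1)
        if cur.hour == 0 and next_day <= end:
            fields.append(cur.strftime("%Y%m%d"))    # a whole day fits
            cur = next_day
        else:
            fields.append(cur.strftime("%Y%m%d%H"))  # fall back to one hour
            cur += timedelta(hours=1)
    return fields
```

For the range above this yields the same 7 fields, which could then be fetched in one round trip with HMGET.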


If you drop the third requirement it becomes a lot easier. A lot of people seem to think that you should always use hashes instead of keys, but this stems from a misunderstanding of a post about using hashes to improve performance in a particular, limited set of circumstances.

To get the most clicked links, create a sorted set for each hour or day, with the link as the member and the click count as the score, incremented using ZINCRBY. Use ZCARD and ZREVRANGEBYSCORE to get the top 10%. It is simplest if the set holds all links in the system, though there are strategies you can use to drop less popular items from the set if necessary.
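A rough Python sketch of the sorted-set approach, using one set per day. The key prefix `clicks:day:` and the function names are made up for illustration, and `r` is assumed to be a redis-py style client exposing `zincrby`, `zcard`, and `zrevrangebyscore`.

```python
from datetime import datetime


def clicks_key(ts: datetime) -> str:
    """Hypothetical key name for the sorted set covering one day."""
    return "clicks:day:" + ts.strftime("%Y%m%d")


def record_click(r, url: str, ts: datetime) -> None:
    # ZINCRBY key 1 url: adds the member on its first click, then increments.
    r.zincrby(clicks_key(ts), 1, url)


def top_links(r, ts: datetime, percent: int = 10):
    """Return the top `percent` of the day's links as (url, clicks) pairs."""
    key = clicks_key(ts)
    total = r.zcard(key)  # ZCARD: number of distinct links in the set
    if total == 0:
        return []
    count = max(1, total * percent // 100)  # integer math, at least one link
    # ZREVRANGEBYSCORE key +inf -inf WITHSCORES LIMIT 0 count
    return r.zrevrangebyscore(
        key, "+inf", "-inf", start=0, num=count, withscores=True
    )
```

With a real client you would create it as something like `redis.Redis(decode_responses=True)` so that members come back as strings rather than bytes.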

To get all users posting a link, store a set of users for each link. You could do this with JSON and a key or hash storing details for the link, but a set makes updating and querying easier.
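The per-link set could look something like this. Again a sketch only: the key prefix `link:posters:` and the function names are hypothetical, and `r` is assumed to be a redis-py style client exposing `sadd` and `smembers`.

```python
def posters_key(url: str) -> str:
    """Hypothetical key name for the set of users who posted `url`."""
    return "link:posters:" + url


def record_post(r, user_id: str, url: str) -> None:
    # SADD ignores members that already exist, so re-posting is harmless.
    r.sadd(posters_key(url), user_id)


def users_who_posted(r, url: str):
    # SMEMBERS returns every member of the set, in no particular order.
    return r.smembers(posters_key(url))
```

Because the set deduplicates on its own, there is no read-modify-write cycle as there would be with a JSON-encoded list.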


I recommend a bucketing strategy, such as hashing keys or keeping link-to-user records month by month, because you have no control over how large the data structure may grow. Millions of users may visit a particular link, and returning the details of all of them at once would be of little use. What I believe can be done is to maintain a counter or some metadata that represents the current state, and keep archival storage outside memory, or go for an in-memory data grid like GemFire.
