How do I design a MongoDB schema for a Twitter article aggregator?

I'm new to MongoDB, and as an exercise I'm building an application that extracts links from tweets. The idea is to find the most tweeted articles for a subject. I'm having a hard time designing the schema for this application.

  • The application harvests tweets and saves them
  • The tweets are parsed for links
  • The links are saved with additional information (title, excerpt, etc.)
  • A tweet can contain more than one link
  • A link can have many tweets

How do I:

  • Save these collections? Should I use embedded documents?
  • Get the top ten links sorted by number of tweets they have?
  • Get the most tweeted link for a specific date?
  • Get the tweets for a link?
  • Get the ten latest tweets?

I would love to get some input on this.


Two general tips:

1.) Don't be afraid to duplicate. It is often a good idea to store the same data in different formats in different collections.

2.) If you want to sort and sum things up, it helps to keep count fields everywhere. MongoDB's atomic update operators, combined with upserts, make it easy to increment counters and to add fields to existing documents.

The following is most certainly flawed because it's typed off the top of my head. But better bad examples than no examples, I thought ;)

collection tweets:

{
  tweetid: 123,
  timeTweeted: 123123234,  //exact time of the tweet in milliseconds
  dayInMillis: 123412343,  //the day of the tweet at 00:00:00, in milliseconds
  text: 'a tweet with a http://lin.k and an http://u.rl',
  links: [
     'http://lin.k',
     'http://u.rl' 
  ],
  linkCount: 2
}
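
A minimal sketch of how a document in this shape could be built before inserting it. The helper name buildTweetDoc, the simple URL pattern, and the choice of UTC midnight for dayInMillis are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: turns raw tweet data into a document shaped like the one above.
var MS_PER_DAY = 24 * 60 * 60 * 1000;

function buildTweetDoc(tweetid, text, timeTweeted) {
  // Simplistic URL pattern (assumption); real tweets may need a sturdier parser.
  var links = text.match(/https?:\/\/[^\s]+/g) || [];
  return {
    tweetid: tweetid,
    timeTweeted: timeTweeted,
    // Truncate the timestamp to 00:00:00 UTC of the same day.
    dayInMillis: timeTweeted - (timeTweeted % MS_PER_DAY),
    text: text,
    links: links,
    linkCount: links.length
  };
}
```

Storing dayInMillis next to the exact timestamp is what lets the per-day counters in the links collection use the same key for every tweet from that day.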

collection links: 

{
   url: 'http://lin.k',
   totalCount: 17,
   daycounts: {
      1232345543354: 5, //key: the day of the tweet at 00:00:00, in milliseconds
      1234123423442: 2,
      1234354534535: 10
   }
}

add new tweet:

db.x.tweets.insert({...}) //simply insert new document with all fields

//for each found link:
var upsert = true;
var toFind =  { url: '...'};
var updateObj = {'$inc': {'totalCount': 1, 'daycounts.12342342': 1 } }; //12342342 is the day of the tweet
db.x.links.update(toFind, updateObj, upsert);
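
Since the day is part of the field name, the update object usually has to be built at runtime instead of typed as a literal. A small sketch of that, where buildLinkUpdate is a hypothetical helper name and the tweet variable in the usage comment is assumed to be a tweet document like the one above:

```javascript
// Hypothetical helper: builds the $inc update for one link on one day.
function buildLinkUpdate(dayInMillis) {
  var inc = { totalCount: 1 };
  // Bracket notation is needed because the key depends on the day.
  inc['daycounts.' + dayInMillis] = 1;
  return { '$inc': inc };
}

// Usage in the mongo shell, once per link found in the tweet
// (same update call as above, with upsert = true):
// tweet.links.forEach(function (url) {
//   db.x.links.update({ url: url }, buildLinkUpdate(tweet.dayInMillis), true);
// });
```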

Get the top ten links sorted by number of tweets they have?

db.x.links.find().sort({'totalCount': -1}).limit(10);

Get the most tweeted link for a specific date?

db.x.links.find({'daycounts.123413453': {'$gt': 0}}).sort({'daycounts.123413453': -1}).limit(1); //123413453 is the day you're after
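
For an arbitrary date, the filter and the sort key can be derived from the day value. This is only a sketch: buildDayQuery is a hypothetical helper, and it assumes the per-day counters live under the daycounts field as in the links schema above.

```javascript
// Hypothetical helper: builds the filter and sort spec for one day's counter.
function buildDayQuery(dayInMillis) {
  var field = 'daycounts.' + dayInMillis;
  var filter = {};
  filter[field] = { '$gt': 0 };  // only links tweeted at least once that day
  var sort = {};
  sort[field] = -1;              // highest count first
  return { filter: filter, sort: sort };
}

// Usage in the mongo shell:
// var q = buildDayQuery(123413453);
// db.x.links.find(q.filter).sort(q.sort).limit(1);
```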

Get the tweets for a link?

db.x.tweets.find({'links': 'http://lin.k'});

Get the ten latest tweets?

db.x.tweets.find().sort({'timeTweeted': -1}).limit(10);