How to make functions for my pymongo/twitter script?
I'm writing scripts with Python, MongoDB and the pymongo module that fetch data from the Twitter API and store it in a Mongo database. I've written scripts to do different things: access the search API, access the user_timeline, and more. Until now I have been getting to know the tools I'm working with, and it's time to go back and make the scripts more efficient, so I'm adding functions and classes to them. Here is one of my scripts without functions or classes:
#!/usr/local/bin/python
import twitter
import datetime
from datetime import date, timedelta, datetime
import pymongo
from pymongo import Connection
# Twitter handle that we are scraping mentions for
SCREEN_NAME = '@twitterapi'
# Connect to the database
connection = Connection()
db = connection.test
collection = db.twitterapi_mentions # Change the name of this database
t = twitter.Twitter(domain='search.twitter.com')
# Fetch the information from the API
results = []
for i in range(2):
    i+=1
    response = t.search(q=SCREEN_NAME, result_type='recent', rpp=100, page=i)['results']
    results.extend(response)
# Create a document in the database for each item taken from the API
for tweet in results:
    id_str = tweet['id_str']
    twitter_id = tweet['from_user']
    tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
    created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
    date = created_at.date().strftime("%m/%d/%y")
    time = created_at.time().strftime("%H:%M:%S")
    text = tweet['text']
    identifier = {'id' : id_str}
    entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
    collection.update(identifier, entries, upsert = True)
These scripts have been working well for me, but I have to run the same script for multiple twitter handles. For instance I'll copy the same script and change the following two lines:
SCREEN_NAME = '@cocacola'
collection = db.cocacola_mentions
Thus I'm getting mentions for both @twitterapi and @cocacola. I've thought a lot about how I can make this into a function. The biggest problem that I've run into is finding a way to change the name of the collection. For instance, consider this script:
#!/usr/local/bin/python
import twitter
import datetime
from datetime import date, timedelta, datetime
import pymongo
from pymongo import Connection
def getMentions(screen_name):
    # Connect to the database
    connection = Connection()
    db = connection.test
    collection = db.screen_name # Change the name of this database
    t = twitter.Twitter(domain='search.twitter.com')
    # Fetch the information from the API
    results = []
    for i in range(2):
        i+=1
        response = t.search(q=screen_name, result_type='recent', rpp=100, page=i)['results']
        results.extend(response)
    # Create a document in the database for each item taken from the API
    for tweet in results:
        id_str = tweet['id_str']
        twitter_id = tweet['from_user']
        tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
        created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
        date = created_at.date().strftime("%m/%d/%y")
        time = created_at.time().strftime("%H:%M:%S")
        text = tweet['text']
        identifier = {'id' : id_str}
        entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
        collection.update(identifier, entries, upsert = True)
getMentions("@twitterapi")
getMentions("@cocacola")
If I use the above script, all of the data is stored in a collection literally named "screen_name", but I want it stored in a collection named after the screen name that is passed in. Ideally, I want @twitterapi mentions to be in a "twitterapi_mentions" collection and @cocacola mentions to be in a "cocacola_mentions" collection. I believe that using the Collection class of pymongo might be the answer, and I've read the documentation, but I can't seem to get it to work. If you have other suggestions for how I should make this script more efficient, they would be incredibly appreciated. Otherwise, please excuse any mistakes I've made; as I said, I'm new to this.
Use getattr to retrieve the attribute by string name:
collection = getattr(db, screen_name)
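To see why db.screen_name always lands in a collection called "screen_name", here is a small sketch of the difference between dotted access and getattr. It uses a plain SimpleNamespace as a stand-in for the database object (an assumption for illustration only, not pymongo itself), so it runs without a MongoDB server:

```python
from types import SimpleNamespace

# Stand-in for a database object; NOT pymongo, just two named attributes.
db = SimpleNamespace(
    screen_name="collection named 'screen_name'",
    twitterapi="collection named 'twitterapi'",
)

screen_name = "twitterapi"

# Dotted access uses the identifier written in the source code,
# so the value of the variable screen_name is never consulted.
literal = db.screen_name

# getattr looks up the attribute whose name is the string held in the variable.
dynamic = getattr(db, screen_name)

print(literal)
print(dynamic)
```

The same reasoning should carry over to the pymongo database object in the question, since it resolves collections by attribute name.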
I'd go with:
collection = db[screen_name]
I think it's more straightforward.
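To get the "twitterapi_mentions" style of name asked for in the question, the handle can be turned into a collection name first. A minimal sketch, assuming the naming rule is "drop the leading @ and append _mentions"; the helper names collection_name_for and get_mentions_collection are made up for this example and are not part of pymongo:

```python
def collection_name_for(screen_name, suffix="_mentions"):
    """Turn a handle such as '@twitterapi' into 'twitterapi_mentions'."""
    # Drop any leading '@' so the collection name is just the handle plus suffix.
    return screen_name.lstrip("@") + suffix


def get_mentions_collection(db, screen_name):
    """Look up the per-handle collection on any object that supports db[name]."""
    # Subscript access takes the name as a runtime string.
    return db[collection_name_for(screen_name)]


print(collection_name_for("@twitterapi"))
print(collection_name_for("@cocacola"))
```

Inside getMentions, the line that picks the collection would then become collection = get_mentions_collection(db, screen_name), and the rest of the function stays the same.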