How to Collect Tweets More Quickly Using Twitter API in Python?

2023-01-26 03:13 问答作者：

For a research project, I am collecting tweets using Python-Twitter. However, when running our program nonstop on a single computer for a week we manage to collect about only 20 MB of data per week. I am only running this program on one machine so that we do not collect the same tweets twice.

Our program runs a loop that calls getPublicTimeline() every 60 seconds. I tried to improve this by calling getUserTimeline() on some of the users that appeared in the public timeline. However, this consistently got me banned from collecting tweets at all for about half an hour each time. Even without the ban, it seemed that there was very little speed-up by adding this code.

I know about Twitter's "whitelisting" that allows a user to submit more requests per hour. I applied for this about three weeks ago, and have not hear back since, so I am looking for alternatives that will allow our program to collect tweets开发者_运维技巧 more efficiently without going over the standard rate limit. Does anyone know of a faster way to collect public tweets from Twitter? We'd like to get about 100 MB per week.

Thanks.

How about using the streaming API? This is exactly the use-case it was created to address. With the streaming API you will not have any problems gathering megabytes of tweets. You still won't be able to access all tweets or even a statistically significant sample without being granted access by Twitter though.

I did a similar project analyzing data from tweets. If you're just going at this from a pure data collection/analysis angle, you can just scrape any of the better sites that collect these tweets for various reasons. Many sites allow you to search by hashtag, so throw in a popular enough hashtag and you've got thousands of results. I just scraped a few of these sites for popular hashtags, collected these into a large list, queried that list against the site, and scraped all of the usable information from the results. Some sites also allow you to export the data directly, making this task even easier. You'll get a lot of garbage results that you'll probably need to filter (spam, foreign language, etc), but this was the quickest way that worked for our project. Twitter will probably not grant you whitelisted status, so I definitely wouldn't count on that.

There is pretty good tutorial from ars technica on using streaming API n Python that might be helpful here.

Otherwise you could try doing it via cURL.

继续阅读：python python-twitter twitter

How to Collect Tweets More Quickly Using Twitter API in Python?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？