What's a good way to select a random set of twitterers?

2022-12-19 21:33

Consider the set of Twitter users as the "nodes" and the relation "u follows v" as the "edges". This gives a graph from which I would like to select a subset of the users at random. I could be wrong, but from reading the API docs I think it's impossible to get a collection of users except by getting the followers or friends of an already-known user.

So, starting from myself and exploring the Twitter graph from there, what's a good way to select a random sample of (say 100) users?

I would use the numerical user ID. Generate a bunch of random numbers and fetch users based on them. If you hit a nonexistent ID, simply skip it.

The Twitter API wiki, for users/show:

id. The ID or screen name of a user.
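A minimal sketch of that idea, under some assumptions: `lookup_user` is a hypothetical callable standing in for a users/show request (it is expected to return None when the ID does not exist), and `max_id` is an assumed upper bound on user IDs, not a value taken from the Twitter docs. The attempt cap is there because most random IDs may miss, and every miss still costs an API call.

```python
import random

def sample_users_by_id(lookup_user, n, max_id, max_attempts=100000, rng=None):
    """Draw random numeric IDs and keep the ones that resolve to a user.

    lookup_user is a hypothetical callable (e.g. a wrapper around users/show)
    returning a user object, or None when the ID does not exist.
    max_id is an assumed upper bound on user IDs.
    """
    rng = rng or random.Random()
    found = {}
    attempts = 0
    while len(found) < n and attempts < max_attempts:
        attempts += 1
        candidate = rng.randint(1, max_id)
        if candidate in found:
            continue  # already sampled this user, draw again
        user = lookup_user(candidate)
        if user is not None:  # nonexistent IDs are simply skipped
            found[candidate] = user
    return found
```

The function returns fewer than `n` users if `max_attempts` runs out first, so the caller should check the size of the result.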

Twitter's streaming API has an endpoint called "Sample", which returns a small random sample of all public statuses (cf. https://dev.twitter.com/docs/api/1.1/get/statuses/sample).

The authors' Twitter IDs are returned with the tweets, so this would get you random active Twitter users.

You can use GET statuses/sample to get a continuous stream of tweets being posted on Twitter while your code is executing. You can then extract the user (tweeter) from the tweet information received.

Here is Python code to do so, using the python-twitter library:

import twitter

# The "account" file should contain, separated by whitespace:
# consumer_key consumer_secret access_token_key access_token_secret
with open("account", "r") as f:
    acc = f.read().split()

api = twitter.Api(consumer_key=acc[0], consumer_secret=acc[1],
                  access_token_key=acc[2], access_token_secret=acc[3])

stream = api.GetStreamSample()
cnt = 0
userIDs = set()    # a set drops duplicated user IDs automatically

for tweet in stream:

    # Stop after getting 100 tweets. You can adjust this to any number.
    if cnt == 100:
        break

    # The sample stream also carries non-tweet messages (e.g. delete
    # notices) that have no 'user' field, so skip those.
    if 'user' not in tweet:
        continue

    cnt += 1
    userIDs.add(tweet['user']['id'])

print(list(userIDs))

Assuming six degrees of separation holds, you could do a breadth-first search up to 6 levels deep and select 100 random users from the resulting list. Or you could stop looking for more users once you have, say, a million unique users, and sample 100 from those.

Since storing a list of a million users and then sampling from it might be prohibitive, you can use a technique called reservoir sampling, which lets you sample during the traversal itself.
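The two ideas can be combined roughly as follows. This is only a sketch: `get_neighbors` is a hypothetical callable standing in for whatever API call returns a user's followers or friends, and the reservoir step is the classic "Algorithm R" variant of reservoir sampling.

```python
import random
from collections import deque

def bfs_users(start, get_neighbors, max_depth=6):
    """Yield each reachable user once, in breadth-first order, up to max_depth.

    get_neighbors is a hypothetical callable returning the followers or
    friends of a user (e.g. a wrapper around the corresponding API call).
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        yield user
        if depth == max_depth:
            continue  # do not expand users on the last level
        for other in get_neighbors(user):
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))

def reservoir_sample(items, k, rng=None):
    """Return k items chosen uniformly from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir
```

Usage would look like `reservoir_sample(bfs_users(my_id, get_neighbors), 100)`. Note that the traversal still keeps a set of visited IDs to avoid revisiting users, so the reservoir only saves you from holding a second full list (or full user records) for the sampling step.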

Just query the public timeline, and use the set of users returned:

http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-statuses-public_timeline

It won't be random, since it's just the last 20 tweets sent by anyone, but it will most likely never be the same set of users twice.

Since it only gives you 20 at a time, and the results are cached on their servers for 60 seconds, you'll have to make at least 5 separate requests with a 60-second pause between them.

Of course, some users may tweet more than once in that period, so you might end up with fewer than 100 distinct users. If you need exactly 100, just keep looping until you have them.
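The polling loop could be sketched like this. `fetch_timeline` is a hypothetical callable that returns the latest batch of statuses as dicts shaped like `{'user': {'id': ...}}`. It is not a real library function, and the `sleep` parameter exists only so the pause can be swapped out when testing.

```python
import time

def collect_users(fetch_timeline, target=100, pause=60, sleep=time.sleep):
    """Poll until at least `target` distinct user IDs have been seen.

    fetch_timeline is a hypothetical callable returning the latest batch of
    statuses as dicts shaped like {'user': {'id': ...}}.
    """
    users = set()
    while True:
        for status in fetch_timeline():
            users.add(status['user']['id'])  # the set ignores repeat tweeters
        if len(users) >= target:
            return users
        sleep(pause)  # wait out the server-side cache before asking again
```

The result can contain slightly more than `target` users, because a whole batch is added before the count is checked. The loop also never gives up on its own, so a real script would probably want a cap on the number of polls.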

Unless you have the entire Twitter user graph (or a random sample of it), you won't be able to take a truly random sample. Any sample you take by exploring outward from your own account will be biased by its relationship to you.

You may use this repo, Random Twitter Handles Generator, to generate random Twitter handles (usernames) for a specific country.

Random handles are generated based on:

country name
specified number of random coordinate points in that country
radius in km around each latitude/longitude coordinate point (tweets will be within that radius)
specified number of tweets to get per coordinate point
language of the tweets
