Crawling YouTube user info

2023-03-10 04:44 Q&A author:

I'm trying to crawl YouTube to retrieve information about a group of users (approx. 200 people). I'm interested in looking for relationships between the users:

contacts
subscribers
subscriptions
what videos they commented on
etc

I've managed to get contact information with the following source:

import gdata.youtube
import gdata.youtube.service
from gdata.service import RequestError
from pub_author import KEY, NAME_REGEX
def get_details(name):
    yt_service = gdata.youtube.service.YouTubeService()
    yt_service.developer_key = KEY
    contact_feed = yt_service.GetYouTubeContactFeed(username=name)
    contacts = [ e.title.text for e in contact_feed.entry ]
    return contacts

I can't seem to get the other bits of information I need. The reference guide says that I can grab the XML feed from http://gdata.youtube.com/feeds/api/users/username/subscriptions?v=2 (for some arbitrary user). However, if I try to get other users' subscriptions, I get a 403 error with the following message:

User must be logged in to access these subscriptions.

If I use the gdata API:

sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
sub = [ e.title.text for e in sub_feed.entry ]

then I get the same error.

How can I get these subscriptions without logging in? It should be possible, as you can access this information without logging in to the YouTube website.

Also, there seems to be no feed for the subscribers of a particular user. Is this information available through the API?

EDIT

So, it appears this can't be done through the API. I had to do this the quick and dirty way:

for f in `cat users.txt`; do wget "www.youtube.com/profile?user=$f&view=subscriptions" --output-document subscriptions/$f.html; done
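For anyone who would rather stay in Python, here is a rough sketch of the same download loop. It assumes the profile URL from the wget command still resolves and that the subscriptions/ directory already exists; `profile_url` and `save_profile` are names made up for this example, not part of any library.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen


def profile_url(username):
    """Build the subscriptions-view profile URL used in the wget loop."""
    # urlencode escapes characters that would otherwise break the query string
    query = urlencode({"user": username, "view": "subscriptions"})
    return "http://www.youtube.com/profile?" + query


def save_profile(username, out_dir="subscriptions"):
    """Download one profile page to out_dir/<username>.html (hypothetical helper)."""
    path = os.path.join(out_dir, username + ".html")
    with urlopen(profile_url(username)) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Calling `save_profile(name)` for each name in users.txt would reproduce the shell loop, with the username escaped before it goes into the URL.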

Then I used this script to extract the usernames from the downloaded HTML files:

"""Extract usernames from a Youtube profile using regex"""
import re
import sys

# Each HTML file has two <a href="..."> tags for each user: one for an
# image thumbnail and one for a text link, so collect names in a set.
USER_LINK = re.compile(r'<a href="/user/(?P<name>[^"]+)" onmousedown')

def main():
    lines = open(sys.argv[1]).read().split('\n')
    users = set()
    for l in lines:
        match = USER_LINK.search(l)
        if match:
            users.add(match.group('name'))
    # print(...) with one argument behaves the same in Python 2 and 3
    print(sorted(users))
if __name__ == '__main__':
    main()
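The regex above depends on the exact attribute order in the page markup, so it may break if the HTML changes. A possible alternative, sketched here with the standard library's html.parser, collects every link whose href starts with /user/. Note that it is looser than the regex: it does not require the onmousedown attribute. `UserLinkParser` and `extract_users` are names invented for this sketch.

```python
from html.parser import HTMLParser


class UserLinkParser(HTMLParser):
    """Collect usernames from <a href="/user/NAME"> links."""

    PREFIX = "/user/"

    def __init__(self):
        super().__init__()
        self.users = set()  # a set drops the thumbnail/text-link duplicates

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None
        href = dict(attrs).get("href") or ""
        if href.startswith(self.PREFIX):
            name = href[len(self.PREFIX):]
            if name:
                self.users.add(name)


def extract_users(html_text):
    """Return the sorted usernames linked from one profile page."""
    parser = UserLinkParser()
    parser.feed(html_text)
    return sorted(parser.users)
```

Since this is written for Python 3, it would be run separately from the Python 2 gdata code earlier in the question.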

In order to access a user's subscriptions feed without the user being logged in, the user must check the "Subscribe to a channel" checkbox under his Account Sharing settings.

Currently, there is no direct way to get a channel's subscribers through the gdata API. In fact, there has been an outstanding feature request for it that has remained open for over 3 years! See Retrieving a list of a user's subscribers?.
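If you only care about relationships inside a fixed group, one workaround is to invert the subscription lists you already have: whoever lists a group member in their subscriptions is a subscriber of that member. This only finds subscribers within the group, not all subscribers of a channel. A minimal sketch, assuming the subscriptions have already been collected into a dict (the function name and input shape are made up for illustration):

```python
from collections import defaultdict


def invert_subscriptions(subscriptions):
    """Map each group member to the group members who subscribe to them.

    `subscriptions` maps a username to the names that user subscribes to.
    Channels outside the group are ignored.
    """
    group = set(subscriptions)
    subscribers = defaultdict(set)
    for user, channels in subscriptions.items():
        for channel in channels:
            if channel in group:
                subscribers[channel].add(user)
    # Members with no in-group subscribers map to an empty list
    return {name: sorted(subscribers[name]) for name in sorted(group)}
```

For example, if alice subscribes to bob and carol, and bob subscribes to carol, the result lists alice as a subscriber of bob, and both alice and bob as subscribers of carol.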
