List problem with extracting data from Twitter XML page

2023-03-21 19:26

With my function I can extract usernames from a Twitter XML search page for a friend finder app I am building as a project. The problem is that when I grab the usernames and put them into a list, something strange happens: instead of each username being a separate element of one list, each username ends up in its own list.

So I get 20 or so lists instead. Here is an example of what my code produces: list = ["twitter.com/username"], ["twitter.com/username1"], ["twitter.com/username2"]

So you see every single username is its own list. Instead of having one list with three values I have three lists with one value each in them. This is an absolute nightmare to iterate through. How can I make it so I have one list with three elements?

Code is here:

def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    data = soup.findAll("uri")
    for uri in soup.findAll('uri'):
        data = []
        uri = str(uri.extract())
        data.append(uri[5:-6] 
        print data

You're making a new list, called data, for each URI. If you move the data = [] line out of the for uri in soup.findAll('uri'): loop, you should end up with one list instead of a list of lists.
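To see the difference in isolation, here is a minimal sketch with no network access. The `uris` values and the two helper names (`collect_resetting`, `collect_once`) are made up for illustration; they stand in for the strings pulled out of the feed.

```python
# Hypothetical sample values standing in for the extracted <uri> contents.
uris = ["twitter.com/username", "twitter.com/username1", "twitter.com/username2"]

def collect_resetting(items):
    # Mirrors the question: the list is re-created on every pass,
    # so each iteration starts over with an empty list.
    for item in items:
        data = []
        data.append(item)
    return data

def collect_once(items):
    # The list is created once, before the loop, and grows with each item.
    data = []
    for item in items:
        data.append(item)
    return data

print(collect_resetting(uris))  # only the last item survives
print(collect_once(uris))       # one list with all three items
```

Printing inside the loop, as the original code does, is what makes the first version look like many one-element lists.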

In addition, you've got some other problems. There is a syntax error on your next-to-last line: you're missing a close-parenthesis at the end of the line. You've also got duplicate lines. Try removing the first data = [] line, as well as the data = soup.findAll("uri") line, since you're just doing findAll again for the for loop. Finally, you shouldn't put raw_input in the function signature, because that means it gets called when you define the function, not when you call the function.
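The point about the function signature can be checked with a small sketch. Here `ask` is a hypothetical stand-in for raw_input that records each time it runs, so the example works without a prompt; the other function names are also made up.

```python
calls = []

def ask():
    # Hypothetical stand-in for raw_input: records that it was invoked.
    calls.append("asked")
    return "python"

def get_term_default(term=ask()):
    # ask() already ran once, when this def statement was executed.
    return term

def get_term_inside():
    # ask() runs here, each time the function is called.
    return ask()

print(len(calls))     # 1: the default was evaluated at definition time
get_term_default()
get_term_default()
print(len(calls))     # still 1: calling the function does not ask again
get_term_inside()
print(len(calls))     # 2: asking now happens at call time
```

With raw_input as the default value, the user would be prompted once when the module loads, and every later call would silently reuse that first answer.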

Try this:

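import urllib
from BeautifulSoup import BeautifulStoneSoup  # assumes BeautifulSoup 3 on Python 2

def get_names():
    search_page = "http://search.twitter.com/search.atom?q="
    # Prompt at call time instead of in the function signature
    search_page += raw_input("What term do you want to search for?")
    response = urllib.urlopen(search_page)
    doc = response.read()  # read() returns a single string
    response.close()       # close the response object, not the string
    soup = BeautifulStoneSoup(doc)
    # Slice off the leading "<uri>" and trailing "</uri>" from each tag
    data = [str(uri.extract())[5:-6] for uri in soup.findAll('uri')]
    return data

names = get_names()
print(names)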

Edit: You also don't need ''.join(doc), since read() returns a single string, not a sequence of strings; data can be assembled with a list comprehension.
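As a rough illustration of the list comprehension idea, the sketch below builds the list in one expression from a hard-coded feed. Two things here are assumptions rather than part of the original code: the `SAMPLE` document is invented, and the standard library's xml.etree.ElementTree is swapped in for BeautifulStoneSoup so the example runs without extra packages or network access.

```python
import xml.etree.ElementTree as ET

# Invented sample data shaped like an Atom feed; not real search output.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>one</name><uri>http://twitter.com/username</uri></author></entry>
  <entry><author><name>two</name><uri>http://twitter.com/username1</uri></author></entry>
  <entry><author><name>three</name><uri>http://twitter.com/username2</uri></author></entry>
</feed>"""

# Atom elements live in this namespace, so the tag name must be qualified.
ATOM = "{http://www.w3.org/2005/Atom}"

root = ET.fromstring(SAMPLE)

# One list, one element per <uri> tag, built in a single expression.
names = [uri.text for uri in root.iter(ATOM + "uri")]
print(names)
```

Reading the element's text directly also avoids the [5:-6] slice, which only works because "<uri>" is five characters long and "</uri>" is six.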

The problem is that data is assigned in several places, including inside the loop, where it is reset to an empty list on every iteration; I'd suggest changing that code to:

def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    for uri in soup.findAll('uri'):
        uri = str(uri.extract())
        data.append(uri[5:-6])
    print data
    return data

(untested since I don't know what BeautifulStoneSoup is referring to)

HTH

Pacific
