List problem with extracting data from Twitter XML page
With my function I can extract usernames from a Twitter XML search page for a friend-finder app I am building as a project. The problem, though, is that when I grab the usernames and put them into a list, something strange happens. Instead of each username being a separate element of one list, each username ends up in its own list.
So instead I get 20 or so lists. Here is an example of what my code produces: ["twitter.com/username"], ["twitter.com/username1"], ["twitter.com/username2"]
As you can see, every single username is its own list. Instead of one list with three values, I have three lists with one value each. This is an absolute nightmare to iterate through. How can I get one list with three elements?
Code is here:
def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    data = soup.findAll("uri")
    for uri in soup.findAll('uri'):
        data = []
        uri = str(uri.extract())
        data.append(uri[5:-6]
        print data
You're making a new list, called `data`, for each URI. If you move the `data = []` line out of the `for uri in soup.findAll('uri'):` loop, you should end up with one list instead of a separate list for every username.
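To make the difference concrete, here is a small sketch that needs no network access or BeautifulSoup. The hard-coded `uris` list is a hypothetical stand-in for whatever `soup.findAll('uri')` yields, and the two helper functions are illustrative names, not part of either poster's code.

```python
# Hypothetical stand-in for the URIs that soup.findAll('uri') would yield.
uris = ["twitter.com/username", "twitter.com/username1", "twitter.com/username2"]

def collect_reset_each_time(items):
    """Mimics the question's loop: data is rebound to [] on every pass."""
    snapshots = []
    for item in items:
        data = []                     # a brand-new list each iteration
        data.append(item)
        snapshots.append(list(data))  # what "print data" showed each time
    return snapshots

def collect_once(items):
    """The suggested fix: create the list once, before the loop."""
    data = []
    for item in items:
        data.append(item)             # every item lands in the same list
    return data

print(collect_reset_each_time(uris))
print(collect_once(uris))
```

The first call gives three one-element lists (the symptom in the question); the second gives a single three-element list.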
In addition, you've got a few other problems.
There is a syntax error on your next-to-last line: you're missing a closing parenthesis at the end of it.
You've got duplicate lines. Try removing the first `data = []` line, as well as the `data = soup.findAll("uri")` line, since you call `findAll` again for the `for` loop anyway.
Also, you shouldn't put `raw_input` in the function signature: default argument values are evaluated once, when the function is defined, not each time the function is called.
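A quick way to see the definition-time evaluation is a sketch in which a hypothetical `ask()` function stands in for `raw_input` and records each time it runs. The `None`-default pattern in `get_names_good` is one common alternative, assumed here for illustration; the answer's own fix is simply to call `raw_input` inside the body.

```python
calls = []

def ask(prompt):
    """Hypothetical stand-in for raw_input: records that it ran."""
    calls.append(prompt)
    return "python"

# The default expression runs right here, once, when 'def' executes.
def get_names_bad(search_term=ask("What term?")):
    return search_term

print(len(calls))  # prints 1, although get_names_bad has never been called

def get_names_good(search_term=None):
    # Prompt only when the function is actually called without an argument.
    if search_term is None:
        search_term = ask("What term?")
    return search_term
```

Calling `get_names_bad()` repeatedly never prompts again, whereas `get_names_good()` prompts on every call that omits the argument.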
Try this:
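import urllib
from BeautifulSoup import BeautifulStoneSoup

def get_names():
    search_page = "http://search.twitter.com/search.atom?q="
    # quote() URL-encodes the term so spaces etc. survive in the query
    search_page += urllib.quote(raw_input("What term do you want to search for?"))
    page = urllib.urlopen(search_page)
    doc = page.read()  # read() returns one string
    page.close()       # close the response object; a string has no close()
    soup = BeautifulStoneSoup(doc)
    # str(uri) looks like "<uri>...</uri>"; [5:-6] strips the two tags
    data = [str(uri.extract())[5:-6] for uri in soup.findAll('uri')]
    return data

names = get_names()
print(names)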
Edit: You also don't need `''.join(doc)`: `read()` returns a single string, not a sequence, and `data` can be assembled with a list comprehension.
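As a further illustration of building the list in one comprehension from the single string that `read()` returns, here is a sketch that swaps in the standard library's `xml.etree.ElementTree` instead of `BeautifulStoneSoup`. That swap is an assumption made so the example runs without third-party packages. `SAMPLE` is a made-up Atom-shaped document, not real Twitter output, and reading each element's `.text` replaces the `[5:-6]` tag slicing.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so tags look like "{ns}uri".
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample shaped like an Atom search feed (not real Twitter data).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>a</name><uri>http://twitter.com/username</uri></author></entry>
  <entry><author><name>b</name><uri>http://twitter.com/username1</uri></author></entry>
</feed>"""

def extract_uris(xml_text):
    """Return every <uri> text in one flat list via a list comprehension."""
    root = ET.fromstring(xml_text)
    # .text gives the element's content directly, so no tag slicing is needed
    return [(el.text or "").strip() for el in root.iter(ATOM_NS + "uri")]

print(extract_uris(SAMPLE))
```

Because the comprehension produces the whole list in one expression, there is no separate `data = []` that can accidentally be reset inside a loop.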
The problem is that your assignments to `data` are all over the place; I'd suggest changing that code to:
def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    for uri in soup.findAll('uri'):
        uri = str(uri.extract())
        data.append(uri[5:-6])
    print data
    return data
(Untested, since I don't know what BeautifulStoneSoup refers to.)
HTH
Pacific