Screen Scraping Twitter Page with Unicode Equal Comparison Failure Python
I'm using the following code to obtain a list of a user's followers on Twitter:
import urllib
from BeautifulSoup import BeautifulSoup

#code only looks at one page of followers instead of continuing to all of a user's followers
#decided to only use a small sample
site = "http://mobile.twitter.com/NYTimesKrugman/following"
friends = set()
response = urllib.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
names = soup.findAll('a', {'href': True})
for name in names:
    a = name.renderContents()
    b = a.lower()
    if ("http://mobile.twitter.com/" + b) == name['href']:
        c = str(b)
        friends.add(c)
for friend in friends:
    print friend
print ("Done!")
However, I get the following results:
NYTimeskrugman
nytimesphoto
rasermus
Warning (from warnings module):
File "C:\Users\Public\Documents\Columbia Job\Python Crawler\Twitter Crawler\crawlerversion14.py", line 42
if ("http://mobile.twitter.com/" + b) == name['href']:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
amnesty_norge
zynne_
fredssenteret
oljestudentene
solistkoret
....(and so it continues)
It appears that I got most of the names from the following list, but a warning showed up partway through the output, seemingly at random. It didn't stop the code from finishing, however. Could someone explain what happened?
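For what it's worth, the warning most likely comes from a type mismatch. In Python 2, BeautifulSoup 3's renderContents() returns a UTF-8 encoded byte string, while name['href'] is a unicode object. To compare them, Python tries to decode the byte string as ASCII. When a link's text contains a non-ASCII character, that decode fails, so Python emits the UnicodeWarning and treats the two values as unequal. Because it is a warning and not an exception, the loop keeps running. Below is a minimal sketch of the usual fix, which is to decode to text before comparing. It is written for Python 3, and same_profile_link is a hypothetical helper for illustration, not part of BeautifulSoup. In the Python 2 code above, the equivalent change would be something like a.decode('utf-8') before calling lower().

```python
def same_profile_link(link_text, href, base="http://mobile.twitter.com/"):
    """Return True if the lowercased link text matches the handle in href."""
    # Hypothetical helper: decode raw bytes to text first, so that both
    # sides of the comparison are the same type.
    if isinstance(link_text, bytes):
        link_text = link_text.decode("utf-8")
    return base + link_text.lower() == href


# An ASCII-only handle and a handle with a non-ASCII character both compare
# cleanly once the bytes have been decoded.
print(same_profile_link(b"zynne_", "http://mobile.twitter.com/zynne_"))
print(same_profile_link("Bjørn".encode("utf-8"), "http://mobile.twitter.com/bjørn"))
```

Decoding once, right where the bytes come in, keeps every later comparison text-to-text.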
I don't know if my answer will still be useful several years later, but I rewrote your code using requests instead of urllib.
I think it's better to also select on the class "username", so that only follower names are considered.
Here's the code:
import requests
from bs4 import BeautifulSoup

site = "http://mobile.twitter.com/paulkrugman/followers"
friends = set()
response = requests.get(site)
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.text, "html.parser")
names = soup.findAll('a', {'href': True})
for name in names:
    # keep only links that contain a <span class="username">
    pseudo = name.find("span", {"class": "username"})
    if pseudo:
        pseudo = pseudo.get_text()
        friends.add(pseudo)
for friend in friends:
    print(friend)
print("Done!")
@paulkrugman itself appears in every set, so don't forget to remove it!
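One way to do that removal, as a rough sketch: drop_owner is a hypothetical helper, and it assumes the scraped handles may or may not carry a leading "@" or surrounding whitespace, so it normalises both sides before comparing.

```python
def drop_owner(handles, owner):
    """Return a new set without the page owner's own handle."""
    # Hypothetical helper: compare handles case-insensitively and ignore
    # a leading "@" and surrounding whitespace.
    def normalise(handle):
        return handle.strip().lstrip("@").lower()

    wanted = normalise(owner)
    return {h for h in handles if normalise(h) != wanted}


friends = {"@paulkrugman", "@zynne_", "@solistkoret"}
print(sorted(drop_owner(friends, "paulkrugman")))
```

If the handles are already in one known form, friends.discard("@paulkrugman") on the set does the same job in one line.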