Ignore Unicode Error
When I loop over a bunch of URLs to find all links (in certain divs) on those pages, I get this error:
Traceback (most recent call last):
  File "file_location", line 38, in <module>
    out.writerow(tag['href'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 0: ordinal not in range(128)
The code I have written related to this error is:
out = csv.writer(open("file_location", "ab"), delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow(tag['href'])
Is there a way to solve this, possibly using an if statement to ignore any URLs that have Unicode errors?
Thanks in advance for your help.
You can wrap the writerow method call in a try block and catch the exception to ignore it:
for tag in soup_3.findAll('a', href=True):
    try:
        out.writerow(tag['href'])
    except UnicodeEncodeError:
        pass
But you almost certainly want to pick an encoding other than ASCII for your CSV file (UTF-8 unless you have a very good reason to use something else), and open it with codecs.open() instead of the built-in open.
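As a rough illustration of writing the links with an explicit encoding rather than skipping them, here is a minimal sketch. It assumes Python 3, where the built-in open() accepts an encoding argument, so it is not the exact codecs.open() call suggested above. The write_links helper, the sample URLs, and the temporary file path are all hypothetical stand-ins for the asker's soup_3 loop and "file_location". On Python 2, where the csv module does not accept Unicode input, the usual alternative is to encode each value to UTF-8 bytes before writing it.

```python
import csv
import os
import tempfile


def write_links(hrefs, path):
    """Append each href as a one-column row to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as handle:
        out = csv.writer(handle, delimiter=";")
        for href in hrefs:
            # Wrap the value in a list: writerow() expects a sequence of
            # fields, and a bare string is treated as one field per character.
            out.writerow([href])


# Hypothetical sample data standing in for the tag['href'] values.
sample_hrefs = ["http://example.com/a", "\u2026/relative/link"]
demo_path = os.path.join(tempfile.mkdtemp(), "links.csv")
write_links(sample_hrefs, demo_path)
```

With the file opened as UTF-8, the href that starts with the ellipsis character is written like any other value, so no try/except is needed to skip it.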