Ignore Unicode Error

When I loop over a set of URLs to find all links (in certain divs) on those pages, I get this error:

Traceback (most recent call last):
File "file_location", line 38, in <module>
out.writerow(tag['href'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 0: ordinal not in range(128)

The code I have written related to this error is:

out = csv.writer(open("file_location", "ab"), delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow(tag['href'])

Is there a way to solve this, possibly using an if statement to ignore any URLs that have Unicode errors?

Thanks in advance for your help.


You can wrap the writerow method call in a try and catch the exception to ignore it:

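for tag in soup_3.findAll('a', href=True):
    try:
        # writerow expects a sequence of fields, so wrap the URL in a list;
        # a bare string would be split into one column per character
        out.writerow([tag['href']])
    except UnicodeEncodeError:
        # skip any URL that cannot be encoded as ASCII
        pass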

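but you almost certainly want to pick an encoding other than ASCII for your CSV file (UTF-8 unless you have a very good reason to use something else). Note that the Python 2 csv module does not support Unicode input, so opening the file with codecs.open() instead of the built-in open will not avoid the error on its own; encode each value explicitly, for example tag['href'].encode('utf-8'), before passing it to writerow.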
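
For comparison, here is a minimal sketch of the UTF-8 approach, assuming Python 3 rather than the Python 2 shown in the question. In Python 3 the csv module works with text and the file object does the encoding. The helper name write_links and the sample URLs are hypothetical, and an in-memory buffer stands in for the real file so the snippet runs on its own:

```python
import csv
import io


def write_links(links, stream):
    """Write each link as a one-column row and return the number of rows written."""
    writer = csv.writer(stream, delimiter=";")
    count = 0
    for href in links:
        # writerow expects a sequence of fields, so wrap the string in a list
        writer.writerow([href])
        count += 1
    return count


# Hypothetical sample data standing in for the tag['href'] values
sample_links = ["http://example.com/a", "\u2026/truncated"]

# In-memory stand-in for open("file_location", "a", encoding="utf-8", newline="")
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

written = write_links(sample_links, text)
text.flush()  # push the encoded bytes through to the underlying buffer
data = raw.getvalue()
```

With a real file you would pass the object returned by open("file_location", "a", encoding="utf-8", newline="") as the stream. No URL is dropped, because U+2026 is stored as UTF-8 bytes and never forced through ASCII.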
