Setting timeouts to parse webpages using python lxml

I am using the Python lxml library to parse HTML pages:

import lxml.html

# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')

Is there any way to set a timeout for parsing?


It looks to be using urllib.urlopen as the opener, but the easiest way to do this would be to just modify the default timeout for the socket handler.

import socket
timeout = 10
socket.setdefaulttimeout(timeout)

Of course, this is a quick-and-dirty solution: the default timeout is process-wide and applies to every socket created after the call.
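One way to limit the side effects of the global setting is to apply it only for the duration of the call and then restore the previous value. The following is a minimal sketch, not part of the original answer; the helper name default_socket_timeout is hypothetical, and it only helps if the fetch really does go through Python sockets, as the answer assumes.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout.

    Hypothetical helper for illustration; the name is an assumption.
    """
    previous = socket.getdefaulttimeout()  # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the old value even if the wrapped code raises.
        socket.setdefaulttimeout(previous)


# Intended usage (assumes lxml is installed and the URL is reachable):
#
#     import lxml.html
#     with default_socket_timeout(10):
#         page = lxml.html.parse('http://stackoverflow.com/')
```

The setting is still global while the block runs, so sockets created by other threads during that window pick up the same timeout.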
