Setting timeouts to parse webpages using python lxml
I am using the Python lxml library to parse HTML pages:
import lxml.html
# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')
Is there any way to set a timeout for parsing?
It looks to be using urllib.urlopen as the opener, but the easiest way to do this would be to change the default timeout in the socket module:
import socket
timeout = 10
socket.setdefaulttimeout(timeout)
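Of course, this is a quick-and-dirty solution: the default timeout applies to every socket created afterwards in the process, not just the one used for this page.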
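If the global side effect is a concern, one option is to set the default only for the duration of the parse and restore it afterwards. The following is a minimal sketch, not part of lxml: the helper name default_socket_timeout is made up for illustration, and it only helps if the page is actually fetched through Python sockets, as assumed above.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the process-wide default socket timeout.

    Hypothetical helper for illustration; not provided by lxml or the
    standard library.
    """
    previous = socket.getdefaulttimeout()  # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Restore the earlier default even if the body raised.
        socket.setdefaulttimeout(previous)


# Intended use (requires lxml and network access, so left as a comment):
#
#     import lxml.html
#     with default_socket_timeout(10):
#         page = lxml.html.parse('http://stackoverflow.com/')
```

The default is still shared by all threads while the block is active, so this narrows the window in which other sockets are affected rather than removing the side effect.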