Using urllib and BeautifulSoup to retrieve information from the web with Python

I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to write the page to a file first so that BeautifulSoup can read it.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to hand the urllib result to BeautifulSoup without writing it to a file first?


from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

No file writing is needed: just pass in the HTML string. You can also pass the object returned by urlopen directly:

f = urllib.urlopen("http://SOMEWHERE") 
soup = BeautifulSoup(f)
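
The snippets above use the Python 2 era imports (urllib.urlopen and the BeautifulSoup 3 package). On Python 3 the same idea might look like the sketch below, assuming the beautifulsoup4 package is installed. The helper names soup_from_url and soup_from_string are made up for illustration, and the choice of the built-in "html.parser" is an assumption, not something the answer specifies.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup  # third-party package: beautifulsoup4


def soup_from_url(url, parser="html.parser"):
    """Fetch url and parse the response in memory (hypothetical helper)."""
    # urlopen returns a file-like response; the with-block closes it afterwards.
    with urlopen(url) as response:
        # BeautifulSoup reads the file-like object itself, so no file is written.
        return BeautifulSoup(response, parser)


def soup_from_string(html, parser="html.parser"):
    """Parse HTML that is already held in a str or bytes (hypothetical helper)."""
    return BeautifulSoup(html, parser)
```

Calling soup_from_url("http://SOMEWHERE") would then return a soup object directly, and soup_from_string(htmlSource) covers the case where the page has already been read into a variable.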


You could open the URL, download the HTML, and make it parseable in one shot with gazpacho:

from gazpacho import Soup
soup = Soup.get("https://www.example.com/")