Python scraping: discovering URLs dynamically

I am new to the world of data scraping; I previously used Python for web and desktop app development. I am wondering whether there is a way to collect the URLs from a page and then look into each of them for specific information such as phone numbers, addresses, etc.

Currently I am using BeautifulSoup, and I have built a method that takes the URLs as a parameter.

The site I am scraping is large, and it is really tough to pass in the specific URL for each page.

Any suggestions for making this faster and self-driven?

Thanks in advance.


You can use Scrapy. It simplifies both crawling and parsing (it uses libxml2 for parsing by default).
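
To make the idea of "crawling" concrete, here is a minimal dependency-free sketch of the loop that a framework like Scrapy automates: fetch a page, collect its links, queue the same-domain ones, and scan each page's text for the data you want. This is not Scrapy's API; it uses only the standard library, and the names `crawl`, `fetch`, `LinkParser`, and `PHONE_RE` (a deliberately loose phone-number pattern) are made up for illustration.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Hypothetical, deliberately loose pattern; adapt it to the formats on your site.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


class LinkParser(HTMLParser):
    """Collects href values of <a> tags and the text chunks of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl of one domain; returns {url: [phone numbers found]}."""
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    found = {}
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            page = fetch_page(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        parser = LinkParser()
        parser.feed(page)
        found[url] = PHONE_RE.findall(" ".join(parser.text))
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

Calling `crawl("http://example.com/")` would start from that page and follow only links on the same host, so you no longer pass each URL by hand. A real crawler should also respect robots.txt and rate limits, which Scrapy can handle through its settings.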


Use a more efficient HTML parser, like lxml. See here for performance comparisons of various Python parsers.
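
As a rough illustration of what parsing with lxml looks like, the sketch below pulls every link and a phone field out of a page with XPath. The sample markup, the `phone` class name, and the function name `extract_links_and_phones` are assumptions made for the example; the real selectors depend on the site being scraped.

```python
from lxml import html

# Made-up sample page; the structure of a real site will differ.
SAMPLE = """
<html><body>
  <a href="/contact">Contact</a>
  <a href="https://example.com/about">About</a>
  <p class="phone">Phone: 555-0100</p>
</body></html>
"""


def extract_links_and_phones(page_source, base_url):
    """Return (absolute links, text of elements marked with class 'phone')."""
    tree = html.fromstring(page_source)
    # Rewrite relative hrefs such as "/contact" into absolute URLs.
    tree.make_links_absolute(base_url)
    links = tree.xpath("//a/@href")
    phones = [p.text_content().strip() for p in tree.xpath('//p[@class="phone"]')]
    return links, phones


links, phones = extract_links_and_phones(SAMPLE, "https://example.com/")
```

If you would rather keep your existing BeautifulSoup code, passing `"lxml"` as the parser name to `BeautifulSoup(...)` lets it use lxml underneath, provided lxml is installed.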
