
Parsing a website

I want to make a program that takes a website address as user input. The program then goes to that website, downloads it, and parses the information inside. It outputs a new HTML file using the information from the website.

Specifically, the program will pick out certain links from the website, put those links in the output HTML file, and discard everything else.

Right now I just want to make it work for websites that don't require a login, but later on I want it to work for sites where you have to log in, so it will have to be able to deal with cookies.

Later on, I'll also want the program to be able to follow certain links and download information from those other sites.

What are the best programming languages or tools to do this?


Beautiful Soup (Python) comes highly recommended, though I have no experience with it personally.


Python.

It's fairly easy to write a simple crawler using Python's standard libraries, but you'll also be able to find some existing Python crawler libraries on the web.
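As a rough illustration of the standard-library route, here is a minimal sketch that pulls the links out of a page and writes them into a new HTML file. The names `LinkCollector`, `extract_links`, `render_links`, `fetch`, `main`, the `keep` filter, and the output filename `links.html` are all made up for this example; which links count as "certain links" is an assumption you would replace with your own rule. The opener is built with a cookie jar so the same code could later carry session cookies, though a real login flow is not shown here.

```python
from html import escape
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urljoin
from urllib.request import HTTPCookieProcessor, build_opener


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html_text, base_url, keep=lambda url: True):
    """Return the links in html_text for which keep(url) is true."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return [url for url in parser.links if keep(url)]


def render_links(links):
    """Build a small HTML document that contains only the given links."""
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in links
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


def fetch(url, opener):
    """Download a page and decode it to text using the given opener."""
    with opener.open(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def main():
    address = input("Website address: ")
    # The cookie jar keeps any cookies the site sets between requests.
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    page = fetch(address, opener)
    # Hypothetical filter: keep only absolute http(s) links.
    links = extract_links(page, address, keep=lambda url: url.startswith("http"))
    with open("links.html", "w", encoding="utf-8") as out:
        out.write(render_links(links))
```

Calling `main()` prompts for an address and writes `links.html`. To explore further pages, you could call `fetch` and `extract_links` again on some of the returned links, ideally with a limit on depth so the crawl stops.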
