Where to start with scripting project - searching a log-in only web site from term/cmd

I'm attempting to write a script for myself that will log in to a specific website and use the website's search function. The script will then write the list of search results to standard output, where I will select one of the results and perform various actions on its page. I'm very lost as to where to start with this. I've already tried cURL and Python's various web libraries, but I haven't been able to come up with anything that works.


Assuming the website doesn't provide a search API, you need to do automated scraping, in which case curl etc. are way too low-level and error-prone. Here are some widely used recommendations:

For automation, link following, form filling, etc., I strongly recommend the twill API, an automation layer that sits on top of mechanize. twill has a bunch of useful extension modules. As just one example, for filling in authentication forms, twill.formfill multi_sub is great.
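To make it clearer what an automation layer like twill takes care of, here is a minimal sketch of the same log-in-then-search flow using only Python's standard library (urllib plus a cookie jar), not twill itself. Everything site-specific is an assumption: the URLs, the form field names ("username", "password") and the "q" query parameter are hypothetical and have to be read off the real login and search forms.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_form_post(url, fields):
    """Build a POST request carrying URL-encoded form fields."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def login_and_search(opener, login_url, search_url, credentials, query):
    """Log in, then fetch the search results page as text.

    Assumes (hypothetically) that the site uses a plain form POST for login
    and a GET search with a "q" parameter; real sites often differ.
    """
    # Submitting the login form stores the session cookie in the opener's jar.
    opener.open(build_form_post(login_url, credentials)).close()
    url = search_url + "?" + urllib.parse.urlencode({"q": query})
    with opener.open(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling login_and_search(opener, "https://example.com/login", "https://example.com/search", {"username": "me", "password": "s3cret"}, "foo") would return the results page HTML, provided the site really works that way. Hidden form fields, CSRF tokens and redirects are exactly the details that a higher-level tool handles for you.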

For manual scraping, use BeautifulSoup, though twill probably already does what you need (it scrapes all the links, forms, etc.).
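As a rough illustration of the scraping step, the sketch below pulls every link out of a results page and prints a numbered list to standard output. It uses the standard-library html.parser rather than BeautifulSoup or twill so that it runs without installing anything; the sample HTML is made up, and a real results page will need its own filtering.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical search results markup, for demonstration only.
SAMPLE = """
<ul>
  <li><a href="/item/1">First result</a></li>
  <li><a href="/item/2">Second <b>result</b></a></li>
</ul>
"""

if __name__ == "__main__":
    for i, (text, href) in enumerate(extract_links(SAMPLE), start=1):
        print(f"{i}. {text} -> {href}")
```

From there the script can read a number from standard input and request the chosen link with the same logged-in session.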
