
The proper way to script periodically pulling a page from an https site

I want to create a command-line script for Cygwin/Bash that logs into a site, navigates to a specific page and compares it with the results of the last run. So far, I have it working with Lynx like so:

----snipped, just setting variables----
echo "# Command logfile created by Lynx 2.8.5rel.5 (29 Oct 2005)
----snipped the recorded keystrokes-------
key Right Arrow
key p
key Right Arrow
key ^U" >> $tmp1 #p, right arrow initiate the page saving

#"type" the filename inside the "where to save" dialog
for i in $(seq 0 $((${#tmp2} - 1)))
do
    echo "key ${tmp2:$i:1}" >> $tmp1
done

#hit enter and quit
echo "key ^J
key y
key q
key y
" >> $tmp1

lynx -accept_all_cookies -cmd_script=$tmp1 https://thewebpage.com/login

diff $tmp2 $oldComp
mv $tmp2 $oldComp

It definitely does not feel "right": the cmd_script consists of relative user actions instead of specifying the exact link names and actions. So, if anything on the site ever changes, switches places, or a new link is added - I will have to re-create the actions.

Also, I can't check for any errors, so I can't abort the script if something goes wrong (login failed, etc.).

Another alternative I have been looking at is Mechanize with Ruby (as a note - I have 0 experience with Ruby).

What would be the best way to improve or rewrite this?


I think lynx is a great tool for simple web automation tasks, but it has its limits. If you need error checking, you should use one of the mechanize modules for Perl, Python or Ruby (if you don't know any of these languages, Python may be the easiest one to learn).

To make your lynx script a bit more robust you could use the search function to select links. On some pages using the link list (l) can help.

At the end I'd add some sanity checks to see if the downloaded file is really the one you want.
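
A sanity check of that kind could look roughly like the sketch below. The function name page_looks_valid and the marker text are made up for illustration; the idea is simply to refuse an empty file, or one that lacks a string you only expect to see on the target page.

```shell
#!/bin/bash
# Hypothetical helper (not part of lynx): succeeds only if the saved page
# looks like the one we wanted.
#   $1 = file that was saved by the script
#   $2 = marker text expected on the target page (an assumption about the site)
page_looks_valid() {
    local file=$1 marker=$2
    # A missing or empty file means the save step did not work.
    [ -s "$file" ] || return 1
    # The marker should only appear on the page behind the login.
    grep -qF -- "$marker" "$file" || return 1
    return 0
}
```

In the script from the question this would go just before the diff, for example: page_looks_valid "$tmp2" "Account overview" || { echo "unexpected page" >&2; exit 1; }. "Account overview" is a placeholder for whatever text reliably identifies the page.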


Could wget be useful here?

It is an HTTP, HTTPS and FTP command-line download utility. It is free software (GNU). It has many options, such as authentication and timestamping (only download a file if it has changed since the last time).

http://www.gnu.org/software/wget/
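
As a rough sketch of how that might look for a site with a form-based login: the page URL, the form field names (user, pass), the SITE_USER/SITE_PASS variables and the cookie file name are all assumptions here, so check the site's actual login form before relying on any of it. The wget options used (--post-data, --save-cookies, --load-cookies, --keep-session-cookies, -O, -q) are standard ones.

```shell
#!/bin/bash
# Sketch only: URLs, form field names and file names are placeholders.
login_url="https://thewebpage.com/login"      # from the question
page_url="https://thewebpage.com/some/page"   # hypothetical target page
cookies="cookies.txt"                         # hypothetical cookie jar

# Log in, then fetch the page into "$1"; return non-zero at the first failure.
fetch_page() {
    local out=$1
    # POST the login form; the field names "user" and "pass" are assumptions.
    wget -q --save-cookies "$cookies" --keep-session-cookies \
         --post-data "user=$SITE_USER&pass=$SITE_PASS" \
         -O /dev/null "$login_url" \
        || { echo "login request failed" >&2; return 1; }
    # Reuse the session cookies to download the page itself.
    wget -q --load-cookies "$cookies" -O "$out" "$page_url" \
        || { echo "page download failed" >&2; return 1; }
}

# Show what changed since the last run, then keep the new copy for next time.
compare_and_rotate() {
    local new=$1 old=$2
    [ -f "$old" ] && diff "$old" "$new"
    mv "$new" "$old"
}
```

A run would then be something like: fetch_page "$tmp2" && compare_and_rotate "$tmp2" "$oldComp". Note that wget's exit status only catches network and HTTP errors; a rejected login often still returns a normal page, so checking the downloaded content is still worthwhile.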
