The proper way to script periodically pulling a page from an https site
I want to create a command-line script for Cygwin/Bash that logs into a site, navigates to a specific page and compares it with the results of the last run. So far, I have it working with Lynx like so:
----snipped, just setting variables----
echo "# Command logfile created by Lynx 2.8.5rel.5 (29 Oct 2005)
----snipped the recorded keystrokes-------
key Right Arrow
key p
key Right Arrow
key ^U" >> $tmp1 #p, right arrow initiate the page saving
#"type" the filename inside the "where to save" dialog
for i in $(seq 0 $((${#tmp2} - 1)))
do
echo "key ${tmp2:$i:1}" >> $tmp1
done
#hit enter and quit
echo "key ^J
key y
key q
key y
" >> $tmp1
lynx -accept_all_cookies -cmd_script=$tmp1 https://thewebpage.com/login
diff $tmp2 $oldComp
mv $tmp2 $oldComp
It definitely does not feel "right": the cmd_script consists of relative user actions instead of specifying the exact link names and actions. So, if anything on the site ever changes, switches places, or a new link is added - I will have to re-create the actions.
Also, I can't check for any errors, so I can't abort the script if something goes wrong (login failed, etc.).
Another alternative I have been looking at is Mechanize with Ruby (as a note - I have 0 experience with Ruby).
What would be the best way to improve or rewrite this?
I think lynx is a great tool for simple web automation tasks, but of course it has its limits. If you need error checking, you should use one of the mechanize modules for Perl, Python or Ruby (if you don't know any of these languages, Python may be the easiest one to learn).
To make your lynx script a bit more robust you could use the search function to select links. On some pages using the link list (l) can help.
At the end I'd add some sanity checks to see if the downloaded file is really the one you want.
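A minimal sketch of such a sanity check, in Bash. It assumes the wanted page always contains some known text; the function names and the marker string are hypothetical and not part of the original script. The idea is to refuse to overwrite the previous copy when the new file is empty or lacks the marker (for example, because the login failed and a different page was saved).

```shell
#!/usr/bin/env bash
# Sketch: validate a freshly saved page before it replaces the previous copy.
# Function names and the marker text are assumptions made for illustration.

page_looks_valid() {
    local file=$1 marker=$2
    # Reject missing or empty files (for example, a failed save).
    [ -s "$file" ] || return 1
    # Require the marker text, matched as a fixed string rather than a regex.
    grep -qF -- "$marker" "$file"
}

compare_and_rotate() {
    local new=$1 old=$2 marker=$3
    if ! page_looks_valid "$new" "$marker"; then
        echo "sanity check failed for $new; keeping $old" >&2
        return 1
    fi
    # diff exits with status 1 when the files differ, which is not an error here.
    if [ -f "$old" ]; then
        diff -- "$new" "$old" || true
    fi
    mv -- "$new" "$old"
}
```

In the script from the question, the last two lines could then become something like `compare_and_rotate "$tmp2" "$oldComp" "Account overview" || exit 1`, where "Account overview" stands for whatever text reliably appears on the real page.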
Could wget be useful here?
It is a command-line download utility for HTTP, HTTPS and FTP. It is free software (GNU) and has many options, such as authentication and timestamping (only download a file if it has changed since the last time).
http://www.gnu.org/software/wget/
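As a rough sketch of how that might look for a site with a form-based login: the target page URL, the form field names (`user`, `password`) and the cookie file name below are assumptions, since the real ones depend on the site's login form. The wget options used (`--save-cookies`, `--keep-session-cookies`, `--load-cookies`, `--post-data`, `-O`) are standard ones.

```shell
#!/usr/bin/env bash
# Sketch only: PAGE_URL, the form field names and COOKIES are assumptions.

LOGIN_URL="https://thewebpage.com/login"      # login URL from the question
PAGE_URL="https://thewebpage.com/some/page"   # hypothetical target page
COOKIES="cookies.txt"                         # hypothetical cookie jar file

fetch_page() {
    local user=$1 pass=$2 out=$3
    # Log in with a POST request and keep the session cookies.
    # Values containing special characters must be URL-encoded first.
    wget --quiet --save-cookies "$COOKIES" --keep-session-cookies \
         --post-data "user=${user}&password=${pass}" \
         -O /dev/null "$LOGIN_URL" || return 1
    # Fetch the wanted page by name, reusing the saved cookies.
    wget --quiet --load-cookies "$COOKIES" -O "$out" "$PAGE_URL" || return 1
}

page_changed() {
    # cmp -s exits 0 only when both files exist and are identical.
    ! cmp -s -- "$1" "$2"
}
```

A caller could run `fetch_page "$user" "$pass" "$tmp2" || exit 1` and then `page_changed "$tmp2" "$oldComp" && diff "$tmp2" "$oldComp"`. wget exits with a non-zero status on network failures and HTTP error responses, which allows the script to abort, but many sites answer a failed login with a normal page, so a check on the page content is still worth keeping.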