Writing a script for fetching the specific part of a web page in a loop for offline use
I have a specific use. I am preparing for GRE. Everytime a new word comes, I look it up at www.mnemonicdictionary.com, for its meanings and mnemonics. I want to write a script in python preferably ( or if someone could provide me a pointer to an already existing thing as I dont know python much but I am learning now) which takes a list of words from a text file, and looks it up at this site, and just fetch relevant portion (meaning and mnemonics) and store it another text file for offline use. Is it possible to do so ?? I tried to look up the source of these pages also. But along with html tags, they also have some ajax functions. Could someone provide me a complete way how to go about this ??
Example: for word impecunious:
the related html source is like this
<ul class='wordnet'><li><p>(adj.) not having enough money to pay for necessities</p><u>synonyms</u> : <a href='http://www.mnemonicdictionary.com/word/hard up' onclick="ajaxSearch('hard up','click'); return false;">hard up</a> , <a href='http://www.mnemonicdictionary.com/word/in straitened circumstances' onclick="ajaxSearch('in straitened circumstances','click'); return false;">in straitened circumstances</a> , <a href='http://www.mnemonicdictionary.com/word/penniless' onclick="ajaxSearch('penniless','click'); return false;">penniless</a> , <a href='http://www.mnemonicdictionary.com/word/penurious' onclick="ajaxSearch('penurious','click'); return false;">penurious</a> , <a href='http://www.mnemonicdictionary.com/word/pinched' onclick="ajaxSearch('pinched','click'); return false;">pinched</a><p></p></li></ul>
but the web page renders like this:
•(adj.) not having enough money to pay for necessities synonyms : hard up , in straitened circumstances , penniless , penurious , pinched开发者_开发百科
If you have Bash (version 4+) and wget
, an example
#!/bin/bash
template="http://www.mnemonicdictionary.com/include/ajaxSearch.php?word=%s&event=search"
while read -r word
do
url=$(printf "$template" "$word")
data=$(wget -O- -q "$url")
data=${data#* }
echo "$word: ${data%%<*}"
done < file
Sample output
$> more file
synergy
tranquil
jester
$> bash dict.sh
synergy: the working together of two things (muscles or drugs for example) to produce an effect greater than the sum of their individual effects
tranquil: (of a body of water) free from disturbance by heavy waves
jester: a professional clown employed to entertain a king or nobleman in the Middle Ages
Update: Include mneumonic
template="http://www.mnemonicdictionary.com/include/ajaxSearch.php?word=%s&event=search"
while read -r word
do
url=$(printf "$template" "$word")
data=$(wget -O- -q "$url")
data=${data#* }
m=${data#*class=\'mnemonic\'}
m=${m%%</p>*}
m="${m##* }"
echo "$word: ${data%%<*}, mneumonic: $m"
done < file
Use curl and sed from a Bash shell (either Linux, Mac, or Windows with Cygwin).
If I get a second I will write a quick script ... gotta give the baby a bath now though.
精彩评论