Extract XML data/content from a URL using shell scripting
I need to download the XML content from a URL into file.xml. For example, given the URL http://www.pistonheads.co.uk/xml/news091.asp?c=26, I want to save its XML content to file.xml as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="0.91">
<channel>
<title>PistonHeads (Motoring News)</title>
<link>http://www.pistonheads.com/news/</link>
<description>Motoring News</description>
<item>
<title>Bowler Nemesis Joins Spyker At CPP</title>
<description>Plans confired for Nemesis EXR road car to be built in Coventry</description>
</item>
</channel>
</rss>
I tried wget "url" -o file.xml, but when I open file.xml it contains only this:
http://www.pistonheads.co.uk/xml/news091.asp?c=26 => `news091.asp?c=26'
Resolving www.pistonheads.co.uk... done.
Connecting to www.pistonheads.co.ukhttp://xx.xxx.xxx.xx connected.
HTTP request sent, awaiting response... 200 OK
Length: 5,016 text/xml
0K .... 100% 445.31 KB/s
13:37:13 (445.31 KB/s) - `news091.asp?c=26' saved 5016/5016
Is there any other way to solve this?
If you want this as the output:
PistonHeads (Motoring News) http://www.pistonheads.com/news/ Motoring News
Then this will do the trick:
wget -q -O - 'http://www.pistonheads.co.uk/xml/news091.asp?c=26' \
  | grep -E '(title>|link>|description>)' | head -3 \
  | sed -e 's/.*>\([^>]*\)<.*/\1/' | tr '\n' ' '
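To try the filtering part of that pipeline without touching the network, it can be wrapped in a small function that reads XML from standard input. This is only a sketch: the function name first_fields and the file name sample.xml are made up for illustration, and it assumes each element sits on its own line, as in the feed shown in the question.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print the text of the first
# three <title>, <link> or <description> elements read from stdin, joined
# by spaces. Assumes one element per line; it is not a real XML parser.
first_fields() {
  grep -E '(title>|link>|description>)' \
    | head -3 \
    | sed -e 's/.*>\([^>]*\)<.*/\1/' \
    | tr '\n' ' '
}

# Offline use against a saved copy of the feed (sample.xml is a placeholder):
#   first_fields < sample.xml
# Live use, equivalent to the pipeline above:
#   wget -q -O - 'http://www.pistonheads.co.uk/xml/news091.asp?c=26' | first_fields
```

Because tr turns every newline into a space, the result ends with a trailing space and no final newline.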
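If, however, you just want the content at the URL written to a file, use this:
wget -O file.xml 'http://www.pistonheads.co.uk/xml/news091.asp?c=26'
Note the capital O: -O names the file the downloaded document is written to, whereas the lowercase -o used in the question names the file that wget's log messages are written to. That is why file.xml ended up containing the transfer log instead of the XML.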
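A quick way to catch the -o/-O mix-up in a script is to check that the saved file starts with an XML declaration. The following is a rough sketch under stated assumptions: looks_like_xml and fetch_xml are hypothetical names, and the check assumes the document begins with <?xml on its first line with no byte-order mark, which holds for the feed in the question but not for every XML file.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first line of file $1 starts
# with an XML declaration. A wget log written by lowercase -o would not.
looks_like_xml() {
  head -n 1 "$1" | grep -q '^<?xml'
}

# Hypothetical wrapper: download URL $1 to file $2 (capital -O), then
# sanity-check the result. Fails if wget fails or the check fails.
fetch_xml() {
  wget -q -O "$2" "$1" && looks_like_xml "$2"
}

# Example (needs network access):
#   fetch_xml 'http://www.pistonheads.co.uk/xml/news091.asp?c=26' file.xml \
#     || echo 'download failed or file.xml is not XML' >&2
```

This only inspects the first line; it does not validate that the document is well-formed.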
 