Extract text from XML tags using sed - shell script
I have already written a script that takes an XML file as input and extracts the text of specific XML tags, and it works. However, it is not smart enough to handle multiline text or special characters. It is very important that the text keeps exactly the formatting it has inside the tags.
Below is the XML input:
<nick>Deminem</nick>
<company>XYZ Solutions</company>
<description>
  /**
   * 
   *  «Lorem» ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
   *  tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. 
   *  At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd 
   *  no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit 
   *  consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
   *  magna aliquyam erat, sed diam voluptua.
   *
   **/
</description> 
The script below extracts the text of each tag and assigns it to a new valueArray. My command of sed is basic, but I am always willing to go the extra mile.
tagsArray=( nick company description )
noOfElements=${#tagsArray[@]}
for (( i=0;i<$noOfElements;i++)); do
OUT=`grep ${tagsArray[${i}]} filename.xml | tr -d '\t' | sed -e 's/^<.*>\([^<].*\)<.*>$/\1/' `
valueArray[${i}]=${OUT}
done 
Parsing XML with regexp leads to trouble eventually, just as you have experienced. Take the time to learn enough XSL (there are many tutorials) to transform the XML properly, using for example xsltproc.
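As a rough illustration of the XSL route, here is a minimal sketch. It assumes xsltproc is installed and that the fragments from the question are wrapped in a single root element. The file names, the `tag` parameter and the shortened sample text are made up for the example.

```shell
#!/bin/sh
# Sketch: pull one element's text out of an XML file with xsltproc.
# File names, the <root> wrapper and the "tag" parameter are illustrative assumptions.
command -v xsltproc >/dev/null 2>&1 || { echo "xsltproc not found" >&2; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Shortened sample input with a multiline value and a special character.
cat > "$workdir/sample.xml" <<'XML'
<root>
<nick>Deminem</nick>
<description>
  /**
   *  «Lorem» ipsum dolor sit amet
   **/
</description>
</root>
XML

# Text-mode stylesheet: print the string value of the child of the root
# element whose name equals the "tag" parameter.
cat > "$workdir/extract.xsl" <<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="tag"/>
  <xsl:template match="/">
    <xsl:value-of select="/*/*[name() = $tag]"/>
  </xsl:template>
</xsl:stylesheet>
XSL

nick=$(xsltproc --stringparam tag nick "$workdir/extract.xsl" "$workdir/sample.xml")
description=$(xsltproc --stringparam tag description "$workdir/extract.xsl" "$workdir/sample.xml")
printf '%s\n' "$nick" "$description"
```

Because the XML is parsed properly, line breaks and indentation inside the element come through unchanged. The one exception is that the shell's command substitution drops trailing newlines.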
Edit:
After trying out a few command-line XML utilities, I think xmlstarlet could be the tool for you. The following is untested and assumes that filename.xml is a proper XML file (i.e. has a single root element, called root in the example).
tagsArray=( nick company description )
noOfElements=${#tagsArray[@]}
for (( i=0; i<noOfElements; i++ )); do
    # No spaces around "=", and braces are needed to index the array
    valueArray[i]=$(xmlstarlet sel -t -v "/root/${tagsArray[i]}" filename.xml)
done
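#!/bin/sh
filePath=$1 # XML file path
tagName=$2  # Tag name to fetch values
# RS is a regular expression matching the opening or closing tag (works in gawk and mawk);
# records that still contain markup are skipped
awk '!/<.*>/' RS="<${tagName}>|</${tagName}>" "$filePath"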
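To show how the record-separator idea copes with multiline values, here is a small self-contained sketch. The `extract_tag` helper and the shortened sample file are hypothetical. It also assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do, POSIX leaves it unspecified), tags without attributes, and no nesting of a tag inside itself. It is still not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the text between <tag> and </tag>.
# $1 = tag name, $2 = XML file
extract_tag() {
    # Splitting on the opening or closing tag makes every even-numbered
    # record the content of one element, line breaks included.
    awk -v RS="</?$1>" 'NR % 2 == 0' "$2"
}

# Shortened sample input, assumed to be wrapped in a single root element.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'XML'
<root>
<nick>Deminem</nick>
<company>XYZ Solutions</company>
<description>
  /**
   *  «Lorem» ipsum dolor sit amet,
   *  sed diam voluptua.
   **/
</description>
</root>
XML

nick=$(extract_tag nick "$sample")
company=$(extract_tag company "$sample")
description=$(extract_tag description "$sample")

# Quote the variable so the stored line breaks and indentation are kept.
printf '%s\n' "$description"
```

In bash the same calls could fill valueArray from the question, for example `valueArray[i]=$(extract_tag "${tagsArray[i]}" filename.xml)`. Always expand the values in double quotes, otherwise the shell collapses the whitespace again.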
 