Extract text from XML tags using sed - shell script

I have written a script that takes an XML file as input and extracts the text of specific XML tags, and it works. However, it cannot handle multiline text or special characters. It is very important that the text format is kept intact, exactly as it appears inside the tags.

Below is the XML input:

<nick>Deminem</nick>
<company>XYZ Solutions</company>
<description>
  /**
   * 
   *  «Lorem» ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
   *  tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. 
   *  At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd 
   *  no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit 
   *  consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore
   *  magna aliquyam erat, sed diam voluptua.
   *
   **/
</description> 

The script below extracts the text of each tag and assigns it to a new valueArray. My command of sed is basic, but I am always willing to go the extra mile.

tagsArray=( nick company description )
noOfElements=${#tagsArray[@]}

for (( i=0;i<$noOfElements;i++)); do

OUT=`grep ${tagsArray[${i}]} filename.xml | tr -d '\t' | sed -e 's/^<.*>\([^<].*\)<.*>$/\1/' `

valueArray[${i}]=${OUT}
done 


Parsing XML with regular expressions eventually leads to trouble, as you have experienced. Take the time to learn enough XSL (there are many tutorials) to transform the XML properly, using for example xsltproc.

Edit:

After trying out a few command line xml utilities, I think xmlstarlet could be the tool for you. The following is untested, and assumes that filename.xml is a proper xml file (i.e. has a single root element).

tagsArray=( nick company description )
noOfElements=${#tagsArray[@]}

for (( i=0;i<$noOfElements;i++)); do
    valueArray[i]=$(xmlstarlet sel -t -v "/root/${tagsArray[i]}" filename.xml)
done


#!/bin/sh
filePath=$1 #XML file path
tagName=$2  #Tag name to fetch values
# RS is a regular expression here, which needs an awk such as gawk or mawk.
awk '!/<.*>/' RS="<${tagName}>|</${tagName}>" "$filePath"
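
As a rough sketch of how the awk approach above could be applied to the tags from the question, the snippet below wraps it in a function and runs it against a small sample file. The function name extract_tag, the temporary sample file, and its <root> wrapper element are assumptions made for illustration; they are not part of the original post. It also assumes an awk that accepts a regular expression as RS, and it will skip any value whose text itself contains something that looks like a tag.

```shell
#!/bin/sh
# Hypothetical helper: print the text between <tag> and </tag> in a file.
# $1 = tag name, $2 = XML file path.
# Splitting on the opening/closing tag makes the tag's content its own record;
# records that still contain other tags are filtered out.
extract_tag() {
    awk '!/<.*>/' RS="<$1>|</$1>" "$2"
}

# Sample input for illustration only (the <root> wrapper is an assumption).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<root>
<nick>Deminem</nick>
<company>XYZ Solutions</company>
<description>
  /**
   *  «Lorem» ipsum dolor sit amet
   **/
</description>
</root>
EOF

for tag in nick company description; do
    # Quote the expansion so line breaks and indentation are preserved.
    value=$(extract_tag "$tag" "$sample")
    printf '%s: %s\n' "$tag" "$value"
done
```

Command substitution strips trailing newlines but keeps everything else, so the multiline description keeps its internal line breaks and indentation as long as "$value" stays quoted.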