String tokenisation algorithm won't tokenise
Morning all. I am writing a bash script to extract the values of certain XML tags from all files in a given directory. I have decided to do this by tokenising each line and returning the relevant token. The problem is that it isn't tokenising correctly and I can't quite work out why. Here is the smallest example I could make that reproduces the issue:
#!/bin/bash
for file in `ls $MY_DIRECTORY`
do
for line in `cat $MY_DIRECTORY/$file`
do
LOCALIFS=$IFS
IFS=<>\"
TOKENS=( $line )
IFS=$LOCALIFS
echo "Token 0: ${TOKENS[0]}"
echo "Token 1: ${TOKENS[1]}"
echo "Token 2: ${TOKENS[2]}"
echo "Token 3: ${TOKENS[3]}"
done
done
I'm guessing the issue is to do with my fiddling with IFS inside a loop which itself uses IFS (i.e. the cat operation), but this has never been a problem before.
Any ideas? Thanks, Rik
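On the IFS guess: the most likely culprit is that the delimiter set in the IFS assignment is unquoted, so bash reads the <> as a redirection operator and IFS ends up empty instead of holding the three delimiters. Separately, looping over the output of cat splits on whitespace, not on newlines, so each "line" is really a word. A minimal sketch of the loop with both points addressed is below. tokenise_line and tokens_in_dir are hypothetical helper names, and passing the token index as an argument is just for illustration.

```shell
#!/bin/bash
# tokenise_line (hypothetical helper): split one line on <, > and " and
# print the token at the requested index.
tokenise_line() {
    local line=$1 idx=$2
    local -a tokens
    # The quoted delimiter set is not parsed as a redirection, and the
    # assignment is scoped to this single read, so the global IFS is untouched.
    IFS='<>"' read -r -a tokens <<< "$line"
    printf '%s\n' "${tokens[$idx]}"
}

# tokens_in_dir (hypothetical helper): apply tokenise_line to every line of
# every regular file in a directory.
tokens_in_dir() {
    local dir=$1 idx=$2 file line
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # IFS= and -r keep each line intact; the || clause also handles a
        # final line that lacks a trailing newline.
        while IFS= read -r line || [ -n "$line" ]; do
            tokenise_line "$line" "$idx"
        done < "$file"
    done
}
```

For a line such as <tag1>value1 </tag1>, token 0 is empty because the line starts with a delimiter, token 1 is tag1 and token 2 is "value1 " (with the trailing space). Calling tokens_in_dir "$MY_DIRECTORY" 2 would print token 2 of every line.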
Use a better tool to parse XML. Ideally it should be a proper parser, but if your requirement is simple and you know how your XML is structured, simple string manipulation might suffice. For example, say you have this file and you want to get the value of tag3:
$ cat file
blah
<tag1>value1 </tag1>
<tag2>value2 </tag2>
<tag3>value3
</tag3>
blah
$ awk -vRS="</tag3>" '/<tag3>/{ gsub(/.*tag3>/,"");print}' file
value3
So, to iterate over your directory:
for file in *.xml
do
value="$(awk -vRS="</tag3>" '/<tag3>/{ gsub(/.*tag3>/,"");print}' "$file" )"
echo "$value"
done
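The same kind of simple string manipulation can also be sketched with shell parameter expansion alone, which avoids depending on an awk that accepts a multi-character record separator. This is only a sketch under the same assumptions as above: the tag has no attributes, is not nested inside itself, and only its first occurrence matters. extract_tag is a hypothetical helper name.

```shell
#!/bin/bash
# extract_tag (hypothetical helper): print the text between the first <tag>
# and the first </tag> that follows it in the given file.
extract_tag() {
    local tag=$1 file=$2 content
    content=$(cat "$file")
    # Return non-zero if the opening/closing pair is not present in order.
    case $content in
        *"<$tag>"*"</$tag>"*) ;;
        *) return 1 ;;
    esac
    content=${content#*"<$tag>"}      # drop everything up to the first <tag>
    content=${content%%"</$tag>"*}    # drop the first </tag> and what follows
    printf '%s\n' "$content"
}
```

Inside the loop it could be used as value=$(extract_tag tag3 "$file"), with the command substitution stripping any trailing newline left before the closing tag.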