bash script to translate XML
Hi, I have dozens of XML files in which
I would need this: <p begin="00:06:28;12" end="00:00:02;26">
translated into this:
<p begin="628.12" end="631.08">
I know I need a simple awk or sed to do this, but being new, can someone help?
An XSL stylesheet would be more reliable. You can run one from a shell script.
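Since there are dozens of files, whichever converter is chosen will probably be driven by a loop in that shell script. Here is a minimal sketch, assuming the converter reads XML on standard input and writes the result to standard output. The function name convert_all and the directory layout are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a filter command over every .xml file in a
# source directory and write each result, under the same file name, to a
# destination directory. Everything after the two directories is the
# command to run; it must read stdin and write stdout.
convert_all() {
    src=$1
    dst=$2
    shift 2
    mkdir -p "$dst" || return 1
    for f in "$src"/*.xml; do
        [ -e "$f" ] || continue      # no matches: the glob stays literal
        "$@" < "$f" > "$dst/$(basename "$f")" || return 1
    done
}
```

It could be called as `convert_all ./in ./out awk -f convert.awk`, where convert.awk is a hypothetical file holding whichever awk program is used. Writing to a separate directory keeps the originals untouched if the conversion turns out to be wrong.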
Ah, ghostdog74 beat me to it. However, mine also deals with the ms (this needs gawk, since it uses gensub).
awk '
# Convert "HH:MM:SS;FF" into "seconds.FF".
function timeToMin(str) {
    time_re = "([0-9][0-9]):([0-9][0-9]):([0-9][0-9]);([0-9][0-9])"
    # Grab all the times in seconds.
    s_to_s = gensub(time_re, "\\3", "g", str);
    m_to_s = (gensub(time_re, "\\2", "g", str)+0)*60;
    h_to_s = (gensub(time_re, "\\1", "g", str)+0)*60*60;
    ms = gensub(time_re, "\\4", "g", str);
    # Build the "seconds.fraction" value.
    time_str = (h_to_s+m_to_s+s_to_s)"."ms;
    # Return a string so a trailing zero in the fraction (e.g. ";10") survives.
    return time_str;
}
function addMins(aS, bS) {
    # Split by decimal point.
    split(aS, aP, ".");
    split(bS, bP, ".");
    # Add the seconds and the fractions.
    min = aP[1]+bP[1];
    ms = aP[2]+bP[2];
    # Carry into the seconds at 60. The sample in the question
    # (12 + 26 -> 08) suggests 30 may be the right base instead.
    if (ms > 59) {
        ms = ms-60;
        min++;
    }
    # Return the sum with a zero-padded two-digit fraction.
    return sprintf("%d.%02d", min, ms);
}
{
    re = "<p begin=\"(.+)\" end=\"(.+)\">";
    if ($0 ~ re) {
        # Pull out the data.
        strip_re = ".*"re".*";
        begin_str = gensub(strip_re, "\\1", "g");
        end_str = gensub(strip_re, "\\2", "g");
        # Convert.
        begin = timeToMin(begin_str);
        end = timeToMin(end_str);
        elapsed_end=addMins(begin, end);
        sub(re,"<p begin=\""begin"\" end=\""elapsed_end"\">");
    }
    print $0;
}
' file
Here's something for a start. I don't know how you want to add the decimal value, so you'll have to do that part yourself.
awk '/.*<p[ ]+begin=.*[ ]+end=.*/{
    o=$0
    # Strip everything up to the begin value, plus the quotes and the closing ">".
    gsub(/.*begin=\042|\042|>/,"")
    m=split($0,s,"end=")
    gsub(/[:;]/," ",s[1])
    gsub(/[:;]/," ",s[2])
    b=split(s[1],begin," ")
    e=split(s[2],end," ")
    # do date maths here
    if (b>3){
        tbegin=(begin[1]*3600) + (begin[2]*60) + begin[3] ##"."begin[4]
    }else{
        # mm:ss;ff form: minutes and seconds are fields 1 and 2
        tbegin=(begin[1]*60) + begin[2] ##"."begin[3]
    }
    # add the decimal yourself
    if(e>3) {
        tend = (end[1]*3600) +(end[2]*60)+end[3]+ tbegin ##"."end[4]
    }else{
        tend = (end[1]*60)+end[2]+ tbegin ##"."end[3]
    }
    string=gensub("(.*begin=\042).*( end=\042)(.*)\042>", "\\1" tbegin "\042\\2" tend"\042>","g",o)
    $0=string
}
{print}
' file
e.g.
$ cat file
<p begin="00:06:28;12" end="00:00:02;26">
<p begin="00:08:45;12" end="00:00:23;26">
<p begin="08:45;12" end="00:2;26">
$ ./shell.sh
<p begin="388" end="390">
<p begin="525" end="548">
<p begin="525" end="527">
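The decimal part left open above could be handled by doing the arithmetic in frames. What follows is a minimal sketch, not a drop-in answer. It assumes the field after the semicolon is a frame count at 30 frames per second (inferred from the question's sample, where 12 + 26 wraps to 08), that end holds a duration to be added to begin, and that each <p> tag sits on a single line. The function name convert_timecodes and the FPS variable are made up for illustration. It sticks to POSIX awk, so gensub is not needed.

```shell
#!/bin/sh
# Hypothetical helper: rewrite begin/end timecodes as "seconds.frames".
# FPS defaults to 30, an assumption taken from the question's sample.
convert_timecodes() {
    awk -v fps="${FPS:-30}" '
    # Turn "HH:MM:SS;FF" (or "MM:SS;FF") into a total frame count.
    function to_frames(tc,    p, n) {
        n = split(tc, p, /[:;]/)
        if (n == 4) return ((p[1] * 60 + p[2]) * 60 + p[3]) * fps + p[4]
        return (p[1] * 60 + p[2]) * fps + p[3]
    }
    # Format a frame count as seconds, a dot, and a two-digit frame field.
    function fmt(f) {
        return sprintf("%d.%02d", int(f / fps), f % fps)
    }
    {
        if (match($0, /<p begin="[^"]*" end="[^"]*">/)) {
            tag = substr($0, RSTART, RLENGTH)
            head = substr($0, 1, RSTART - 1)
            tail = substr($0, RSTART + RLENGTH)
            split(tag, q, "\"")      # q[2] is begin, q[4] is the duration
            b = to_frames(q[2])
            e = b + to_frames(q[4])
            $0 = head "<p begin=\"" fmt(b) "\" end=\"" fmt(e) "\">" tail
        }
        print
    }' "$@"
}

printf '%s\n' '<p begin="00:06:28;12" end="00:00:02;26">' | convert_timecodes
# prints: <p begin="388.12" end="391.08">
```

Note that this prints total seconds, like the scripts above. The 628.12 and 631.08 in the question look like the minute and second digits run together, so the output format may still need adjusting to match what is actually wanted.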
If you are doing anything more complex than this, use a parser.
I would recommend using Perl (or another scripting language) with an XML parsing module (see here for more details on Perl and XML).
That way you can reliably parse the XML and extract/manipulate the values in a programmatic form. Note the word reliably. Your XML may make use of character encodings that a simple sed/awk wouldn't respect (unlikely in this scenario, admittedly, but it's well worth being aware of such issues).