
bash script to translate XML

Hi I have dozens of XML files with

I would need this:

<p begin="00:06:28;12" end="00:00:02;26">

translated into this:

<p begin="628.12" end="631.08">

I know I need a simple awk or sed to do this, but being new, can someone help?


An XSL stylesheet would be more reliable. You can run one from a shell script.


Ah, ghostdog74 beat me to it. However, mine also deals with the ms.

awk '
    function timeToMin(str) {
        time_re = "([0-9][0-9]):([0-9][0-9]):([0-9][0-9]);([0-9][0-9])"

        # Grab all the times in seconds. 
        s_to_s =  gensub(time_re, "\\3", "g", str);
        m_to_s = (gensub(time_re, "\\2", "g", str)+0)*60;
        h_to_s = (gensub(time_re, "\\1", "g", str)+0)*60*60;
        ms     =  gensub(time_re, "\\4", "g", str);

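        # Build "seconds.fraction" as a string.
        time_str = (h_to_s+m_to_s+s_to_s)"."ms;

        # Return the string unchanged; adding 0 would turn it into a number
        # and drop trailing zeros (e.g. "388.10" would become 388.1).
        return time_str;
    }
    function addMins(aS, bS) {
        # Split by decimal point
        split(aS, aP, ".");
        split(bS, bP, ".");

        # Add the seconds and the fractional field.
        min = aP[1]+bP[1];
        ms  = aP[2]+bP[2];
        # Carry into the seconds. The wrap value of 60 is this answer's
        # assumption; the example in the question (12 + 26 -> 08) suggests 30.
        if (ms > 59) {
            ms = ms-60;
            min++;
        }

        # Return the sum, zero-padding the fractional field.
        return sprintf("%d.%02d", min, ms);
    }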
    {
        re = "<p begin=\"(.+)\" end=\"(.+)\">";
        if ($0 ~ re) {
            # Pull out the data.
            strip_re = ".*"re".*";
            begin_str = gensub(strip_re, "\\1", "g");
            end_str   = gensub(strip_re, "\\2", "g");

            # Convert.
            begin = timeToMin(begin_str);
            end   = timeToMin(end_str);

            elapsed_end=addMins(begin, end);

            sub(re,"<p begin=\""begin"\" end=\""elapsed_end"\">");
        }

        print $0;
    }
' file
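Note that gensub is a GNU awk extension, so the script above needs gawk. As a rough sketch of the same idea using only POSIX awk features (match, substr, split), something like the following might work. The function name convert_times and its optional wrap argument are made up for illustration; it assumes each <p> tag sits on one line with begin before end, and that end is a duration to be added to begin.

```shell
# Sketch only: convert begin/end on each <p> line without gawk extensions.
# Assumption: the field after ";" wraps at `wrap` (first argument, default 60).
convert_times() {
    awk -v wrap="${1:-60}" '
    # Return whole seconds for "HH:MM:SS;FF" or "MM:SS;FF";
    # the trailing field is left in the global FRAC.
    function parse(str,    n, p) {
        n = split(str, p, "[:;]")
        FRAC = p[n] + 0
        return (n == 4) ? p[1] * 3600 + p[2] * 60 + p[3] : p[1] * 60 + p[2]
    }
    # Return the value of the attribute called name on this line, or "".
    function attr(line, name,    s) {
        if (!match(line, name "=\"[^\"]*\"")) return ""
        s = substr(line, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", s)
        sub(/"$/, "", s)
        return s
    }
    /<p[ \t]+begin="[^"]*"[ \t]+end="[^"]*"/ {
        bsec = parse(attr($0, "begin")); bfrac = FRAC
        dsec = parse(attr($0, "end"));   dfrac = FRAC
        esec  = bsec + dsec
        efrac = bfrac + dfrac
        # Carry one second when the fractional field overflows.
        if (efrac >= wrap) { efrac -= wrap; esec++ }
        new = sprintf("begin=\"%d.%02d\" end=\"%d.%02d\"", bsec, bfrac, esec, efrac)
        sub(/begin="[^"]*"[ \t]+end="[^"]*"/, new)
    }
    { print }
    '
}
```

Called as `convert_times 30 < in.xml > out.xml`, the sample line from the question becomes `<p begin="388.12" end="391.08">`; with the default of 60 the end value is 390.38 instead.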


Here's something for a start. I don't know how you want to add the decimal value, so I've left that part to you.

awk '/.*<p[ ]+begin=.*[ ]+end=.*/{
    o=$0
    gsub(/.*begin=\042|\042|>/,"")
    m=split($0,s,"end=")
    gsub(/[:;]/," ",s[1])
    gsub(/[:;]/," ",s[2])
    b=split(s[1],begin," ")
    e=split(s[2],end," ")
    # do date maths here
    if (b>3){
        tbegin=(begin[1]*3600) + (begin[2]*60) + begin[3]  ##"."begin[4]
    }else{
        # three fields (MM:SS;FF): begin[2] holds the seconds
        tbegin=(begin[1]*60) + begin[2]  ##"."begin[3]
    }
    # add the decimal yourself
    if(e>3) {
        tend = (end[1]*3600) +(end[2]*60)+end[3]+ tbegin ##"."end[4]
    }else{
        # three fields (MM:SS;FF): end[2] holds the seconds
        tend = (end[1]*60)+end[2]+ tbegin ##"."end[3]
    }
    string=gensub("(.*begin=\042).*( end=\042)(.*)\042>", "\\1" tbegin "\042\\2" tend"\042>","g",o)
    $0=string
}
{print}
' file

e.g.

$ cat file
<p begin="00:06:28;12" end="00:00:02;26">
<p begin="00:08:45;12" end="00:00:23;26">
<p begin="08:45;12" end="00:2;26">

$ ./shell.sh
<p begin="388" end="390">
<p begin="525" end="548">
<p begin="525" end="527">

If you are doing more complex tasks other than this, use a parser.
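Since the question mentions dozens of files, here is a minimal sketch of one way to run an awk program over all of them. It assumes the awk program has been saved to a file; the function name convert_all and the .bak backup suffix are illustrative choices, not something from the answers above.

```shell
# Sketch only: run an awk program file over many XML files in place.
# Each original is kept as FILE.bak, and a file is only overwritten
# when awk exits successfully.
convert_all() {
    script=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if awk -f "$script" "$f" > "$tmp"; then
            # Back up the original, then write the converted text over it.
            cp "$f" "$f.bak" && cat "$tmp" > "$f"
        else
            echo "skipped $f: awk failed" >&2
        fi
        rm -f "$tmp"
    done
}
```

For example, `convert_all convert.awk *.xml`, where convert.awk is a hypothetical file holding the awk program.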


I would recommend using Perl (or another scripting language) with an XML parsing module.

That way you can reliably parse the XML and extract/manipulate the values in a programmatic form. Note the word reliably. Your XML may make use of character encodings that a simple sed/awk wouldn't respect (unlikely in this scenario, admittedly, but it's well worth being aware of such issues).
