How to transform an XML node with commas into multiple nodes?

I have a one-time transformation to perform on a large XML file.

I have :

[stuff]
<items>string1,string2,string3,string4</items>
[other stuff]

I want to replace it with :

<itemList>
    <item>string1</item>
    <item>string2</item>
    <item>string3</item>
    <item>string4</item>
</itemList>

I'm hesitating between using a RegEx or XSL. I've been trying to go the regex way :

Search

^.*<items>(.*)</items>

Replace with

<itemList>\1</itemList>

I'm stuck at the "find the commas and replace them with something" step. I'm not even sure it's doable...

How could I finish this RegEx? Should I go XSL instead?


I would use XSLT 2.0.

XML Input:

<doc>
  <stuff>sdfsadfsa</stuff>
  <items>string,string,string,string</items>
  <otherstuff>sdfasdfsaf</otherstuff>
</doc>

XSLT 2.0:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="items">
    <itemList>
      <xsl:for-each select="tokenize(.,',')">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </itemList>
  </xsl:template>

</xsl:stylesheet>

XML Output:

<doc>
   <stuff>sdfsadfsa</stuff>
   <itemList>
      <item>string</item>
      <item>string</item>
      <item>string</item>
      <item>string</item>
   </itemList>
   <otherstuff>sdfasdfsaf</otherstuff>
</doc>

If you don't have an XSLT 2.0 processor, I would suggest Saxon.
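
If running an XSLT 2.0 processor is not convenient, the same XML-aware split can be sketched with Python's standard library. This is only an illustration and not part of the stylesheet above: the helper name `split_items` is made up for the example, and it assumes that `<items>` is never the root element and carries no attributes or child elements that need to be preserved.

```python
import xml.etree.ElementTree as ET


def split_items(root):
    """Replace each <items>a,b,c</items> below root with an <itemList> of <item> children.

    Hypothetical helper for illustration; assumes <items> is not the root element.
    """
    # Collect (parent, child) pairs first so the tree is not modified while iterating.
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in list(parent)
        if child.tag == "items"
    ]
    for parent, old in targets:
        new = ET.Element("itemList")
        new.tail = old.tail  # keep whatever text followed the original element
        # Like tokenize(), produce no <item> at all for an empty <items/>.
        tokens = old.text.split(",") if old.text else []
        for token in tokens:
            ET.SubElement(new, "item").text = token
        index = list(parent).index(old)
        parent.remove(old)
        parent.insert(index, new)
    return root


doc = ET.fromstring(
    "<doc><stuff>sdfsadfsa</stuff>"
    "<items>string1,string2,string3,string4</items>"
    "<otherstuff>sdfasdfsaf</otherstuff></doc>"
)
split_items(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Like the stylesheet, this works on the parsed tree rather than on raw text, so commas inside other elements are left alone. On Python 3.9 or later, `ET.indent(doc)` can be called before serializing to get indented output.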


Regexes are not well suited to doing precisely this in a single pass. I'm assuming the stressed "1 time" means a one-off effort, not that everything must happen in one fell swoop (or in a single expression), so I would recommend two stages (using Perl syntax).

First stage (rename the outer tags to the new container name; the g flag is needed because the opening and closing tags sit on the same line):

s!<(/?)items>!<$1itemList>!g

Second stage (wrap each listed value in its own item tag; the g flag makes the substitution apply to every value, not just the first):

s!,([^<,]+)(?=,|</itemList>)|(?<=<itemList>)([^<,]+)(?=,|</itemList>)!\n    <item>$1$2</item>!g

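These expressions produce the structure you need, though not necessarily the exact whitespace shown in your example output. They also assume that the tags are as simple as in your question. Note that the second-stage pattern is not strictly limited to the container: a value sitting between two commas in any other element would be wrapped as well. If your file is more complex than that (lots of different element names, comma-separated text elsewhere, etc.), you should probably look into XSLT.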

If you want the result formatted the same way as your example output, run this last expression as a third pass. It adds the missing line break before the closing container tag:

s!(</item>)(</itemList>)!$1\n$2!g
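
For anyone who would rather script the passes than run them by hand, here is a rough Python port of the three substitutions. It is a sketch under the same assumptions as above (simple tags, values without `<` or `,`), and the function name `split_items_regex` is made up for the example. Python's `re.sub` replaces every match by default, and its lookbehind must be fixed-width, which `(?<=<itemList>)` is.

```python
import re


def split_items_regex(text):
    """Apply the three regex passes to a string (hypothetical helper)."""
    # Pass 1: rename <items> and </items> to <itemList> and </itemList>.
    text = re.sub(r"<(/?)items>", r"<\1itemList>", text)

    # Pass 2: wrap each comma-separated value in <item>...</item>.
    # Exactly one of the two groups matches, so pick whichever is set.
    text = re.sub(
        r",([^<,]+)(?=,|</itemList>)|(?<=<itemList>)([^<,]+)(?=,|</itemList>)",
        lambda m: "\n    <item>" + (m.group(1) or m.group(2)) + "</item>",
        text,
    )

    # Pass 3: put the closing container tag on its own line.
    text = re.sub(r"(</item>)(</itemList>)", r"\1\n\2", text)
    return text


sample = "<items>string1,string2,string3,string4</items>"
result = split_items_regex(sample)
print(result)
```

As with the Perl version, this treats the file as plain text, so it is worth diffing the result against the original before relying on it.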