XSLT, grab just a portion of a string within a tag

Alright, I have an XSLT stylesheet that does most of what I need now. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="//Product/Description">
    <title>
      <xsl:apply-templates/>
    </title>
  </xsl:template>
  <xsl:template match="//Product/Picture">
    <link>
      <xsl:apply-templates/>
    </link>
  </xsl:template>
  <xsl:template match="//Product/Caption">
    <description>
      <xsl:apply-templates/>
    </description>
  </xsl:template>
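  <!-- Note: this template and the "Caption" one below are never applied. The
       "//Product/Picture" and "//Product/Caption" templates above match the same elements
       with a higher default priority (0.5 vs 0). Nothing passes $text to them either, and
       no template named "strip-tags" is defined here for xsl:call-template to reach. -->
  <xsl:template match="Picture">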
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text" select="substring-after($text, 'src=')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Caption">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text" select="substring-after($text,'&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>

This is probably a huge kludge, because I am just grabbing the text from the 'raw' output of my XML editor, since it does what I need: it puts the correct tags in the right places. However, now 'strip-tags' doesn't seem to work. I tried to make another version of 'strip-tags' that would pull out everything following 'src=' and preceding '>', but 'strip-tags' obviously does the opposite of what I am trying to do. Is there something that does the opposite of 'strip-tags'? Then I could just replace 'strip-tags' with 'strip-all-except', or whatever it would be called.
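XSLT 1.0 has no built-in opposite of strip-tags (strip-tags is itself a user-written named template, not a standard function), but the same XPath string functions can keep a slice instead of dropping one. Below is a minimal sketch, assuming the src value is unquoted and is the last attribute, as in the sample Picture element. The named template reuses the name 'strip-all-except' from the question; its `start`/`end` parameters and the `<links>` wrapper are illustrative, not part of any library.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Demo wrapper (hypothetical element name): one <link> per Picture -->
  <xsl:template match="/">
    <links>
      <xsl:apply-templates select="//Product/Picture"/>
    </links>
  </xsl:template>

  <xsl:template match="//Product/Picture">
    <link>
      <xsl:call-template name="strip-all-except">
        <xsl:with-param name="text" select="."/>
      </xsl:call-template>
    </link>
  </xsl:template>

  <!-- Hypothetical helper: keeps only the text between $start and the next $end
       that follows it, and drops everything else. Yields '' if $start is absent. -->
  <xsl:template name="strip-all-except">
    <xsl:param name="text"/>
    <xsl:param name="start" select="'src='"/>
    <xsl:param name="end" select="'&gt;'"/>
    <xsl:variable name="rest" select="substring-after($text, $start)"/>
    <xsl:choose>
      <xsl:when test="contains($rest, $end)">
        <xsl:value-of select="substring-before($rest, $end)"/>
      </xsl:when>
      <!-- no closing marker: keep everything after $start -->
      <xsl:otherwise>
        <xsl:value-of select="$rest"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

On the sample Picture this keeps just the image URL. A quoted value (src="...") would keep its quotes, and any attributes written after src would be kept too, which is the kind of fragility string slicing brings.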

EDIT:

Here is the input XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE StoreExport SYSTEM "http://store.yahoo.com/doc/dtd/StoreExport.dtd">
<StoreExport>
  <Settings>
    <Published timestamp="1297187196"/>
    <Locale code="C" name="English" encoding="iso-8859-1"/>
    <StoreName>Cl33333</StoreName>
    <Currency>USD</Currency>
    <ShipMethods>
      <ShipMethod></ShipMethod>

    </ShipMethods>
    <PayMethods>

    </PayMethods>
  </Settings>
  <Products>  

<Product Id="agfasu">
  <Code>3616a</Code>
  <Description>Ageless Fashion Suit</Description>
  <Url>http://www.cl333333333d.com/agfasu.html</Url>
  <Thumb>&lt;img border=0 width=50 height=70 src=http://ep.y3333333333327706119506618_2144_317652924&gt;</Thumb>
  <Picture>&lt;img border=0 width=600 height=845 src=http://ep.yim3333333st-27706119506618_2144_317019111&gt;</Picture>

  <Orderable>YES</Orderable>
  <Taxable>YES</Taxable>
  <Pricing>
    <BasePrice>178.00</BasePrice>

  </Pricing>
  <Path>333333333333333om/wochsu.html">Womens Church Suits</ProductRef>
    <ProductRef Id="2454" Url="http://www.cl33333333454.html">Aussie Austine Spring/Summer 2011</ProductRef>

  </Path>
  <Availability>Usually ships the next business day.</Availability>
  <Caption>&lt;head&gt; &lt;meta content="en-us" http-equiv="Content-Language"&gt; &lt;style type="text/css"&gt; .style3 {  font-family: arial, helvetica;  font-size: medium;  font-weight: bold; } .style4 {  font-size: small; } &lt;/style&gt; &lt;/head&gt;  &lt;p&gt;&lt;strong&gt;Wholesale Women&amp;#39;s</Caption>

  <OptionLists>
    <OptionList name="Size">
      <OptionValue>8</OptionValue>
    </OptionList>
    <OptionList name="Colors">
      <OptionValue>Red</OptionValue>
    </OptionList>

    <OptionList name="Accessories">
      <OptionValue>Suit</OptionValue>
    </OptionList>

  </OptionLists>
</Product>  

The output I would like:

<item>
<title>
<![CDATA['DescriptionTag']]>
</title>
<description>
<![CDATA[CaptionTagStrippedofEscapedCharacters]]>
</description>
<link>'UrlTag'</link>
<g:condition>new</g:condition>
<g:price>'BasePriceTag'</g:price>
<g:product_type>Clothing, Accessories</g:product_type>
<g:image_link>'PictureTagFrom 'src=' to '>' </g:image_link>
<g:payment_accepted>Visa</g:payment_accepted>
<g:payment_accepted>Mastercard</g:payment_accepted>
<g:payment_accepted>Discover</g:payment_accepted>
</item>  

Some of the tags don't need to be populated from the source because they are always the same, such as 'payment_accepted', 'condition', and 'product_type'.
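For the output described above, here is a sketch of a single XSLT 1.0 stylesheet that builds one `<item>` per Product. Assumptions: the full export is well-formed (the excerpt above is trimmed); this is a Google Merchant-style feed, so the `g:` prefix is bound to `http://base.google.com/ns/1.0` and the items are wrapped in an assumed `<rss><channel>` envelope; and strip-tags is the usual recursive named template, adjusted to look for '>' only after each '<'. `cdata-section-elements` on `xsl:output` writes title and description as CDATA sections.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:g="http://base.google.com/ns/1.0">
  <xsl:output method="xml" indent="yes" cdata-section-elements="title description"/>

  <!-- Assumed RSS 2.0 envelope; the question only shows the <item> part -->
  <xsl:template match="/">
    <rss version="2.0">
      <channel>
        <xsl:apply-templates select="StoreExport/Products/Product"/>
      </channel>
    </rss>
  </xsl:template>

  <xsl:template match="Product">
    <item>
      <title><xsl:value-of select="Description"/></title>
      <description>
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text" select="Caption"/>
        </xsl:call-template>
      </description>
      <link><xsl:value-of select="Url"/></link>
      <!-- condition, product_type and payment_accepted are constants -->
      <g:condition>new</g:condition>
      <g:price><xsl:value-of select="Pricing/BasePrice"/></g:price>
      <g:product_type>Clothing, Accessories</g:product_type>
      <g:image_link>
        <!-- text between 'src=' and the next '>' of the escaped <img> -->
        <xsl:value-of select="substring-before(substring-after(Picture, 'src='), '&gt;')"/>
      </g:image_link>
      <g:payment_accepted>Visa</g:payment_accepted>
      <g:payment_accepted>Mastercard</g:payment_accepted>
      <g:payment_accepted>Discover</g:payment_accepted>
    </item>
  </xsl:template>

  <!-- Recursively drops every '<' ... '>' run from an escaped-markup string -->
  <xsl:template name="strip-tags">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <!-- keep the text before the tag, then continue after that tag's '>' -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text"
              select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

On the sample Caption the tags disappear, but the CSS rules from the `<style>` block remain as plain text, and the doubly escaped `&amp;#39;` comes through as the literal text `&#39;`.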


One shouldn't use an XML vocabulary, or an XML consumer, that expects parseable data to arrive as an unparsed text node.

If you do, then you must face the consequences and do proper parsing instead of some error-prone RegExp or string handling.
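To illustrate the point: if the export carried the image as real markup rather than escaped text, src would be an ordinary attribute and a single XPath step would reach it, with no string slicing. The input in the comment below is hypothetical (it is not the format of the export in the question), and the `g:` namespace URI is assumed to be Google Merchant's.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input, with the picture as real markup instead of escaped text:
     <Product Id="agfasu">
       <Picture><img border="0" width="600" height="845"
                     src="http://ep.yim3333333st-27706119506618_2144_317019111"/></Picture>
     </Product>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:g="http://base.google.com/ns/1.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="Product">
    <!-- src is an attribute node here, so no substring-before/after is needed -->
    <g:image_link>
      <xsl:value-of select="Picture/img/@src"/>
    </g:image_link>
  </xsl:template>
</xsl:stylesheet>
```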

A very basic XSLT parser for properly encoded, well-formed XHTML can be found at https://bug98168.bugzilla.mozilla.org/attachment.cgi?id=434081

So you could parse your unparsed data first, and then apply a second-phase transformation to the result with the node-set() extension function.
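A sketch of the two-phase shape, assuming an XSLT 1.0 processor that supports EXSLT's `exsl:node-set()` (namespace `http://exslt.org/common`; MSXML and .NET provide `msxsl:node-set()` instead). Phase 1 here is only a placeholder: it builds `<img>` elements from the escaped Picture text with string functions so the example is self-contained, whereas the approach above would use the output of the linked parser. The `img`/`image`/`images` element names and the `phase2` mode are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Phase 1 (placeholder for a real parser): build <img> elements from
         the escaped Picture text. The variable holds a result tree fragment. -->
    <xsl:variable name="parsed">
      <xsl:for-each select="StoreExport/Products/Product">
        <img product="{@Id}"
             src="{substring-before(substring-after(Picture, 'src='), '&gt;')}"/>
      </xsl:for-each>
    </xsl:variable>

    <!-- Phase 2: exsl:node-set() turns the fragment into a node-set, so
         XPath steps and template matching work on the parsed nodes -->
    <images>
      <xsl:apply-templates select="exsl:node-set($parsed)/img" mode="phase2"/>
    </images>
  </xsl:template>

  <xsl:template match="img" mode="phase2">
    <image product="{@product}">
      <xsl:value-of select="@src"/>
    </image>
  </xsl:template>
</xsl:stylesheet>
```

The design point is that phase 2 never touches strings: once the fragment is a node-set, it can be processed with the same templates you would write for properly structured input.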
