
Mechanism to strip specific tags from an XHTML document (but keep their contents)?

I would like a quick and easy way to strip tags from an XHTML document, and I believe there must be something concise enough among the available options, such as XSLT, XPath, XQuery, or a custom C# program using the .NET XML namespaces. I'm open to others.

For example, I want to strip all <b> tags from an XHTML document but keep their inner content and child tags (i.e. not simply drop the bold element together with its children).

I need to maintain the structure of the original document, minus the stripped tags.

Thoughts:

  • I've seen XSLT's ability to match elements for selection; however, I want to match everything by default with a couple of exceptions, and I'm unsure whether XSLT suits that. This is what I'm looking at right now.

  • XQuery I haven't started to look into. (Update: I took a brief look at it, and it seems similar enough to SQL in function that I don't see how it could maintain the nested node structure of the original document, so I don't think it is a contender.)

  • A custom C# program using the .NET XML namespace might be viable, since I already have an idea for it, but my immediate assumption is that it would be more involved than the XML-specific matching languages, which were created for exactly this kind of task.

  • ... another kind of enabling technology I haven't yet considered...
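The custom-program option need not be long either. Below is a minimal sketch that uses Python's standard xml.etree.ElementTree in place of C#/.NET, purely for illustration. The function name `unwrap` is hypothetical. Each matching element is removed, and its text, children and tail text are spliced into the parent at the same position. The sketch assumes the root element itself is never one of the stripped tags. For XHTML that declares its namespace, tag names must be passed in ElementTree's `{namespace}local` form, e.g. `{http://www.w3.org/1999/xhtml}b`.

```python
import xml.etree.ElementTree as ET


def unwrap(parent: ET.Element, tags: set) -> None:
    """Strip every descendant of `parent` whose tag is in `tags`,
    keeping its text, child elements and tail in place."""
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap(child, tags)  # handle deeper levels first
        if child.tag not in tags:
            i += 1
            continue
        kids = list(child)
        # Text that must appear where the start tag was: the element's own
        # text, plus its tail if there is no grandchild to carry that tail.
        spill = child.text or ""
        if kids:
            last = kids[-1]
            last.tail = (last.tail or "") + (child.tail or "")
        else:
            spill += child.tail or ""
        if i == 0:
            parent.text = (parent.text or "") + spill
        else:
            prev = parent[i - 1]
            prev.tail = (prev.tail or "") + spill
        # Replace the element with its (already processed) children.
        parent.remove(child)
        for offset, kid in enumerate(kids):
            parent.insert(i + offset, kid)
        i += len(kids)


if __name__ == "__main__":
    doc = ET.fromstring("<html><head/><body><p> Hello, <b>World</b>!</p></body></html>")
    unwrap(doc, {"b"})
    print(ET.tostring(doc, encoding="unicode"))
    # prints: <html><head /><body><p> Hello, World!</p></body></html>
```

ElementTree stores character data in `text` and `tail` rather than as separate nodes. Most of the work is therefore moving those strings to the right neighbour. This bookkeeping is exactly what the XSLT identity-plus-override pattern handles for you.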


I need to maintain the structure of the original document minus the stripped tags

Have you thought of XSLT? It is a language designed specifically for transforming XML and, more generally, tree structures.

This transformation:

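<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:x="http://www.w3.org/1999/xhtml"
 exclude-result-prefixes="x">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Override for <b>, with or without the XHTML namespace:
      drop the element itself but keep processing its children -->
 <xsl:template match="b|x:b">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>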

when applied to an XHTML document such as the one below:

<html>
 <head/>
 <body>
  <p> Hello, <b>World</b>!</p>
 </body>
</html>

produces the wanted result:

<html>
   <head/>
   <body>
      <p> Hello, World!</p>
   </body>
</html>