Adding Anchors to HTML Using a List of Regular Expressions
I am trying to add anchors to my HTML output. The HTML is created by an XSLT 2.0 stylesheet that transforms XML to HTML. I need to be able to pass a list of regular expressions into my stylesheet and have every instance matching any of them turned into an anchor. I have code that works for a single regex, but when I run the list of regexes through it I get multiple copies of the same paragraph. I am by no means an XSLT 2.0 expert, and I'm not sure it's possible to do it this way. I can use C# too, if anyone thinks that would be a better solution, though I'm not sure it is.
The code that works for a single regex is:
<xsl:template match="text()" mode="content">
    <xsl:variable name="text">
        <xsl:value-of select="."></xsl:value-of>
    </xsl:variable>
    <!--
        IndexTerms is a parameter passed into the sheet; it is a list of regular expressions separated by semicolons
    -->
    <xsl:for-each select="tokenize($IndexTerms, ';')">
        <xsl:call-template name="IndexTerm">
            <xsl:with-param name="matchedRegex">
                <xsl:text>(.*)(</xsl:text>
                <xsl:value-of select="."></xsl:value-of>
                <xsl:text>)(.*)</xsl:text>
            </xsl:with-param>
            <xsl:with-param name="text">
                <xsl:value-of select="$text"></xsl:value-of>
            </xsl:with-param>
        </xsl:call-template>
    </xsl:for-each>
</xsl:template>
<xsl:template name="IndexTerm">
    <xsl:param name="matchedRegex">
        <xsl:text>asdf</xsl:text>
    </xsl:param>
    <xsl:param name="text"></xsl:param>
    <xsl:analyze-string select="$text" regex="{$matchedRegex}" flags="m">
        <xsl:matching-substring>
            <xsl:call-template name="IndexTerm">
                <xsl:with-param name="text">
                    <xsl:value-of select="regex-group(1)"></xsl:value-of>
                </xsl:with-param>
                <xsl:with-param name="matchedRegex">
                    <xsl:value-of select="$matchedRegex"></xsl:value-of>
                </xsl:with-param>
            </xsl:call-template>
            <xsl:element name="a">
                <xsl:attribute name="class">
                    <xsl:text>IndexAnchor</xsl:text>
                </xsl:attribute>
                <xsl:value-of select="regex-group(2)"></xsl:value-of>
            </xsl:element>
            <xsl:value-of select="regex-group(3)"></xsl:value-of>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
            <xsl:value-of select="."></xsl:value-of>
        </xsl:non-matching-substring>
    </xsl:analyze-string>
</xsl:template>
Sample Input:
<body>
<sec sec-type="intro">
<title>INTRODUCTION</title>
<p>Digital Television is the most advanced version of Television
technology improved in the last century. Digital TV provides
customers more choices and interactivity. New technology called
Internet Protocol-based Television (IPTV) uses digital TV technology
and transmits it over IP based networks (Driscol, 2008),
(<xref ref-type="bibr" rid="r15">Moawad, 2008</xref>). IPTV is a
technique that transmits TV and video content over a network that
uses the IP networking protocol. With increasing the number of
users, performance becomes more important in order to provide
interest in video content applications and relative services. The
requirement for new video applications on traditional broadcast
networks (cable, terrestrial transmitters, and satellite) opens a
new perspective for the developed use of IP networks to satisfy the
new service demands (Driscol,
2008</p>
<sec>
<title>More Introducing</title>
<p>Internet Protocol Television, IPTV, Telco TV, or broadband TV is
delivering high quality broadcast television and/or on-demand video
and audio content over a broadband network. On the other hand, IPTV
is a mechanism applied to deliver old TV channels, movies, and
video-on-demand contents over a private network. The official
definition approved by the International Telecommunication Union
focus group on IPTV (ITU-T FG IPTV) is as: “IPTV is
defined as multimedia services such as
television/video/audio/text/graphics /data delivered over IP based
networks managed to provide the required level of quality of service
and experience, security, interactivity and reliability”
(Driscol, 2008,
pp.2).</p>
</sec>
</sec>
Sample output using the regex input "Digital Televisions?;Internet" would be:
<body>
<h1>INTRODUCTION</h1>
<p><a class="IndexAnchor">Digital Television</a> is the most advanced version of Television
technology improved in the last century. Digital TV provides
customers more choices and interactivity. New technology called
<a class="IndexAnchor">Internet</a> Protocol-based Television (IPTV) uses digital TV technology
and transmits it over IP based networks (Driscol, 2008),
(Moawad, 2008). IPTV is a
technique that transmits TV and video content over a network that
uses the IP networking protocol. With increasing the number of
users, performance becomes more important in order to provide
interest in video content applications and relative services. The
requirement for new video applications on traditional broadcast
networks (cable, terrestrial transmitters, and satellite) opens a
new perspective for the developed use of IP networks to satisfy the
new service demands (Driscol,
2008</p>
<h2>More Introducing</h2>
<p><a class="IndexAnchor">Internet</a> Protocol Television, IPTV, Telco TV, or broadband TV is
delivering high quality broadcast television and/or on-demand video
and audio content over a broadband network. On the other hand, IPTV
is a mechanism applied to deliver old TV channels, movies, and
video-on-demand contents over a private network. The official
definition approved by the International Telecommunication Union
focus group on IPTV (ITU-T FG IPTV) is as: “IPTV is
defined as multimedia services such as
television/video/audio/text/graphics /data delivered over IP based
networks managed to provide the required level of quality of service
and experience, security, interactivity and reliability”
(Driscol, 2008,
pp.2).</p>
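Your current for-each calls the IndexTerm template once per pattern, and each call outputs the complete text node, which is why you get one copy of each paragraph per pattern. Frankly, instead of separating the patterns with a semicolon, I would strongly suggest using the bar "|", which is the regular expression character for separating alternatives. Then you can simply feed that complete parameter to analyze-string: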
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0"
    exclude-result-prefixes="xs">

    <xsl:param name="patterns" as="xs:string" select="'Digital Televisions?|Internet'"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*, node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:analyze-string select="." regex="{$patterns}">
            <xsl:matching-substring>
                <a class="IndexAnchor">
                    <xsl:value-of select="."/>
                </a>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
Does that help? If you need to convert your semicolon-separated list into a bar-separated one, you can do e.g. <xsl:param name="patterns" as="xs:string" select="string-join(tokenize($yourParam, ';'), '|')"/>.
I have not used a mode and have not looked at any other transformations you might want to do, but of course you can use the template I have presented with a mode if needed.
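The following minimal sketch (not tested against your real stylesheet) combines both suggestions for the setup described in the question. It assumes that IndexTerms stays semicolon-separated and that your other templates run in mode "content".

The root template and the identity template in mode "content" are hypothetical stand-ins for your own templates, such as the ones that produce h1/h2.

The sketch also drops empty tokens. Without that, a trailing ";" adds an empty alternative, so the joined regex can match a zero-length string, which xsl:analyze-string treats as a dynamic error (XTDE1150). If IndexTerms is empty, the text is copied unchanged.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0"
    exclude-result-prefixes="xs">

    <!-- Semicolon-separated list of regexes, as in the question -->
    <xsl:param name="IndexTerms" as="xs:string" select="'Digital Televisions?;Internet'"/>

    <!-- Join the non-empty tokens with "|" so a trailing ';' cannot add
         an empty alternative (a regex matching "" is an error in analyze-string) -->
    <xsl:variable name="patterns" as="xs:string"
        select="string-join(tokenize($IndexTerms, ';')[. ne ''], '|')"/>

    <!-- Hypothetical stand-in: start processing in mode "content" -->
    <xsl:template match="/">
        <xsl:apply-templates mode="content"/>
    </xsl:template>

    <!-- Hypothetical stand-in for your own mode="content" templates -->
    <xsl:template match="@* | node()" mode="content">
        <xsl:copy>
            <xsl:apply-templates select="@*, node()" mode="content"/>
        </xsl:copy>
    </xsl:template>

    <!-- One analyze-string pass per text node, so each paragraph is output once -->
    <xsl:template match="text()" mode="content">
        <xsl:choose>
            <!-- No usable patterns: copy the text unchanged -->
            <xsl:when test="$patterns = ''">
                <xsl:value-of select="."/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:analyze-string select="." regex="{$patterns}">
                    <xsl:matching-substring>
                        <a class="IndexAnchor">
                            <xsl:value-of select="."/>
                        </a>
                    </xsl:matching-substring>
                    <xsl:non-matching-substring>
                        <xsl:value-of select="."/>
                    </xsl:non-matching-substring>
                </xsl:analyze-string>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
```

Because all terms go into a single regex, each text node is scanned once and every match becomes one anchor. The recursion and the "(.*)(...)(.*)" wrapping from the question are no longer needed.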