XSLT to remove querystring from all urls in an xml file
I need to perform a regular-expression-style replacement that strips the query strings from all the URL attributes in an MRSS RSS feed, leaving just the URL. I've tried a few things using suggestions from "XSLT Replace function not found", but to no avail.
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<atom:link href="http://www.videojug.com/user/metacafefamilyandeducation/subscriptions.mrss" type="application/rss+xml" rel="self" />
<title>How to and instructional videos from Videojug.com</title>
<description>Award-winning Videojug.com has over 50k professionally-made instructional videos.</description>
<link>http://www.videojug.com</link>
<item>
<title>How To Calculate Median</title>
<media:content url="http://direct.someurl.com/54/543178dd-11a7-4b8d-764c-ff0008cd2e95/how-to-calculate-median__VJ480PENG.mp4?somequerystring" type="video/mp4" bitrate="1200" height="848" duration="169" width="480">
<media:title>How To Calculate Median</media:title>
..
</media:content>
</item>
Any suggestions would be really helpful.
If you're using XSLT 2.0, you can use tokenize():
<xsl:template match="media:content">
<xsl:value-of select="tokenize(@url,'\?')[1]"/>
</xsl:template>
Here's another example that changes only the url attribute of media:content:
<xsl:template match="media:content">
<media:content url="{tokenize(@url,'\?')[1]}">
<xsl:copy-of select="@*[not(name()='url')]"/>
<xsl:apply-templates/>
</media:content>
</xsl:template>
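Since the question asks for a regular-expression-style replacement, the same template could also be sketched with the XSLT 2.0 replace() function. This is only an alternative to tokenize(), not something tested against the full feed; the pattern '\?.*$' is an assumption that the query string is everything from the first ? to the end of the value.

```xml
<xsl:template match="media:content">
  <!-- replace() is XSLT 2.0 / XPath 2.0 only; it removes the first '?' and everything after it -->
  <media:content url="{replace(@url, '\?.*$', '')}">
    <!-- copy every attribute except url unchanged -->
    <xsl:copy-of select="@*[not(name()='url')]"/>
    <xsl:apply-templates/>
  </media:content>
</xsl:template>
```

A url value with no ? does not match the pattern, so replace() returns it unchanged.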
EDIT
To handle all url attributes in your instance, and leave everything else unchanged, use an identity transform and only override it with a template for @url.
Here's a modified version of your sample XML. I've added two attributes to description for testing. The attr attribute should be left untouched and the url attribute should be processed.
XML
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<atom:link href="http://www.videojug.com/user/metacafefamilyandeducation/subscriptions.mrss" type="application/rss+xml" rel="self"/>
<title>How to and instructional videos from Videojug.com</title>
<!-- added some attributes for testing -->
<description attr="don't delete me!" url="http://www.test.com/foo?anotherquerystring">Award-winning Videojug.com has over 50k professionally-made instructional videos.</description>
<link>http://www.videojug.com</link>
<item>
<title>How To Calculate Median</title>
<media:content url="http://direct.someurl.com/54/543178dd-11a7-4b8d-764c-ff0008cd2e95/how-to-calculate-median__VJ480PENG.mp4?somequerystring" type="video/mp4" bitrate="1200" height="848"
duration="169" width="480">
<media:title>How To Calculate Median</media:title>
..
</media:content>
</item>
</channel>
</rss>
XSLT
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<!--Identity Transform-->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@url">
<xsl:attribute name="url">
<xsl:value-of select="tokenize(.,'\?')[1]"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
OUTPUT (Using Saxon 9.3.0.5)
<rss xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:media="http://search.yahoo.com/mrss/"
version="2.0">
<channel>
<atom:link href="http://www.videojug.com/user/metacafefamilyandeducation/subscriptions.mrss"
type="application/rss+xml"
rel="self"/>
<title>How to and instructional videos from Videojug.com</title>
<!-- added some attributes for testing --><description attr="don't delete me!" url="http://www.test.com/foo">Award-winning Videojug.com has over 50k professionally-made instructional videos.</description>
<link>http://www.videojug.com</link>
<item>
<title>How To Calculate Median</title>
<media:content url="http://direct.someurl.com/54/543178dd-11a7-4b8d-764c-ff0008cd2e95/how-to-calculate-median__VJ480PENG.mp4"
type="video/mp4"
bitrate="1200"
height="848"
duration="169"
width="480">
<media:title>How To Calculate Median</media:title>
..
</media:content>
</item>
</channel>
</rss>
String handling in XSLT is generally a lot easier with XSLT 2.0, but in this case the requirement looks easy enough to meet with the substring-before() function, which has been available since XSLT 1.0.
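A minimal sketch of that XSLT 1.0 approach, reusing the identity-transform pattern from the other answer. One detail is an assumption added here rather than part of the suggestion: substring-before() returns an empty string when the delimiter is absent, so the sketch appends a '?' with concat() before splitting, which keeps url values that have no query string intact.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform: copy every node and attribute unchanged by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for url attributes: keep only the text before the first '?' -->
  <xsl:template match="@url">
    <xsl:attribute name="url">
      <!-- concat(., '?') guarantees a '?' exists, so URLs without a query string are not emptied -->
      <xsl:value-of select="substring-before(concat(., '?'), '?')"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

Because it uses only XPath 1.0 functions, this should run on processors that do not support tokenize(), such as the XSLT 1.0 engines built into many platforms.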