How do I strip accents from characters in XSL?

2023-02-19 15:29 问答作者：

I keep looking, but can't find an XSL function that is the equivalent of "normalize-space", for characters. That is, my content has accented UNICODE characters, which is great, but from that content, I'm creating a filename, where I don't want those accents.

So, is there something that I'm overlooking, or not googling properly, to easily process characters?

In the XML data:

<filename>gri_gonéwiththèw00mitc</filename>

In XSLT stylesheet:

<xsl:var开发者_如何学Pythoniable name="file">
    <xsl:value-of select="filename"/>
</xsl:variable>

<xsl:value-of select="$file"/>

results in "gri_gonéwiththèw00mitc"

where

<xsl:value-of select='replace( normalize-unicode( "$file", "NFKD" ), "[^\\p{ASCII}]", "" )'/>

results in nothing.

What I'm aiming for is gri_gonewiththew00mitc (no accents)

Am I using the syntax wrong?

In XSLT/XPath 1.0 if you want to replace those accented characters with the unaccented counterpart, you could use translate() function.

But, that assumes your "accented UNICODE characters" aren't composed unicode characters. If that were the case, you would need to use XPath 2.0 normalize-unicode() function.

And, if the real goal is to have a valid URI, you should use encode-for-uri()

Update: Examples

translate('gri_gonéwiththèw00mitc','áàâäéèêëíìîïóòôöúùûü','aaaaeeeeiiiioooouuuu')

Result: gri_gonewiththew00mitc

encode-for-uri('gri_gonéwiththèw00mitc')

Result: gri_gon%C3%A9withth%C3%A8w00mitc

Correct expression provide suggest by @biziclop:

replace(normalize-unicode('gri_gonéwiththèw00mitc','NFKD'),'\P{ASCII}','')

Result: gri_gonewiththew00mitc

Note: In XPath 2.0, the correct character class negation is with a capital \P.

So, contrary to my comment, you could try this:

replace( normalize-unicode( "öt hűtőházból kértünk színhúst", "NFKD" ), "[^\\p{ASCII}]", "" )

Although be warned that any characters which can't be decomposed and aren't basic ASCII (Norwegian ø or Icelandic Þ for example) will be completely deleted from the string, but that's probably okay with your requirements.

The previously suggested ways contain unknownthe character class named 'ASCII'. In my experience, XPath 2.0 recognises the class 'BasicLatin', which should serve the same purpose as 'ASCII'.

replace(normalize-unicode('Lliç d'Am Oükl Úkřeč', 'NFKD'), '\P{IsBasicLatin}', '')

The top voted answer does not work anymore XPath2.0, as mentioned by Yuri. The 'IsBasicLatin' is an appropriate substitution for ASCII

The following code works:

replace(normalize-unicode('çgri_gonéwiththèmitç','NFKD'),'\P{IsBasicLatin}','')

继续阅读：character-encoding unicode xml xslt

How do I strip accents from characters in XSL?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？