PCRE seems to be removing particular characters

2023-02-08 12:32

I have a piece of text (part French, part English) that contains the European-style Canadian dollar symbol ($C) multiple times. When I try to match it with a regex, using either plain or Unicode characters, the symbols appear to have been removed from the text and cannot be matched. I made the regex lazy so that it still works when the expected symbols are not found.

Additionally, the text is in a UTF-8 XML document and is displayed through a web interface (made in-house).

Escape the $ inside the regex: the dollar sign has a special meaning in regular expressions (it anchors the match to the end of the line or string).
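To make the escaping point concrete, here is a minimal sketch. It uses Python's re module as a stand-in (an assumption; the thread itself is about PCRE and Perl), since it treats an unescaped $ as an anchor in the same way. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text mixing French and English.
text = "Prix: 25 $C, price: 30 $C"

# Unescaped, `$` is an end-of-string anchor, so "$C" can never match:
# no character can follow the end of the string.
unescaped = re.findall(r"$C", text)

# Escaped, `\$` is a literal dollar sign.
escaped = re.findall(r"\$C", text)

# re.escape builds the escaped pattern from a literal string.
auto_escaped = re.findall(re.escape("$C"), text)

print(unescaped, escaped, auto_escaped)
```

In Perl or PHP there is an extra trap that Python does not have: inside an interpolating string or regex, $C may be read as a variable name, which is another reason to escape the dollar sign.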

In Perl, regexes and code are normally written in ASCII. If you want to embed Unicode characters directly in your source, you first need an editor that handles Unicode, and then you have to tell Perl that your source code contains Unicode (with the 'use utf8' pragma).

If you don't want to do that, you can embed code points in strings (and regexes) in Perl with a construct like this: $regex = qr/this is some text, this: is \x{1209} a codepoint unicode character/;

It matches the character only if the data source has been decoded into Perl's internal Unicode representation and actually contains that character.
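The dependence on decoding can be sketched as follows. This is an illustration in Python (an assumption; the answer above is about Perl), where \u1209 plays the role of Perl's \x{1209}. The sample bytes are invented for the example.

```python
import re

# Bytes as they might arrive from a UTF-8 file (hypothetical sample).
raw = "prix \u1209 fin".encode("utf-8")

# The code point is written as an escape, so the pattern source stays ASCII.
pattern = re.compile(r"\u1209")

decoded = raw.decode("utf-8")       # correct decoding: U+1209 is one character
misdecoded = raw.decode("latin-1")  # wrong decoding: its three bytes become three characters

found_decoded = pattern.search(decoded) is not None
found_misdecoded = pattern.search(misdecoded) is not None

print(found_decoded, found_misdecoded)
```

The same pattern finds the character only in the correctly decoded text, which is why it can look as if characters have been "removed" when the input was never decoded.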

Edit: I don't think there is a Unicode character for the Canadian dollar; it is simply written '$C'. As someone said, you have to escape the $ if the regex is interpolated. If you keep the $C, note that the character class [$C] matches $ or C, not the combination. Maybe (?:\$C|\$) would be a better anchor; the longer alternative goes first so that the C is consumed when it is present.
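The difference between a character class and a sequence is easy to check. The sketch below uses Python's re module (an assumption; for these constructs it behaves like PCRE and Perl) on a made-up sample string.

```python
import re

# Hypothetical sample with both "$C" and a bare "$".
text = "Total: 40 $C (about 30 $ US)"

# [$C] is a set: each match is a single character, "$" or "C".
class_hits = re.findall(r"[$C]", text)

# \$C? matches "$" optionally followed by "C".
seq_hits = re.findall(r"\$C?", text)

# With alternation, the first alternative that matches wins, so order matters.
short_first = re.findall(r"(?:\$|\$C)", text)
long_first = re.findall(r"(?:\$C|\$)", text)

print(class_hits, seq_hits, short_first, long_first)
```

When the dollar sign is followed by more pattern, backtracking can still reach the second alternative, but putting the longer one first (or using \$C?) avoids relying on that.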

The issue turned out to be a bug in the code just before I called eval(). Something in the French Unicode text was interfering with the code passed to eval(); once I stopped combining the text and the regex, it worked fine.
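The exact cause is not stated, so the following is only one plausible illustration of how text pasted into evaluated code can break it. It is written in Python (an assumption; the original code was not shown), and the sample text and its apostrophe are hypothetical.

```python
import re

# Hypothetical French sample containing an apostrophe.
text = "L'addition: 12 $C"

# Fragile: the text is pasted into source code, so its apostrophe ends the
# string literal early and the generated code no longer parses.
code = "len(re.findall(r'\\$C', '" + text + "'))"
try:
    eval(code)
    eval_ok = True
except SyntaxError:
    eval_ok = False

# Safer: the pattern is code, the text stays data and is never evaluated.
count = len(re.findall(r"\$C", text))

print(eval_ok, count)
```

Keeping the text out of the evaluated string means no character in it, accented or otherwise, can change what the code does.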
