Perl precompiled regex - utf8

2023-03-07 12:14 Q&A author:

When I do:

use strict; use warnings;
my $regex = qr/[[:upper:]]/;
my $line = MyModule::get_my_line_from_external_source(); #file, db, etc...
print "upper here\n" if( $line =~ $regex );

How does Perl know when it must match only ASCII uppercase and when it must match Unicode uppercase? The regex is precompiled, so Perl must somehow already know what counts as uppercase. Does it depend on the locale settings? If so, how can I match UTF-8 uppercase characters in the "C" locale with a precompiled regex?

updated based on tchrist's comments:

use strict; use warnings; use Encode;
my $regex = qr/[[:upper:]]/;

my $line = XXX::line();
print "$line: upper1 ", ($line =~ $regex) ? "YES" : "NO", "\n";

my $uline = Encode::decode_utf8($line);
print "$uline: upper2 ", ($uline =~ $regex) ? "YES" : "NO", "\n";

package XXX;
sub line { return "alpha-Ω"; } #returning octets - not utf8 chars

The output is:

alpha-Ω: upper1 NO
alpha-Ω: upper2 YES

Does this mean that the precompiled regex is not 'hard-precompiled' but 'soft-precompiled', so that Perl interprets '[[:upper:]]' based on the utf8 flag of the matched $line?

Before Perl 5.14, this was not very well defined.

Since 5.14, the pattern knows how it was compiled, and you have the /u, /l, /d, /a, and /aa pattern modifiers to choose the character-set rules explicitly. You can also say

use re "/u";

use re "/msu";

to turn all those flags on in the lexical scope.
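
As a rough illustration of that lexical scoping, the sketch below compiles the same pattern inside and outside a block that says use re '/u' and prints how each compiled regex stringifies. The variable names are made up for the example, and it assumes Perl 5.14 or later with no feature bundle in effect.

```perl
use strict;
use warnings;

# Assumes Perl 5.14+; $inside and $outside are illustrative names only.
my ($inside, $outside);

{
    use re '/u';                    # default flags for regexes compiled in this block
    $inside = qr/[[:upper:]]/;      # picks up /u from the pragma
}

$outside = qr/[[:upper:]]/;         # compiled outside the block: no /u added

# Stringifying a compiled regex shows the flags it was compiled with.
print "inside:  $inside\n";
print "outside: $outside\n";
```

The pragma only affects patterns compiled inside the enclosing block, so a regex built elsewhere and passed in keeps whatever flags it already had.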

For example, under 5.14:

% perl -le 'print qr/foo/'
(?^:foo)
% perl -E 'say qr/foo/'
(?^u:foo)
% perl -E 'say qr/foo/l'
(?^l:foo)
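
To tie this back to the question, here is a minimal sketch (assuming Perl 5.14 or later) that compiles [[:upper:]] once under each of /d, /u and /a and tries it against both the raw UTF-8 octets and the decoded string. The verdict helper and the %upper hash are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Build the sample without depending on the encoding of this source file.
my $octets = encode('UTF-8', "alpha-\x{3A9}");   # UTF-8 bytes, as a file or socket would deliver
my $chars  = decode('UTF-8', $octets);           # decoded character string

my %upper = (
    d => qr/[[:upper:]]/d,   # pre-5.14 behaviour: rules depend on the target string
    u => qr/[[:upper:]]/u,   # always Unicode rules
    a => qr/[[:upper:]]/a,   # POSIX classes restricted to ASCII
);

# Hypothetical helper: 'YES' if $string matches $regex, 'NO' otherwise.
sub verdict {
    my ($regex, $string) = @_;
    return $string =~ $regex ? 'YES' : 'NO';
}

for my $flag (qw(d u a)) {
    printf "/%s %-20s octets: %-3s decoded: %s\n",
        $flag, $upper{$flag},
        verdict($upper{$flag}, $octets),
        verdict($upper{$flag}, $chars);
}
```

Only the /d pattern changes its answer with the string's internal state, which is the 'soft' behaviour seen in the question. Note that /u can still report a match on the undecoded octets, because the byte 0xCE is read as the Latin-1 capital letter U+00CE rather than as part of an omega.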

I would steer clear of locales; just use all-Unicode.

BTW, I would make darned sure that that "external source" gave you back a string that was properly decoded; that is, has its UTF8 flag turned on. Character functions work poorly on encoded strings, because they really want decoded strings instead.
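
One way to follow that advice is to decode once, at the point where the data enters the program, and match only against the decoded string. The sketch below assumes the source hands back UTF-8 octets; fetch_octets and fetch_line are hypothetical stand-ins for the real external source, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $UPPER = qr/[[:upper:]]/u;    # Unicode rules, fixed at compile time (Perl 5.14+)

# Hypothetical stand-in for the external source: returns raw UTF-8 octets.
sub fetch_octets { return "alpha-\xCE\xA9" }     # the bytes of "alpha-" plus omega

# Decode at the boundary. FB_CROAK makes malformed input die
# instead of being silently replaced with substitution characters.
sub fetch_line {
    my $octets = fetch_octets();
    return decode('UTF-8', $octets, Encode::FB_CROAK);
}

my $line = fetch_line();

binmode STDOUT, ':encoding(UTF-8)';              # encode again on the way out
print "$line: ", ($line =~ $UPPER ? "upper here" : "no upper"), "\n";
```

If the source uses some other encoding, the name passed to decode would have to change accordingly; UTF-8 is only the assumption carried over from the question.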
