Hexadecimal Variables in substitution patterns

The file I am getting is full of badly formatted UTF-8 codes, like <0308> etc. I can identify them all right, but I want to replace them with the actual UTF-8 character, preferably with a regex. I've tried dozens of regexes like this:

s/<[0-9a-fA-F]{2,4}/\x{$1}/g
s/<[0-9a-fA-F]{2,4}/\N{U+$1}/g

And so on, but each time it tells me that $ is not a valid hex character (to which I fully agree). Shouldn't it just take the number in my $1 and put it in there? Or does Perl really expect me to use \x{..} or \N{U+..} only with fixed values? If so, I'd have to hand-write the conversion for every possible hex value - not very useful.


For one thing, you need to use parentheses to capture something in your regular expression; otherwise $1 will not get set to anything.

chr and hex, combined with the /e (eval) modifier, will do the trick here:

s/ <
   ([0-9a-fA-F]{2,4})     # parentheses to set $1
   > 
 / 
   chr(hex($1)) 
 /gex;        
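
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same chr(hex($1)) substitution. The sample string and the variable names $text and $fixed are made up for illustration; only the substitution itself comes from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical sample input: "<0308>" stands for U+0308 (combining diaeresis),
# "<00e9>" for U+00E9 (e with acute accent).
my $text = "u<0308>ber caf<00e9>";

# The parentheses capture the hex digits into $1. The /e modifier makes Perl
# evaluate the replacement as code, so chr(hex($1)) runs once per match.
(my $fixed = $text) =~ s/<([0-9a-fA-F]{2,4})>/chr(hex($1))/ge;

# Assumption: the output should be UTF-8. The encoding layer avoids the
# "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';
print "$fixed\n";
```

The /x modifier from the snippet above is left out here only because the pattern sits on one line; it affects layout, not matching.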


What version of perl are you using? This seems to work fine for me on 5.10.1:

$ perl -E '$foo = "<0308>"; $foo =~ s/<[0-9a-fA-F]{2,4}/\N{U+$1}/g; say $foo'
Wide character in print at -e line 1.
�>

(With \x{$1}, it seems to substitute the numbers with nothing, but I still don't get an error message.)
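
The "Wide character in print" warning in that output is separate from the substitution problem: it appears when a character above 0xFF is printed to a handle that has no encoding layer. A rough sketch of two ways to deal with it, assuming the output is meant to be UTF-8 (the variable names are illustrative only):

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0308 built the same way a chr(hex(...)) replacement would build it.
my $char = chr(hex("0308"));

# Option 1: encode explicitly and work with the resulting bytes.
my $bytes = encode('UTF-8', "u$char");

# Option 2: put an encoding layer on the handle and print characters directly.
binmode STDOUT, ':encoding(UTF-8)';
print "u$char\n";
```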


You probably need to use the /e (eval) modifier. Try /\x{$1}/eg or /"\x{$1}"/eg
