Hexadecimal Variables in substitution patterns
The file I am getting is full of badly formatted character codes (hex Unicode code points), like <0308> etc. I can identify them all right, but I want to replace them with the actual UTF-8 letter, preferably with a regex. I've tried dozens of regexes like this:
s/<[0-9a-fA-F]{2,4}/\x{$1}/g
s/<[0-9a-fA-F]{2,4}/\N{U+$1}/g
And so on, but each time it tells me that $ is not a valid hex character (to which I fully agree). Shouldn't it just take the number in my $1 and put it in there? Or does Perl really expect me to use \x{..} or \N{U+..} only with fixed values? If so, I'd have to hand-write the conversion for every possible hex value, which is not very useful.
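For one thing, you need parentheses to capture something in your regular expression; otherwise $1 will not be set to anything. Beyond that, \x{..} and \N{U+..} are resolved when the code is compiled and need literal hex digits, which is why Perl complains about the $. To convert a value at run time, chr plus hex, together with the /e (eval) modifier, will do the trick here: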
s/ <
([0-9a-fA-F]{2,4}) # parentheses to set $1
>
/
chr(hex($1))
/gex;
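A self-contained version of that substitution, as a minimal sketch: the helper name decode_hex_escapes and the sample input are made up for illustration, and the binmode line assumes the result should be written out as UTF-8 (without it, Perl prints the "Wide character" warning).

```perl
use strict;
use warnings;

# Hypothetical helper name; the body is the substitution from the answer above.
sub decode_hex_escapes {
    my ($text) = @_;
    $text =~ s/ <
                ([0-9a-fA-F]{2,4})   # capture the hex digits (group 1)
                >
              /chr(hex($1))/gex;     # e flag: run chr(hex($1)) as code for each match
    return $text;
}

# Write characters (not raw code points) to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print decode_hex_escapes("a<0308>o<0308>"), "\n";
```

U+0308 is a combining diaeresis, so "a" followed by it usually renders as "ä". Note that {2,4} only matches code points up to U+FFFF; if the file can contain higher ones, the quantifier would need widening (for example to {2,6}).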
What version of Perl are you using? This seems to work fine for me on 5.10.1:
$ perl -E '$foo = "<0308>"; $foo =~ s/<[0-9a-fA-F]{2,4}/\N{U+$1}/g; say $foo'
Wide character in print at -e line 1.
�>
(With \x{$1}, it seems to substitute the numbers with nothing, but I still don't get an error message.)
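Independently of which escape is used, the pattern in this one-liner has no capturing group, so $1 is not the hex number the substitution expects. A minimal sketch of what $1 holds with and without parentheses (both helper names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: the question's pattern, with no capturing group.
sub capture_without_parens {
    my ($s) = @_;
    return $s =~ /<[0-9a-fA-F]{2,4}/ ? $1 : 'no match';
}

# Hypothetical helper: the same idea with parentheses around the hex digits.
sub capture_with_parens {
    my ($s) = @_;
    return $s =~ /<([0-9a-fA-F]{2,4})>/ ? $1 : 'no match';
}

my $without = capture_without_parens("<0308>");
print defined $without ? "without parens: $without\n"
                       : "without parens: \$1 is undef\n";   # this branch runs
print "with parens: ", capture_with_parens("<0308>"), "\n";  # prints 0308
```

The match itself succeeds in both cases; only the version with parentheses leaves anything in $1.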
You probably need to add the eval switch (/e) to it. Try /\x{$1}/eg or /"\x{$1}"/eg.
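As the error in the question shows, a literal "\x{$1}" is still parsed at compile time, so a single /e does not fill in the digits. One way to keep the \x{..} spelling while still using eval is a double eval (/ee); this variation is not part of the answer above, and the helper name is hypothetical. A minimal sketch:

```perl
use strict;
use warnings;

# Hypothetical helper name. With ee, the first eval builds source text such as
# "\x{0308}" (including the double quotes); the second eval compiles that
# string literal into the actual character.
sub decode_with_double_eval {
    my ($text) = @_;
    $text =~ s/<([0-9a-fA-F]{2,4})>/'"\\x{' . $1 . '}"'/gee;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_with_double_eval("a<0308>"), "\n";
```

String eval on file input is generally risky; here the capture only admits hex digits, which limits what gets evaluated, but chr(hex($1)) with a single /e gives the same result without any string eval.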