How to do "decode 'unicode-escape'" only on \xhh characters in a string in Perl?

2023-02-05 15:23 问答作者：

I have a file with the following content with some characters are UTF-8 hex encoded in the string literal:

<root>
<element type=\"1\">\"Hello W\xC3\x96rld\"</element>
</root>

I want to read the file and decode the UTF-8 hex encoded characters in the file to the actual unicode characters they represent and then write to a new file. Given the above content, the new file should look like the following when you open it in a text editor with UTF-8 encoding:

<root>
<element type=\"1\">\"Hello WÖrld\"</element>
</root>

Notice the double quotes are still escaped and the UTF-8 hex encoded \xC3\x96 has now become Ö (U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS).

I have got code that is partially working, as follows:

#! /usr/bin/perl -w

use strict;
use Encode::Escape;

while (<>)
{
    # STDOUT is redirected to a new file.
    print decode 'unicode-escape', $_;
}

The problem however, all the other escape sequences such as \" are being decoded as well by decode 'unicode-escape', $_. So in the end, I get the following:

<root>
<element type="1">"Hello WÖrld"</element>
</root>

I have tried reading the file in UT开发者_JS百科F-8 encoding and/or using Unicode::Escape::unescape such as

open(my $UNICODESFILE, "<:encoding(UTF-8)", shift(@ARGV));
Unicode::Escape::unescape($line);

but neither of them decode the \xhh escape sequences.

Basically all I want is the behavior of decode 'unicode-escape', $_, but that it should only decode on \xhh escape sequences and ignore other escape sequences.

Is this possible? Is using decode 'unicode-escape', $_ appropriate for this case? Any other way? Thanks!

Find groups of \xNN characters and process them, I guess:

s{((?:\\x[0-9A-Fa-f]{2})+)}{decode 'unicode-escape', $1}ge

继续阅读：decoding encoding perl unicode-escapes utf-8

How to do "decode 'unicode-escape'" only on \xhh characters in a string in Perl?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？