How do I convert escaped characters into actual special characters in Perl? [duplicate]
Possible Duplicate:
How can I manually interpolate string escapes in a Perl string?
I'm reading a string from a particular file. The problem with it is that it contains escaped characters, like:
Hello!\nI\'d like to tell you a little \"secret\"...
I'd like it to be printed out without escape sequences, like:
Hello!
I'd like to tell you a little "secret".
I thought about removing single backslashes and replacing double with single (since \ is represented as \\), but that doesn't help me with the \n, \t issues and so on. Before trying to fiddle with ugly, complex replace strings I thought I'd ask - maybe Perl has a built-in mechanism for such transformation?
For Perl single-character backslash escapes, you can do this safely by eval'ing only a two-character sequence as part of the substitution. Put the characters that are acceptable to interpret in the character class after the \; the single character that follows the backslash is then eval'd and inserted into the string.
Consider:
#!/usr/bin/perl
use warnings;
use strict;
print "\n\n\n\n";
while (my $data = <DATA>) {
    $data =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
    print $data;
}
__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
A backslask:\\
Tab'\t'stop
line 1\rline 2 (on Unix, "line 1" will get overwritten)
line 3\\nline 4 (should result in "line 3\\nline 4")
line 5\r\nline 6
Output:
Hello!
I'd like to tell you a little "secret".
A backslask:\
Tab' 'stop
line 2 (on Unix, "line 1" will get overwritten)
line 3\nline 4 (should result in "line 3\nline 4")
line 5
line 6
The line s/\\([rnt'"\\])/"qq|\\$1|"/gee does the work.
The \\([rnt'"\\]) part lists, inside the square brackets, the characters that are acceptable to eval.
The gee part is the g modifier (replace every match) plus ee, which does a double eval on the replacement string.
The "qq|\\$1|" part is eval'd twice. The first eval substitutes $1 into the string, and the second performs the interpolation.
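To make the double eval less magical, here is a minimal sketch that performs the two steps by hand for one captured character. The helper name unescape_char is made up for illustration; it is not part of the answer's code, and it assumes the caller has already restricted the character to the allowed class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: does by hand what /ee does for one captured character.
# Assumes $char has already been limited to the allowed class [rnt'"\\].
sub unescape_char {
    my ($char) = @_;

    # Step 1 (the first e): evaluate the replacement expression "qq|\\$1|".
    # The result is a small piece of Perl source, for example qq|\n|.
    my $source = "qq|\\$char|";

    # Step 2 (the second e): evaluate that source to get the real character.
    my $result = eval $source;
    die $@ if $@;

    return ($source, $result);
}

my ($source, $result) = unescape_char('n');
print "source: $source\n";
print "result is a newline: ", ($result eq "\n" ? "yes" : "no"), "\n";
```

Because only a backslash plus one whitelisted character ever reaches the second eval, there is very little room for the data to smuggle in code.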
I cannot think of a two character combination here that would be a security breach...
This method does not deal with the following properly:
Quoted strings. For example, Perl would not unescape the string 'line 1\nline 2' because of the single quotes.
Escape sequences that are longer than a single character, such as hex \x1b, Unicode such as \N{U+...}, or control sequences such as \cD
Anchored escapes, such as \LMAKE LOWER CASE\E or \Umake upper case\E
If you want more complete escape replacement, you can use this regex:
#!/usr/bin/perl
use warnings;
use strict;
print "\n\n\n\n";
binmode STDOUT, ":utf8";
while (my $data = <DATA>) {
    $data =~ s/\\(
        (?:[arnt'"\\]) |                # Single char escapes
        (?:[ul].) |                     # uc or lc next char
        (?:x[0-9a-fA-F]{2}) |           # 2 digit hex escape
        (?:x\{[0-9a-fA-F]+\}) |         # more than 2 digit hex
        (?:[0-7]{2,3}) |                # octal (digits 0-7 only)
        (?:N\{U\+[0-9a-fA-F]{2,4}\})    # unicode by hex
    )/"qq|\\$1|"/geex;
    print $data;
}
__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
Here is octal: \120
Here is UNICODE: \N{U+0041} and \N{U+41} and \N{U+263D}
Here is a little hex:\x50 \x5fa \x{5fa} \x{263B}
lower case next char \lU \lA
upper case next char \ua \uu
A backslask:\\
Tab'\t'stop
line 1\rline 2 (on Unix, "line 1" will get overwritten)
line 3\\nline 4 (should result in "line 3\\nline 4")
line 5\r\nline 6
That handles most Perl escapes, except:
Anchored types (\Q, \U, \L, ended by \E)
Quoted forms, such as 'don't \n escape in single quotes' or [not \n in here]
Named Unicode characters, such as \N{THAI CHARACTER SO SO}
Control characters like \cD (that is easily added...)
But that was not part of your question as I understood it...
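For what it's worth, adding the control characters might look like the sketch below. It keeps only the single-character alternative from the regex above and adds one more for \cX. Restricting X to ASCII letters is my own assumption, made so that the qq|...| delimiter can never appear in the eval'd text; the sub name unescape_with_control is likewise just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: single-character escapes plus control escapes.
sub unescape_with_control {
    my ($data) = @_;
    $data =~ s/\\(
        (?:[arnt'"\\]) |    # single char escapes
        (?:c[A-Za-z])       # control character, letters only
    )/"qq|\\$1|"/geex;
    return $data;
}

my $out = unescape_with_control('End of transmission: \cD');
# Show the code point of the last character instead of printing it raw.
print "last character code: ", ord(substr($out, -1)), "\n";
```

The other alternatives (hex, octal, \N{U+...}) can be put back next to the new one unchanged.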
I hate to suggest this, but string eval would solve the problem. However, string eval brings up a host of security and maintenance issues. Where does this data come from? Are there any contracts between the producers of the data and you about what the string will hold?
#!/usr/bin/perl
use strict;
use warnings;
while (my $input = <DATA>) {
    # note: this only works if # is not allowed as a character in the string
    my $string = eval "qq#$input#";
    # check $@ rather than the result, so an empty or "0" line is not treated as a failure
    die $@ if $@;
    print $string;
}
__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
This is bad @{[print "I have pwned you\n"]}.
The other solution is to create a hash that defines all of the escapes you want to implement and do a substitution.
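A minimal sketch of that hash approach, with no eval at all. The set of escapes in the table and the sub name unescape are assumptions chosen for illustration; extend the table with whatever escapes your data is allowed to contain. Unknown escapes are left untouched here, which is a design choice you may want to change (for example, to die instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed set of supported escapes; add more entries as needed.
my %escape_for = (
    n    => "\n",
    t    => "\t",
    r    => "\r",
    '\\' => "\\",
    '"'  => '"',
    "'"  => "'",
);

# Hypothetical helper: replace each backslash escape found in the table,
# and leave any escape that is not in the table exactly as it was.
sub unescape {
    my ($text) = @_;
    $text =~ s/\\(.)/exists $escape_for{$1} ? $escape_for{$1} : "\\$1"/ge;
    return $text;
}

# q{} keeps the backslashes in the sample text literal.
print unescape(q{Hello!\nI\'d like to tell you a little \"secret\".}), "\n";
```

Since the substitution consumes a backslash together with the character after it, an escaped backslash followed by n comes out as a literal backslash and an n, not as a newline.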