Using awk printf to urldecode text
I'm using awk to urldecode some text.
If I code the string into the printf statement like printf "%s", "\x3D" it correctly outputs =. The same if I have the whole escaped string as a variable.
However, if I only have the 3D, how can I append the \x so printf will print the = and not \x3D?
I'm using busybox awk 1.4.2 and the ash shell.
I don't know how you do this in awk, but it's trivial in perl:
echo "http://example.com/?q=foo%3Dbar" |
perl -pe 's/\+/ /g; s/%([0-9a-f]{2})/chr(hex($1))/eig'
Since you're using ash and Perl isn't available, I'm assuming that you may not have gawk.
For me, using gawk or busybox awk, your second example works the same as the first (I get "=" from both) unless I use the --posix option (in which case I get "x3D" for both).
If I use --non-decimal-data or --traditional with gawk I get "=".
What version of AWK are you using (awk, nawk, gawk, busybox - and version number)?
Edit:
You can coerce the variable's string value into a numeric one by adding zero:
~/busybox/awk 'BEGIN { string="3D"; pre="0x"; hex=pre string; printf "%c", hex+0}'
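Whether adding zero to a string such as "0x3D" yields 61 depends on the awk implementation and its options, as the --posix result above suggests. A minimal sketch that avoids the coercion by converting the hex digits by hand is shown here; the names urldecode and hex2dec are made up for illustration, and it only handles %XX escapes (it does not turn + into a space).

```shell
# urldecode: read lines on stdin and decode %XX escapes.
# Uses only match, substr, index and sprintf, so no "0x" coercion is needed.
urldecode() {
    awk '
    # hex2dec (hypothetical helper): convert a hex string to a number digit by digit.
    function hex2dec(h,    i, n) {
        h = tolower(h)
        n = 0
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
        return n
    }
    {
        out = ""
        rest = $0
        # Consume the line left to right so decoded text is never rescanned.
        while (match(rest, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            out = out substr(rest, 1, RSTART - 1) sprintf("%c", hex2dec(substr(rest, RSTART + 1, 2)))
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf '%s\n' 'http://example.com/?q=foo%3Dbar' | urldecode
```

Because each escape is decoded once and the scan continues after it, an input like %253D comes out as %3D rather than =.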
GNU awk
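#!/usr/bin/awk -nf
@include "ord"
BEGIN {
    RS = "%.."
}
{
    # Pass the data as an argument, not as the format, so a stray % is printed as-is.
    printf "%s", (RT ? $0 chr("0x" substr(RT, 2)) : $0)
}
Or
#!/bin/sh
awk -niord '{printf "%s", (RT ? $0 chr("0x" substr(RT,2)) : $0)}' RS=%..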
Decoding URL encoding (percent encoding)
This relies on GNU awk's four-argument extension to split(), but it works:
gawk '{ numElems = split($0, arr, /%../, seps);
    outStr = ""
    for (i = 1; i <= numElems - 1; i++) {
        outStr = outStr arr[i]
        outStr = outStr sprintf("%c", strtonum("0x" substr(seps[i],2)))
    }
    outStr = outStr arr[i]
    print outStr
}'
To start with, I'm aware this is an old question, but none of the answers worked for me (restricted to busybox awk)
Two options. To parse stdin:
awk '{for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y));gsub(/%25/, "%");print}'
To take a command line parameter:
awk 'BEGIN {for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y), ARGV[1]);gsub(/%25/, "%", ARGV[1]);print ARGV[1]}' parameter
%25 has to be decoded last; otherwise strings like %253D get decoded twice, which shouldn't happen.
The inline check for y==38 is because gsub treats & as a special character unless you backslash it.
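Both points can be checked with a few throwaway one-liners. This is only an illustration with made-up inputs, not part of the decoder itself:

```shell
# In a gsub replacement a bare & stands for the matched text, so nothing changes.
printf '%s\n' 'a%26b' | awk '{ gsub(/%26/, "&"); print }'

# Backslashing it inserts a literal ampersand.
printf '%s\n' 'a%26b' | awk '{ gsub(/%26/, "\\&"); print }'

# Decoding %25 first turns %253D into %3D, which the next gsub decodes again.
printf '%s\n' '%253D' | awk '{ gsub(/%25/, "%"); gsub(/%3D/, "="); print }'

# Decoding %25 last leaves the intended %3D.
printf '%s\n' '%253D' | awk '{ gsub(/%3D/, "="); gsub(/%25/, "%"); print }'
```

The first command echoes the input back unchanged, the second prints a&b, the third prints = (the double decode), and the fourth prints %3D.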
This one is the fastest of them all by a large margin and it doesn't need gawk:
#!/usr/bin/mawk -f
function decode_url(url,    dec, tmp, pre, mid, rep) {
    tmp = url
    # Match only real percent escapes: "%" followed by two hex digits.
    while (match(tmp, /%[0-9a-fA-F][0-9a-fA-F]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART + 1, RLENGTH - 1)
        # Relies on the interpreter converting the string "0x.." to a number.
        rep = sprintf("%c", ("0x" mid) + 0)
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}
{
    print decode_url($0)
}
Save it as decode_url.awk and use it like you normally would. E.g:
$ ./decode_url.awk <<< 'Hello%2C%20world%20%21'
Hello, world !
But if you want an even faster version:
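#!/usr/bin/mawk -f
# Build a lookup table from "%XX" (upper and lower case) to the decoded character.
function gen_url_decode_array(    i, n, c) {
    delete decodeArray
    # Cover all printable ASCII, not only 32..63, so escapes like %7E are decoded too.
    for (i = 32; i < 127; ++i) {
        c = sprintf("%c", i)
        n = sprintf("%%%02X", i)
        decodeArray[n] = c
        decodeArray[tolower(n)] = c
    }
}
function decode_url(url,    dec, tmp, pre, mid, rep) {
    tmp = url
    while (match(tmp, /%[0-9a-fA-F][0-9a-fA-F]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART, RLENGTH)
        # Leave escapes that are not in the table unchanged instead of dropping them.
        rep = (mid in decodeArray) ? decodeArray[mid] : mid
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}
BEGIN {
    gen_url_decode_array()
}
{
    print decode_url($0)
}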
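Interpreters other than mawk should be able to run them too, with one caveat: the first version relies on the string "0x3D" converting to the number 61, which, as far as I know, gawk does only with --non-decimal-data.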