
Converting hex to decimal in awk or sed

I have a list of numbers, comma-separated:

123711184642,02,3583090366663629,639f02012437d4
123715942138,01,3538710295145500,639f02afd6c643
123711616258,02,3548370476972758,639f0200485732

I need to split the last (4th) column into three, as below:

123711184642,02,3583090366663629,639f02,0124,37d4
123715942138,01,3538710295145500,639f02,afd6,c643
123711616258,02,3548370476972758,639f02,0048,5732

And convert the digits in the last two columns into decimal:

123711184642,02,3583090366663629,639f02,292,14292
123715942138,01,3538710295145500,639f02,45014,50755
123711616258,02,3548370476972758,639f02,72,22322


Here's a variation on Jonathan's answer:

awk $([[ $(awk --version) = GNU* ]] && echo --non-decimal-data) -F, '
    BEGIN {OFS = FS}
    {
        $6 = sprintf("%d", "0x" substr($4, 11, 4))
        $5 = sprintf("%d", "0x" substr($4,  7, 4))
        $4 = substr($4,  1, 6)
        print
    }'

I included a rather contorted way of adding the --non-decimal-data option if it's needed.

Edit

Just for the heck of it, here's the pure-Bash equivalent:

saveIFS=$IFS
IFS=,
while read -r -a line
do
    printf '%s,%s,%d,%d\n' "${line[*]:0:3}" "${line[3]:0:6}" "0x${line[3]:6:4}" "0x${line[3]:10:4}"
done
IFS=$saveIFS

The "${line[*]:0:3}" (quoted *) works similarly to AWK's OFS in that it causes the first character of Bash's IFS (here a comma) to be inserted between array elements on output. We can take further advantage of that feature by assigning array elements as follows, which more closely parallels my AWK version above.

saveIFS=$IFS
IFS=,
while read -r -a line
do
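    # Bash arrays are zero-based, so the hex field is line[3];
    # fill the new elements first, then truncate line[3] in place.
    line[5]=$(printf '%d' "0x${line[3]:10:4}")
    line[4]=$(printf '%d' "0x${line[3]:6:4}")
    line[3]=${line[3]:0:6}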
    printf '%s\n' "${line[*]}"
done
IFS=$saveIFS
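
To see the IFS joining behaviour in isolation, here is a minimal sketch. The function name join_fields is hypothetical (it is not part of the answer above); it only demonstrates that "${array[*]}" is joined with the first character of IFS.

```shell
#!/usr/bin/env bash
# join_fields is a hypothetical helper used only for illustration.
# Usage: join_fields SEPARATOR ITEM...
join_fields() {
    local IFS=$1              # the first character of IFS becomes the joiner
    shift
    local -a fields=("$@")
    printf '%s\n' "${fields[*]}"
}

join_fields , 123711184642 02 639f02 292 14292
# prints: 123711184642,02,639f02,292,14292
```

Declaring IFS as local inside a function is one way to avoid the saveIFS/restore dance, since the change disappears when the function returns.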

Unfortunately, Bash doesn't allow printf -v (which is similar to sprintf()) to make assignments to array elements, so printf -v "line[5]" ... doesn't work.

Edit: As of Bash 4.1, printf -v can now make assignments to array elements. Example:

printf -v 'line[5]' '%d' "0x${line[3]:10:4}"

The quotes around the array reference are needed to prevent possible filename matching. If a file named "line5" existed in the current directory and the reference wasn't quoted, then a variable named line5 would be created (or updated) containing the printf output. Nothing else about the file, such as its contents, would come into play. Only the name - and only tangentially.
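
As a small self-contained illustration of printf -v writing into an array element, here is a sketch that assumes Bash 4.1 or later. The array name fields and the helper hex_into_slot are made up for this example.

```shell
#!/usr/bin/env bash
# Requires Bash 4.1 or later for printf -v with an array element.
fields=(639f02)

# hex_into_slot is a hypothetical helper: it stores the decimal value of
# the hex digits in $2 at index $1 of the global array "fields".
# The target is quoted so the brackets are never treated as a glob pattern.
hex_into_slot() {
    printf -v "fields[$1]" '%d' "0x$2"
}

hex_into_slot 1 0124
hex_into_slot 2 37d4

# Join with a comma in a subshell so the caller's IFS is untouched.
(IFS=,; printf '%s\n' "${fields[*]}")
# prints: 639f02,292,14292
```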


Foreword

In this answer I address converting hex numbers with AWK in general, not specifically the case in the question.

In the following examples the first field (i.e. $1) of each record given to the interpreter is converted. Only hexadecimal digits are allowed in the input, not the "0x" prefix.

By GNU Awk, arbitrarily large hex values can be converted simply

If gawk is compiled with the GNU MPFR and GMP libraries, it can do arbitrary-precision arithmetic when the -M option is used.

gawk -M '{print strtonum("0x" $1)}'

By AWK portably

According to the GNU Awk User's Guide, using --non-decimal-data with gawk is not recommended. Using strtonum() is not portable either; as far as I know it is supported only by gawk. So let's look at alternatives:

By user-defined function

Supposedly the most portable way of doing conversion is by a user-defined awk function [reference]:

function parsehex(V,OUT)
{
    if(V ~ /^0x/)  V=substr(V,3);

    for(N=1; N<=length(V); N++)
        OUT=(OUT*16) + H[substr(V, N, 1)]

    return(OUT)
}

BEGIN { for(N=0; N<16; N++)
        {  H[sprintf("%x",N)]=N; H[sprintf("%X",N)]=N } }

{ print parsehex($1) }

Note: If your AWK interpreter supports only 32-bit integers, you can convert larger hex numbers by replacing return(OUT) with return(sprintf("%.0f", OUT)); I could convert 0xFFFFFFFFFFFFF = 2^52-1 this way. The function ignores a possible "0x" prefix.
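
For the data in the question, the user-defined-function approach could be applied roughly as follows. This is only a sketch: the names split_hex_fields and hex2dec are made up, it looks digits up with index() on a digit string instead of the H[] table above, and it assumes the 4th field holds at least 14 valid hex digits.

```shell
# split_hex_fields is a hypothetical wrapper; it reads CSV lines on stdin.
split_hex_fields() {
    awk -F, '
        # hex2dec: convert a string of hex digits (no 0x prefix) to a number.
        # i and n are local variables; invalid digits are not detected.
        function hex2dec(s,    i, n) {
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", tolower(substr(s, i, 1))) - 1
            return n
        }
        BEGIN { OFS = FS }
        {
            h = $4                           # keep the original 14 hex digits
            $4 = substr(h, 1, 6)             # first 6 digits stay as text
            $5 = hex2dec(substr(h, 7, 4))    # next 4 digits as decimal
            $6 = hex2dec(substr(h, 11, 4))   # last 4 digits as decimal
            print
        }'
}

printf '%s\n' '123711184642,02,3583090366663629,639f02012437d4' | split_hex_fields
# prints: 123711184642,02,3583090366663629,639f02,292,14292
```

Only tolower(), index() and substr() are used, so no gawk-specific option or function is needed.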

By calling shell's printf

You could use this

awk '{cmd="printf %d 0x" $1; cmd | getline decimal; close(cmd); print decimal}'

but it is relatively slow as it requires starting a subshell. The following one is faster, if you have many newline-separated hexadecimal numbers to convert:

awk 'BEGIN{cmd="printf \"%d\n\""}{cmd=cmd " 0x" $1}END{while ((cmd | getline dec) > 0) { print dec }; close(cmd)}'

There might be a problem if very many arguments are added for the single printf command.

These methods are also limited in how large a hex number they can convert. I could convert 0xFFFFFFFFFFFFFFF = 2^60-1 on my system.
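
If the number of arguments is a concern, one possible workaround (not part of the commands above) is to let xargs split the input into batches and call the external printf utility once per batch. The sketch below assumes one hex number per line, without a "0x" prefix, whitespace or quote characters; the name hex_lines_to_dec is hypothetical.

```shell
# hex_lines_to_dec is a hypothetical helper; it reads hex numbers on stdin.
hex_lines_to_dec() {
    # Prefix every line with 0x, then let xargs pass as many values per
    # printf invocation as the system allows. printf reuses its format
    # string for each extra argument.
    sed 's/^/0x/' | xargs printf '%d\n'
}

printf '%s\n' ff 10 7b | hex_lines_to_dec
# prints 255, 16 and 123 on separate lines
```

The size limit mentioned above still applies, because the conversion is still done by printf.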

By using AWK's printf (or sprintf)

In my experience the following works in Linux:

awk -Wposix '{ printf "%d\n", "0x" $1 }'

I tested it with gawk, mawk and original-awk on Ubuntu Linux 20.04. gawk requires -Wposix here. original-awk displays a warning message about the option, but you can hide it with the shell redirection 2>/dev/null. If you don't want to do that, you can pass -Wposix only to GNU Awk, like this:

awk -Wversion 2>/dev/null | ( unset -v IFS; read -r word _; [ "$word" = GNU ] && exit 0 || exit 1 ) && posix_option="-Wposix" || posix_option=""
awk $posix_option '{ printf "%d\n", "0x" $1 }'

Note: Yet again, the implementation of your interpreter limits the maximum hex value that can be converted this way. E.g. mawk on my system has a maximum integer of 2147483647; this is reported on standard error by mawk -Wversion (at least for version 1.3.4). You can convert larger hex numbers by replacing printf "%d\n", "0x" $1 with printf "%.0f\n", "0x" $1; I could convert 0xFFFFFFFFFFFFF = 2^52-1 this way.


This seems to work:

awk -F, '{ p1 =       substr($4,  1, 6);
           p2 = ("0x" substr($4,  7, 4)) + 0;
           p3 = ("0x" substr($4, 11, 4)) + 0;
           printf "%s,%s,%s,%s,%d,%d\n", $1, $2, $3, p1, p2, p3;
         }'

For your sample input data, it produces:

123711184642,02,3583090366663629,639f02,292,14292
123715942138,01,3538710295145500,639f02,45014,50755
123711616258,02,3548370476972758,639f02,72,22322

The string concatenation of '0x' plus the 4-digit hex followed by adding 0 forces awk to treat the numbers as hexadecimals.

You can simplify this to:

awk -F, '{ p1 =      substr($4,  1, 6);
           p2 = "0x" substr($4,  7, 4);
           p3 = "0x" substr($4, 11, 4);
           printf "%s,%s,%s,%s,%d,%d\n", $1, $2, $3, p1, p2, p3;
         }'

The strings prefixed with 0x are forced to integer when presented to printf() and the %d format.


The code above works beautifully with the native awk on MacOS X 10.6.5 (version 20070501); sadly, it does not work with GNU gawk 3.1.7. That, it seems, is permitted behaviour according to POSIX (see the comments below). However, gawk has a non-standard function strtonum that can be used to bludgeon it into performing correctly - pity that bludgeoning is necessary.

gawk -F, '{ p1 =      substr($4,  1, 6);
            p2 = "0x" substr($4,  7, 4);
            p3 = "0x" substr($4, 11, 4);
            printf "%s,%s,%s,%s,%d,%d\n", $1, $2, $3, p1, strtonum(p2), strtonum(p3);
          }'


printf "%d\n", strtonum("0x" $1)


This might work for you (GNU sed & printf):

sed -r 's/(....)(....)$/ 0x\1 0x\2/;s/.*/printf "%s,%d,%d" &/e' file

This splits the last eight characters into two four-character fields, each preceded by a space and the 0x hex prefix, and then evaluates the whole line as a printf command (the GNU sed e flag).
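
To make the two stages visible, the sketch below runs only the first substitution and then issues by hand the printf call that the e flag would execute. It uses sed -E instead of -r on the assumption that the local sed accepts it; the name stage_hex_fields is made up, and the final format has a trailing \n added for readability.

```shell
# stage_hex_fields is a hypothetical name for the first substitution only.
stage_hex_fields() {
    sed -E 's/(....)(....)$/ 0x\1 0x\2/'
}

line='123711184642,02,3583090366663629,639f02012437d4'
staged=$(printf '%s\n' "$line" | stage_hex_fields)
printf '%s\n' "$staged"
# prints: 123711184642,02,3583090366663629,639f02 0x0124 0x37d4

# The e flag then runs the equivalent of this command. $staged is left
# unquoted on purpose so that it splits into three arguments.
printf '%s,%d,%d\n' $staged
# prints: 123711184642,02,3583090366663629,639f02,292,14292
```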


cat all_info_List.csv| awk 'BEGIN {FS="|"}{print $21}'| awk 'BEGIN {FS=":"}{p1=$1":"$2":"$3":"$4":"$5":";  p2 = strtonum("0x"$6); printf("%s%02X\n",p1,p2+1) }'

The above command prints the contents of "all_info_List.csv", a file where the field separator is "|". Then takes field 21 (MAC address) and splits it using field separator ":". It assigns to variable "p1" the first 5 bytes of each mac address, so if we had this mac address:"11:22:33:44:55:66", p1 would be: "11:22:33:44:55:". p2 is assigned with the decimal value of the last byte: "0x66" would assign "102" decimal to p2. Finally, I'm using printf to join p1 and p2, while converting p2 back to hex, after adding one to it.
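
The same increment can be sketched with shell arithmetic alone, without strtonum(). The function name next_mac is hypothetical. Unlike the command above, this sketch wraps the last byte from FF to 00 instead of printing 100; that is a design choice made here, not something the original command does.

```shell
# next_mac is a hypothetical helper. Usage: next_mac AA:BB:CC:DD:EE:FF
next_mac() {
    # ${1%:*}  = everything before the last colon (the first five bytes)
    # ${1##*:} = the last byte, read as hex via the 0x prefix
    printf '%s:%02X\n' "${1%:*}" "$(( (0x${1##*:} + 1) & 0xFF ))"
}

next_mac 11:22:33:44:55:66
# prints: 11:22:33:44:55:67
```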


--- My 5 Cents

I just want to add my 5 cents in case this topic is still of interest. From the comments in the thread, I take it that it is. Hope it helps:

Challenge: convert a hex number to decimal on an Apple M1 laptop running the latest macOS (2022), with the following versions:

% uname -a
Darwin macbook 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct  9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000 arm64 arm Darwin

% gawk --version
GNU Awk 5.2.1, API 3.2, (GNU MPFR 4.1.0-p13, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2022 Free Software Foundation.

--- gawk -Wposix needed

% echo "116B" | gawk '{p = ("0x" substr($1, 1, 4)) +0; printf("%d\n", p )}'
0

% echo "116B" | gawk -Wposix '{p = ("0x" substr($1, 1, 4)) +0; printf("%d\n", p )}'
4459

--- Some simplifications also work

% echo "116B" | gawk -Wposix '{p = "0x" substr($1, 1, 4); printf("%d\n", p )}'
4459

% echo "116B" | gawk -Wposix '{printf("%d\n", "0x" substr($1, 1, 4))}'
4459

--- Checking...

% echo "4459" | gawk '{printf("%X\n", $1 )}'
116B

--- This form is what I was looking for

% echo "00:11:6BX" | gawk -Wposix '{printf("%d\n", "0x" substr($1, 1, 2) substr($1, 4, 2) substr($1, 7, 2))}'
4459


This should be a cleaner approach than Perl, Python or printf:

echo 0x7E07E30EAAC59DB8EB9FDAD2EE818EA7AEB70192DAE552AD06B9FE
       593BE89BC258483EA07C972B0FE7BA0D7B6CAC6DF338571F49CABB
       DD195629411CDF0F88858EC39F01AE181E60A4F0DAF5F4F0E86991
       82243BDF159AB588F11E3FF68E799509128EA7BA957B62DF103D0E
       B2C3195DA1CCDFDD0CAF0E9958C1AF3E2B6993AA74C255B711BE38
       DB031B26A596EFE19051A864000FB99F161923F12C2F9F40F18B6E
       064CCCAE4C0776D0EB815947A30AB68B1CF12CA6622CAECA530221
       2C27FD1579178363FE2E87B1F02FC0FDFFF | 
gawk -nMbe '$++NF = +$!_' OFS='\n\n' 
 1  0x7E07E30EAAC59DB8EB9FDAD2EE818EA7AEB70192DAE552AD06B9FE
      593BE89BC258483EA07C972B0FE7BA0D7B6CAC6DF338571F49CABB
      DD195629411CDF0F88858EC39F01AE181E60A4F0DAF5F4F0E86991
      82243BDF159AB588F11E3FF68E799509128EA7BA957B62DF103D0E
      B2C3195DA1CCDFDD0CAF0E9958C1AF3E2B6993AA74C255B711BE38
      DB031B26A596EFE19051A864000FB99F161923F12C2F9F40F18B6E
      064CCCAE4C0776D0EB815947A30AB68B1CF12CA6622CAECA530221
      2C27FD1579178363FE2E87B1F02FC0FDFFF

 2  985801769662049290799836483751359680713382803597807741
      342261221390727037343867491391068497002991150267570021
      888625408701957708383236015057159917981445085171196540
      056449671723413767151987807183076995694938175592905407
      706727043644590485574826597324100590757487981303537403
      481578192766548120367625144822345612103264180960846560
      558546717739085751660018602037450619797709845938562717
      870137791128285871274530893277287577788311030033741131
      093413810677239057304751530532826551215693481438241043
      55789791231

In case you're wondering, this number is a Mersenne prime raised to the power of another Mersenne prime:

8191 ^ 127 

And the 2 primes closest to it should be

  •  8191 ^ 127 - ( 16 + 512 )
    
  •  8191 ^ 127 + (     1450 )
    


Perl version, with a tip of the hat to @Jonathan:

perl -F, -lane '$p1 = substr($F[3], 0, 6); $p2 = substr($F[3], 6, 4); $p3 = substr($F[3], 10, 4); printf "%s,%s,%s,%s,%d,%d\n", @F[0..2], $p1, hex($p2), hex($p3)' file

-a turns on autosplit mode, to populate the @F array
-F, changes the autosplit separator to , (default is whitespace)
The substr() offsets are 1 less than their awk equivalents, since Perl string offsets (like Perl array indices) start from 0.

Output:

123711184642,02,3583090366663629,639f02,292,14292
123715942138,01,3538710295145500,639f02,45014,50755
123711616258,02,3548370476972758,639f02,72,22322