Processing binary data files in bash, finding elements which are greater than some number

I process different binary data. Mostly, these are signed 16-bit streams. With hexdump, it looks like:

...
2150     -191    -262    15      -344    -883    -820    -1038   -780
-1234   -1406   -693    131     433     396     241     600     1280
...

I would like to see only those elements of a data stream that are greater than or less than some threshold (the data is binary signed 16-bit). It could look like:

cat data.pcm | $($here_some_filtering) 2100 -2100

where the output must give me only the elements which are greater than 2100 or less than -2100. Is there a simple command-line way to do this?


$ cat pcm
2150     -191    -262    15      -344    -883    -820    -1038   -780
-1234   -1406   -693    131     433     396     241     600     1280

$ for num in $(< pcm); do ((num > 2100 || num < -2100)) && echo $num; done
2150


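A one-liner for this, assuming data.pcm already holds the values as decimal text (as in the dump above) rather than raw bytes, would be something like:

for c in $(cat data.pcm); do if [ "$c" -lt -2100 ] || [ "$c" -gt 2100 ]; then echo "$c"; fi; done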


Well, since the data is binary, a personal suggestion: do not use plain old shell, use a tool fit for the job. In Perl, Python, or even a C/C++ program, this will mostly be a one-liner.

The following is an unoptimized hack to give you an idea:

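#!/bin/bash
# Usage: script FILE
# Copies to stdout, as raw bytes, every 16-bit sample that lies strictly
# between the two limits. Swap the comparisons to keep the outliers instead.
lowerlimit=-333
upperlimit=333
# read from stdin so wc prints only the byte count, without the file name
filesize=$(wc -c < "$1")

off=0
while [ $off -lt $filesize ]; do
    # decode the two bytes at offset $off as one signed 16-bit integer
    shortval=$(od -An -s -N 2 -j $off "$1")
    test $shortval -gt $lowerlimit &&
    test $shortval -lt $upperlimit &&
    dd if="$1" bs=1 count=2 skip=$off 2>/dev/null
    off=$((off + 2))
done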

I'm not sure this can easily be made pipe-able, because the shell splits its input on line separators, and the script above also needs a real file so that it can seek to each offset.


Bash can be made to deal with binary data.

getbyte () {
    local IFS= LC_CTYPE=C res c
    read -r -d '' -n 1 c
    res=$?
    # the single quote in the argument of the printf 
    # yields the numeric value of $c (ASCII since LC_CTYPE=C)
    [[ -n $c ]] && c=$(printf '%d' "'$c") || c=0
    printf "$c"
    return $res
}

filter () {
    local b1 b2 val
    while b1=$(getbyte)
    do
        b2=$(getbyte)
        (( val = b2 * 256 + b1 ))
        (( val = val > 32767 ? val - 65536 : val ))
        if (( val > ${1:-0} || val < ${2:-0} ))
        then
            echo $val
        fi
    done
}

Examples (the data has an odd number of bytes intentionally to show that the function accommodates this condition):

$ data='\0\01\010\0377\0377\0100\0300\0200\0333'
$ echo -en "$data" | filter
256
-248
16639
-32576
219
$ echo -en "$data" | filter 222 -333
256
16639
-32576

Your command would then be:

filter 2100 -2100 < data.pcm


Whenever I want to extract numerical values from a binary file, I use od (octal dump). It has many options for extracting characters, integers (8, 16, 32 and 64 bits) and floats (32 and 64 bits). You can also specify an offset to the exact value that you are looking for.

For learning more about it, type:

man od

Then, filtering on od output should not be complex in bash.
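
As a rough sketch of that idea (not taken from the answers above): od can decode the stream into signed 16-bit decimals, and a small shell loop can do the comparison. The function name od_filter is made up for illustration. od -t d2 decodes in the machine's native byte order, so this assumes the file's byte order matches the host (for example, little-endian data on an x86 machine).

```shell
# od_filter UPPER LOWER
# Hypothetical helper: reads raw signed 16-bit samples on stdin and prints,
# one per line, every value greater than UPPER or less than LOWER.
# Assumes the samples are stored in the host's native byte order.
od_filter() {
    # -An: no offset column, -v: do not collapse repeated lines,
    # -t d2: signed decimal, two bytes per value
    od -An -v -t d2 | while read -r line; do
        # $line is left unquoted on purpose so it splits into single numbers
        for num in $line; do
            if [ "$num" -gt "$1" ] || [ "$num" -lt "$2" ]; then
                echo "$num"
            fi
        done
    done
}
```

Since it reads standard input, it could be called as od_filter 2100 -2100 < data.pcm, or placed at the end of a pipe.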
