Processing binary data files in bash, finding elements which are greater than some number
I process different binary data. Mostly, these are signed 16-bit streams. With hexdump, it looks like:
...
2150 -191 -262 15 -344 -883 -820 -1038 -780
-1234 -1406 -693 131 433 396 241 600 1280
...
I would like to see only those elements of a data stream that are greater than or less than some threshold (the data is binary signed 16-bit). It could look like:
cat data.pcm | $($here_some_filtering) 2100 -2100
where the output must give me only the elements that are greater than 2100 or less than -2100. Is there a simple command-line way to do this?
$ cat pcm
2150 -191 -262 15 -344 -883 -820 -1038 -780
-1234 -1406 -693 131 433 396 241 600 1280
$ for num in $(< pcm); do ((num > 2100 || num < -2100)) && echo $num; done
2150
A one-liner for this, assuming the file holds the decimal text dump rather than raw binary, would be something like:
for c in $(cat data.pcm); do if [ "$c" -lt -2100 ] || [ "$c" -gt 2100 ]; then echo "$c"; fi; done
Well, binary ... personal suggestion: Do not use plain old shell - use a tool fit for the job. Perl, Python, even a C/C++ program - it'll be mostly one-liners in those.
The following is an unoptimized hack to give you an idea:
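#!/bin/bash
# Copies to stdout, as raw bytes, every 16-bit sample of file "$1"
# whose value lies strictly between the two limits.
lowerlimit=-333
upperlimit=333
# redirect into wc so that it prints only the byte count, not the file name
filesize=$(wc -c < "$1")
off=0
while (( off < filesize )); do
    # read one signed 16-bit value (host byte order) at byte offset $off
    shortval=$(od -An -t d2 -N 2 -j "$off" "$1")
    if (( shortval > lowerlimit && shortval < upperlimit )); then
        dd if="$1" bs=1 count=2 skip="$off" 2>/dev/null
    fi
    off=$((off + 2))
done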
I'm not sure this can be made pipe-able in an easy way because of the fact that the shell uses line separators to split input blocks.
Bash can be made to deal with binary data.
getbyte () {
    local IFS= LC_CTYPE=C res c
    read -r -d '' -n 1 c
    res=$?
    # the single quote in the argument of the printf
    # yields the numeric value of $c (ASCII since LC_CTYPE=C)
    [[ -n $c ]] && c=$(printf '%d' "'$c") || c=0
    printf '%s' "$c"
    return $res
}
filter () {
    local b1 b2 val
    while b1=$(getbyte)
    do
        b2=$(getbyte)
        # little-endian: the second byte is the high-order byte
        (( val = b2 * 256 + b1 ))
        # reinterpret the unsigned 16-bit value as signed
        (( val = val > 32767 ? val - 65536 : val ))
        if (( val > ${1:-0} || val < ${2:-0} ))
        then
            echo "$val"
        fi
    done
}
Examples (the data has an odd number of bytes intentionally to show that the function accommodates this condition):
$ data='\0\01\010\0377\0377\0100\0300\0200\0333'
$ echo -en "$data" | filter
256
-248
16639
-32576
219
$ echo -en "$data" | filter 222 -333
256
16639
-32576
Your command would then be:
filter 2100 -2100 < data.pcm
Whenever I want to extract numerical values from a binary file, I use od (octal dump). It has many options for extracting characters, integers (8, 16, 32 and 64 bits) and floats (32 and 64 bits). You can also specify an offset to the exact value that you are looking for.
For learning more about it, type:
man od
Then, filtering on od output should not be complex in bash.
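As a rough sketch of that idea (not part of the original answer): od can read the raw stream from stdin and print it as signed 16-bit decimals, and awk can then keep only the values outside the thresholds. The function name filter_pcm is made up for illustration, and the sketch assumes the file's byte order matches the host's, since od decodes in host byte order.

```bash
# filter_pcm UPPER LOWER  (hypothetical helper name, for illustration only)
# Reads raw signed 16-bit samples from stdin and prints, one per line,
# those that are greater than UPPER or less than LOWER (both default to 0).
filter_pcm () {
    # -An    : suppress the offset column
    # -v     : print every line, do not collapse repeated ones into "*"
    # -t d2  : decode as signed 2-byte decimals (host byte order assumed)
    od -An -v -t d2 |
    awk -v hi="${1:-0}" -v lo="${2:-0}" '
        {
            # walk over every value od printed on this line
            for (i = 1; i <= NF; i++)
                if ($i + 0 > hi + 0 || $i + 0 < lo + 0)
                    print $i
        }'
}
```

It would be used the same way as in the question: `filter_pcm 2100 -2100 < data.pcm`. Since od does the decoding in a single pass, this should also work at the end of a pipe.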