bash: cat the first lines of a file & get position
I have a very large file that contains n lines of text (with n < 1000) at the beginning, then an empty line, and then a large amount of untyped binary data.
I would like to extract the first n lines of text, and then somehow extract the exact offset of the binary data.
Extracting the first lines is simple, but how can I get the offset? bash is not encoding-aware, so simply counting characters does not give a reliable byte offset.
grep has an option, -b, to output the byte offset.
Example:
$ hexdump -C foo
00000000 66 6f 6f 0a 0a 62 61 72 0a |foo..bar.|
00000009
$ grep -b "^$" foo
4:
$ hexdump -s 5 -C foo
00000005 62 61 72 0a |bar.|
00000009
In the last step I used 5 instead of 4 to skip the newline.
Also works with umlauts (äöü) in the file.
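The steps above could be wrapped into a small script along these lines. This is only a sketch: the sample file, the variable names (empty_at, offset, payload_size), and the extra -a and -m 1 options are my assumptions, not part of the answer above. -a asks grep to treat the file as text despite the binary payload, and -m 1 stops at the first empty line so that newline pairs inside the payload are not reported.

```shell
#!/usr/bin/env bash
# Build a small sample: two text lines (one with a UTF-8 umlaut),
# an empty line, then five bytes standing in for the binary payload.
file=$(mktemp)
printf 'foo\nb\303\244r\n\n\001\002\n\n\003' > "$file"

# -a: treat the file as text, -b: print byte offsets, -m 1: first match only.
# LC_ALL=C makes grep work on raw bytes rather than locale characters.
empty_at=$(LC_ALL=C grep -a -b -m 1 '^$' "$file" | cut -d: -f1)

# The payload starts one byte later, right after the empty line's newline.
offset=$((empty_at + 1))

# Text part: everything before the empty line.
header=$(head -c "$empty_at" "$file")

# Binary part: tail -c +N is 1-based, so add 1 to the 0-based offset.
payload_size=$(tail -c +"$((offset + 1))" "$file" | wc -c)

echo "offset=$offset payload_size=$payload_size"
rm -f "$file"
```

head -c and grep -m are not required by POSIX, so this assumes GNU or BSD userland.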
Use grep to find the empty line:
grep -n "^$" your_file | tr -d ':'
Optionally use tail -n 1 if you want the last empty line (that is, if the top part of the file can contain empty lines before the binary data starts).
Use head to get the top part of the file:
head -n $num
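Put together, the two commands might look like the sketch below. The sample file and the variable names are hypothetical, and I have added -a and -m 1 to grep (my assumption, so that the binary part neither suppresses the output nor adds further matches). Since num is the line number of the empty line itself, the sketch passes num - 1 to head to leave that line out.

```shell
#!/usr/bin/env bash
# Sample file: two text lines, an empty line, then three "binary" bytes.
file=$(mktemp)
printf 'line one\nline two\n\n\001\002\003' > "$file"

# Line number of the first empty line (grep prints "3:", tr drops the colon).
num=$(LC_ALL=C grep -a -n -m 1 '^$' "$file" | tr -d ':')

# Print only the text lines that precede the empty line.
text=$(head -n "$((num - 1))" "$file")
echo "$text"
rm -f "$file"
```

Note that this yields a line count, not a byte offset; the offset of the binary data still has to come from something like grep -b.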
You might want to use tools like hexdump or od, rather than bash itself, to inspect binary offsets. Here's a reference.
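As a rough illustration with od, assuming the offset is already known (5 here, matching the foo/bar sample used earlier in this thread), the bytes at that position can be dumped like this. The file contents and the variable name bytes are made up for the example.

```shell
#!/usr/bin/env bash
# Same sample as the hexdump example: "foo", an empty line, then "bar".
file=$(mktemp)
printf 'foo\n\nbar\n' > "$file"

# -A n: no address column, -t x1: one hex byte per item,
# -j 5: skip the first 5 bytes, -N 4: dump at most 4 bytes.
# tr squeezes the column padding, which differs between od implementations.
bytes=$(od -A n -t x1 -j 5 -N 4 "$file" | tr -s ' \n' ' ')
echo "$bytes"
rm -f "$file"
```

This only checks what sits at a given offset; it does not find the offset for you.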
Perl can tell you where you are in a file:
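pos=$( perl -le '
open my $fh, "<", $ARGV[0] or die "cannot open $ARGV[0]: $!";
$/ = ""; # read the file in "paragraphs"
my $first_paragraph = <$fh>; # the text lines, up to and past the empty line
print tell($fh) # byte offset just after the first paragraph
' filename )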
Parenthetically, I was attempting to turn this into a one-liner:
pos=$( perl -00 -lne 'if ($. == 2) {print tell(___what?___); exit}' filename )
What is the "current filehandle" variable? I couldn't find it in the docs.