
bash: cat the first lines of a file & get position

I have a very large file that starts with n lines of text (n < 1000), followed by an empty line and then a large amount of untyped binary data.

I would like to extract the first n lines of text, and then somehow extract the exact offset of the binary data.

Extracting the first lines is simple, but how can I get the offset? bash is not encoding-aware, so simply counting characters is meaningless.


grep has an option -b to output the byte offset.

Example:

$ hexdump -C foo 
00000000  66 6f 6f 0a 0a 62 61 72  0a                       |foo..bar.|
00000009
$ grep -b "^$" foo 
4:
$ hexdump -s 5 -C foo
00000005  62 61 72 0a                                       |bar.|
00000009

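In the last step I used 5 instead of 4 to skip the newline of the empty line itself, so 5 is the offset at which the data after the empty line begins.

This also works with umlauts (äöü) in the file, because -b reports byte offsets rather than character counts.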
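The steps above can be wrapped in a small function. This is only a sketch: the name binary_offset is made up for illustration, and it assumes a grep that supports -a, -b and -m (GNU grep does). The -a flag is added so grep treats the binary tail as text, and -m 1 stops at the first empty line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 0-based byte offset of the first byte
# that follows the first empty line of FILE.
binary_offset() {
    local file=$1 blank
    # -a: treat binary data as text, -b: print byte offsets,
    # -m 1: stop after the first match. LC_ALL=C keeps matching byte-oriented.
    blank=$(LC_ALL=C grep -a -b -m 1 '^$' "$file" | cut -d: -f1)
    [ -n "$blank" ] || return 1      # no empty line found
    echo "$((blank + 1))"            # skip the empty line's own newline
}

# Example usage (file names are placeholders):
#   offset=$(binary_offset bigfile)
#   head -c "$((offset - 1))" bigfile > header.txt     # text lines only
#   tail -c "+$((offset + 1))" bigfile > payload.bin   # tail counts bytes from 1
```

Note that tail -c +N is 1-based, which is why the usage example adds 1 to the 0-based offset.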


Use grep to find the line number of the empty line:

grep -n "^$" your_file | tr -d ':'

Optionally pipe the result through tail -n 1 if you want the last empty line (that is, if the top part of the file can contain empty lines before the binary stuff starts).

Then use head to get the top part of the file, where $num is the line number found above:

head -n "$num" your_file
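Put together, the two commands might look like the sketch below. The function name header_lines is hypothetical, and -a and -m 1 are additions that assume GNU grep. They make grep treat the binary part as text and stop at the first empty line. The sketch passes num - 1 to head so that the empty line itself is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text lines that precede the first
# empty line of FILE.
header_lines() {
    local file=$1 num
    # grep -n prints "LINENO:" for the empty line; cut keeps the number.
    num=$(LC_ALL=C grep -a -n -m 1 '^$' "$file" | cut -d: -f1)
    [ -n "$num" ] || return 1          # no empty line found
    head -n "$((num - 1))" "$file"     # everything before the empty line
}
```

This yields the header text but not a byte offset. Counting the bytes of its output with wc -c and adding 1 for the empty line's newline would be one way to derive the offset from it.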


You might want to use tools like hexdump or od, rather than bash itself, to retrieve binary offsets.
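As a rough illustration, od can dump a few bytes starting at a given offset, which is handy for checking that a computed offset really points at the start of the binary data. The function name peek_bytes is invented for this sketch. The -A, -t, -j and -N options are standard od options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hex-dump COUNT bytes of FILE starting at byte OFFSET.
peek_bytes() {
    local file=$1 offset=$2 count=$3
    # -A d: decimal addresses, -t x1: one hex byte per column,
    # -j: skip OFFSET bytes first, -N: dump at most COUNT bytes.
    od -A d -t x1 -j "$offset" -N "$count" "$file"
}

# Example usage (file name is a placeholder):
#   peek_bytes bigfile 5 16
```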


Perl can tell you where you are in a file:

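pos=$( perl -le '
    open my $fh, "<", $ARGV[0] or die "cannot open $ARGV[0]: $!";
    $/ = "";                      # paragraph mode: a record ends at a blank line
    my $first_paragraph = <$fh>;  # consume the text header
    print tell($fh);              # byte position after the header and blank line
' filename )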

Parenthetically, I was attempting to turn this into a one-liner:

pos=$( perl -00 -lne 'if ($. == 2) {print tell(___what?___); exit}' filename )

What is the "current filehandle" variable? I couldn't find it in the docs.
