
Unix script to search within a compressed .gz file

I want to get a few lines from a file inside a compressed .gz file.

The .gz file contains many .txt files. I want to search for a string in all of these .txt files and get the previous 3 lines as output, including the current line (the one where the search string is present).

I tried zgrep and got the line number, but when I use the head or tail commands I get garbage output. I think head and tail cannot be used on a compressed file that contains multiple files.

Is there a simple way to do this?


The essence of the approach is to get the names of the files within the tarball that you want to search, and extract only their contents for searching, without extracting anything else. Because we don't want to write to the file system, we can use the -O flag to extract to standard output instead.

The pipeline

tar -tzf file.tar.gz | grep '\.txt$' | xargs tar -Oxzf file.tar.gz | grep -B 3 "string-or-regex"

concatenates all of the files in the .tar.gz whose names end in ".txt" and greps them for the given string, also printing the 3 previous lines. It won't tell you which file in the tarball a match came from, and the "three previous lines" may actually come from the end of the previous file. (xargs also splits on whitespace, so members with spaces in their names will not be found.)
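The same pipeline can be wrapped in a reusable function. This is a minimal sketch assuming GNU tar, whose -T - option reads the member names to extract from standard input (one per line), which sidesteps the xargs whitespace problem. The function name search_tgz is hypothetical.

```shell
# Hypothetical helper: search every *.txt member of a .tar.gz for a pattern
# and print each match with 3 lines of leading context, like grep -B 3.
# Assumes GNU tar: "-T -" reads the member names to extract from stdin.
search_tgz() {
    archive=$1
    pattern=$2
    # List the members, keep names ending in .txt, stream their contents
    # to stdout (-O), then search the concatenated text.
    tar -tzf "$archive" | grep '\.txt$' |
        tar -Oxzf "$archive" -T - | grep -B 3 -e "$pattern"
}
```

Usage: search_tgz file.tar.gz "string-or-regex". Like the one-liner, it still cannot tell which member a match came from.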

You can instead do:

for file in $(tar -tzf file.tar.gz | grep '\.txt$'); do
    tar -Oxzf file.tar.gz "$file" | grep -B 3 --label="$file" -H "string-or-regex"
done

which respects file boundaries and reports the file names, but is much less efficient, because the archive is decompressed again for every file.
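If you have GNU tar, its --to-command option gives per-file output while decompressing the archive only once: tar pipes each extracted member into a shell command and sets TAR_FILENAME to the member's name. This is a swapped-in, GNU-specific technique rather than what the loop above does, and the function name search_tgz_once is hypothetical.

```shell
# Hypothetical helper: like the for loop, but decompresses the archive once.
# Assumes GNU tar: --to-command runs the given command (via /bin/sh -c)
# for each extracted member, feeding the member's data on stdin, with
# TAR_FILENAME set to the member's name. Nothing is written to disk.
search_tgz_once() {
    archive=$1
    PATTERN=$2
    export PATTERN
    # --wildcards lets the following '*.txt' match member names.
    # "|| true" stops tar from reporting members that have no match
    # (grep exits with status 1 in that case).
    tar -xzf "$archive" --wildcards '*.txt' \
        --to-command='grep -H --label="$TAR_FILENAME" -B 3 -e "$PATTERN" || true'
}
```

Usage: search_tgz_once file.tar.gz "string-or-regex". Context lines are printed as name-line and matches as name:line, just as with the loop.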

(-z tells tar the archive is gzip-compressed, -t lists the contents, -x extracts, and -O sends the extracted data to standard output rather than to the file system. Older tars may not have the -O or -z flags, and some expect the flags without the leading -, e.g. tar tzf file.tar.gz.)
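When tar has no -z, the decompression can be done by gzip itself: gzip -dc writes the uncompressed tar stream to standard output, and the old-style "tf -" makes tar read the archive from standard input. A minimal sketch; the function name list_txt_members is hypothetical.

```shell
# Hypothetical helper: list the .txt members of a gzipped tar archive
# without relying on tar's -z flag. gzip -dc decompresses to stdout,
# and "tf -" makes tar list the archive it reads from stdin.
list_txt_members() {
    gzip -dc "$1" | tar tf - | grep '\.txt$'
}
```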

Okay, so your grep is too limited for this (for example, it lacks -B). We can fix that with awk!

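#!/usr/bin/awk -f
# Emulates grep -B: prints each matching line together with the lines
# before it. "context" is the total number of lines printed per match,
# counting the match itself, so context=4 gives the same lines as grep -B 3.
BEGIN { context = 3 }
{ add_buffer($0) }
/pattern/ { print_buffer() }    # replace "pattern" with your regex
# Keep only the last "context" lines, in a ring buffer indexed by NR.
function add_buffer(line)
{
    buffer[NR % context] = line
}
# Print the buffered lines oldest first. Near the start of the input
# fewer than "context" lines exist, hence the max(1, ...) lower bound.
# The extra parameter i is the usual awk idiom for a local variable.
function print_buffer(    i)
{
    for (i = max(1, NR - context + 1); i <= NR; i++) {
        print buffer[i % context]
    }
}
function max(a, b)
{
    if (a > b) { return a } else { return b }
}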

Unlike grep -B, this does not merge overlapping matches, so lines that fall within the context of two different matches are printed twice.
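The same ring-buffer idea can be packaged as a shell function so that the regex and the context size are passed with awk's -v option instead of being edited into the script. A sketch under that assumption; the names before, pat and n are hypothetical.

```shell
# Hypothetical wrapper around the awk approach.
# $1: regex to search for; $2: total lines printed per match, counting
# the match itself (default 4, which mimics grep -B 3).
before() {
    awk -v pat="$1" -v n="${2:-4}" '
        { buf[NR % n] = $0 }              # ring buffer of the last n lines
        $0 ~ pat {
            start = NR - n + 1
            if (start < 1) start = 1      # fewer than n lines read so far
            for (i = start; i <= NR; i++) print buf[i % n]
        }'
}
```

It reads standard input, so it can sit at the end of the tar pipeline, e.g. tar -tzf file.tar.gz | grep '\.txt$' | xargs tar -Oxzf file.tar.gz | before "string-or-regex". Note that awk processes backslash escapes in -v values, so backslashes in the regex need doubling.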


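Is that perhaps a gzipped tar file? A plain .gz file holds just one compressed file, so an archive of many .txt files is most likely a .tar.gz. If so, the simplest approach is to extract the whole thing and use the regular tools on the extracted files.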
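A minimal sketch of that "extract everything" approach, assuming a grep with -r and --include (GNU and BSD grep both have them): unpack into a fresh temporary directory, search, then clean up. The function name search_extracted is hypothetical.

```shell
# Hypothetical helper: unpack the archive into a temporary directory,
# grep the .txt files with 3 lines of leading context, then delete it.
search_extracted() {
    archive=$1
    pattern=$2
    tmpdir=$(mktemp -d) || return 1
    # -r recurses into subdirectories; --include limits it to *.txt files.
    tar -xzf "$archive" -C "$tmpdir" &&
        grep -r -B 3 --include='*.txt' -e "$pattern" "$tmpdir"
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

Because grep sees real files here, each output line is prefixed with the extracted file's path, and matches never borrow context from a neighbouring file.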
