
Shell: script to group strings by substring

I have a program (sorry, changing this is not an option) that outputs log files with upwards of 500k lines.

I am trying to group together lines in the log file (and then sort these groups) based on a substring within the lines.

For example I have lines similar to below:

SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;

What I'm looking to group on is the TIM BETWEEN '*' AND '*' part, where the * values match between lines. For example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

would be grouped as such in the output:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

with each group also sorted on the whole string, so that lines whose "somethings" are similar end up next to each other.

I have been trying to put together a shell script that reads the log file and outputs what I want, but haven't had any success!

Edit: I should also mention that 'something' can be multiple words, for example:

SELECT blah1, blah2 or SELECT blah1, blah2, blah3
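To see which date ranges occur and how many lines fall into each group, the grouping substring can be pulled out on its own. This is a minimal sketch, assuming every relevant line contains TIM BETWEEN '...' AND '...' with single-quoted values as in the examples above; the function name count_groups is hypothetical.

```shell
#!/bin/sh
# count_groups (hypothetical helper name): prints each distinct
# "TIM BETWEEN '...' AND '...'" substring with the number of lines containing it.
# Reads the files given as arguments, or standard input if there are none.
# Lines without such a clause are skipped (sed -n with the p flag).
count_groups() {
    sed -n "s/.*\(TIM BETWEEN '[^']*' AND '[^']*'\).*/\1/p" "$@" | sort | uniq -c
}
```

For example, count_groups logFile lists one line per group, prefixed by its size, which is a quick check before sorting a 500k-line file.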


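You should probably be able to use sort:

sort -o outputfile -k6,6 -k8,8 inputfile

Where -k6,6 gives the first date column and -k8,8 gives the last date column (fields are separated by blanks and counted from 1). Lines whose dates are equal are then compared on the whole line, so each group also comes out sorted. This only works while "something" is a single word; with SELECT blah1, blah2 the dates move to later fields.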

(PS! Not tested)
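Fixed field positions in a sort command only work when the text before WHERE is a single word, and the question's edit says it can be SELECT blah1, blah2. One alternative, not part of the answer above, is the decorate-sort-undecorate pattern: copy the two dates to the front of each line as tab-separated keys, sort on them, then strip them again. This is a sketch assuming every line has a BETWEEN '...' AND '...' clause with single-quoted values and that no line contains a tab; group_by_dates is a hypothetical name. LC_ALL=C makes sort compare raw bytes, which is usually faster on large files and keeps ISO dates in order.

```shell
#!/bin/sh
# group_by_dates (hypothetical helper name): groups lines by the two quoted
# values after BETWEEN, then orders each group by the whole original line.
# Assumes no tab characters in the input. Reads files given as arguments,
# or standard input if there are none.
group_by_dates() {
    tab=$(printf '\t')
    # Decorate: turn each line into "date1<TAB>date2<TAB>original line".
    # Lines without a BETWEEN clause pass through unchanged.
    sed "s/.*BETWEEN '\([^']*\)' AND '\([^']*\)'.*/\1$tab\2$tab&/" "$@" |
    # Sort on the first date, then the second date, then the original line.
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k3 |
    # Undecorate: keep everything from the third tab-separated field onward.
    cut -f3-
}
```

Usage: group_by_dates logFile > outputfile. Because the dates are located by the BETWEEN clause rather than by position, the number of words in the SELECT list does not matter.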


You'll have to pre-filter your data and turn it into something you can use sort with.

awk '{ sub(/BETWEEN/, "|"); sub(/AND/, "|"); print }' logFile \
| sort -t'|' -k2,2 -k3,3 \
| sed 's/|/BETWEEN/; s/|/AND/'

Output:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

I hope this helps.
