
Selecting first nth rows by groups using AWK

I have the following file with 4 fields. There are 3 groups in field 2, and the 4th field consists of 0s and 1s.

The first field is just the index.

I would like to use AWK to do the following:

  1. Select the first 3 rows of group 1 (note that group 1 has only 2 rows, so both are selected). The number of rows is based on the number of 1's found in the 4th field times 3.

  2. Select the first 6 rows of group 2. The number of rows is based on the number of 1's found in the 4th field times 3.

  3. Select the first 9 rows of group 3. The number of rows is based on the number of 1's found in the 4th field times 3.

So 17 rows (2 + 6 + 9) are selected for the output file.

Thank you for your help.

Input 

1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0

Output 

 1  1  TN1148 1
 2  1  S52689 0
 3  2  TA2081 1
 4  2  TA2592 1
 5  2  TA4011 0
 6  2  TA4246 0
 7  2  TA4275 0
 8  2  TB0159 0
 10 3  TB0454 1
 11 3  TB0496 1
 12 3  TB1181 1
 13 3  TC0027 0
 14 3  TC1340 0
 15 3  TC2247 0
 16 3  TC3094 0
 17 3  TD0106 0
 18 3  TD1146 0


The key to this awk program is to pass the input file twice: once to count how many rows you want from each group, and once to print them.

awk '
    NR == FNR {wanted_rows[$2] += 3*$4; next} 
    --wanted_rows[$2] >= 0 {print}
' input_file.txt input_file.txt
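
To make the two passes easier to follow, here is a small sketch that runs the first pass on its own and prints each group's quota next to its real row count, then runs the full two-pass program and counts what it prints. The temporary file stands in for input_file.txt, and the variable names `quotas` and `selected` are only illustrative.

```shell
#!/bin/sh
# Sketch only: a temporary file stands in for input_file.txt; the data is the question's input.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0
EOF

# Pass 1 alone: per group, quota = 3 * (number of 1s in field 4), plus the actual row count.
# for-in order is unspecified in awk, so the result is sorted by group number.
quotas=$(awk '
    { quota[$2] += 3 * $4; rows[$2]++ }
    END { for (g in quota) print g, quota[g], rows[g] }
' "$tmp" | sort -n)

# Both passes: count how many rows the two-pass program prints.
selected=$(awk '
    NR == FNR { wanted_rows[$2] += 3 * $4; next }
    --wanted_rows[$2] >= 0 { print }
' "$tmp" "$tmp" | wc -l | tr -d '[:space:]')

echo "$quotas"
echo "selected rows: $selected"
rm -f "$tmp"
```

For the question's data this prints quotas of 3, 6 and 9 against group sizes of 2, 7 and 11, and 17 selected rows. Group 1 shows why the decrementing counter is enough: when the quota exceeds the group size, the group simply runs out of rows before the counter goes negative.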


#!/usr/bin/awk -f
# by Dennis Williamson - 2010-12-02
# for http://stackoverflow.com/questions/4334167/selecting-first-nth-rows-by-groups-using-awk
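# Same group as the previous row: update the tallies and buffer the row
$2 == prev {
    count += $4         # number of 1s seen in field 4 for this group
    groupcount++        # number of rows in this group
    array[idx++] = $0
}
# New group: flush the previous group, then start over with this row
$2 != prev {
    if (NR > 1) {
        # print at most count*3 rows, but never more than the group holds
        for (i=0; i<count*3; i++) {
            if (i == groupcount) break
            print array[i]
        }
    }
    prev = $2
    count = $4          # start from this row's flag instead of assuming it is 1
    groupcount = 1
    split("", array) # delete the array
    idx = 0
    array[idx++] = $0
}
# Flush the last group
END {
    for (i=0; i<count*3; i++) {
        if (i == groupcount) break
        print array[i]
    }
}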