Selecting first nth rows by groups using AWK

2023-01-29 00:48 Q&A author:

I have the following file with 4 fields. There are 3 groups in field 2, and the 4th field consists of 0's and 1's.

The first field is just the index.

I would like to use AWK to do the following task:

Select the first 3 rows of group 1 (Note that group 1 has only 2 rows). The number of rows is based on the number of 1's found in the 4th field times 3.
Select the first 6 rows of group 2. The number of rows is based on the number of 1's found in the 4th field times 3.
Select the first 9 rows of group 3. The number of rows is based on the number of 1's found in the 4th field times 3.

So 17 rows are selected for the output file.

Thank you for your help.

Input 

1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0

Output 

 1  1  TN1148 1
 2  1  S52689 0
 3  2  TA2081 1
 4  2  TA2592 1
 5  2  TA4011 0
 6  2  TA4246 0
 7  2  TA4275 0
 8  2  TB0159 0
 10 3  TB0454 1
 11 3  TB0496 1
 12 3  TB1181 1
 13 3  TC0027 0
 14 3  TC1340 0
 15 3  TC2247 0
 16 3  TC3094 0
 17 3  TD0106 0
 18 3  TD1146 0

The key to this awk program is to pass the input file twice: once to count how many rows you want from each group, and once to print them.

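awk '
    # First pass (NR == FNR): each 1 in field 4 adds 3 to the quota of the group in field 2.
    NR == FNR {wanted_rows[$2] += 3*$4; next}
    # Second pass: print a row while its group still has quota left.
    --wanted_rows[$2] >= 0 {print}
' input_file.txt input_file.txt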
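
To see what the first pass computes, here is a minimal sketch that prints only the per-group quota (three times the number of 1's in field 4). It recreates the sample data from the question in `input_file.txt`; the `quotas` shell variable and the `sort -n` step are additions for illustration, since awk does not guarantee any order for `for (g in array)`.

```shell
# Recreate the sample input from the question.
cat > input_file.txt <<'EOF'
1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0
EOF

# Counting pass only: add 3 * (field 4) to the quota of the group in field 2,
# then print "group quota" for every group seen.
quotas=$(awk '
    { wanted_rows[$2] += 3 * $4 }
    END { for (g in wanted_rows) print g, wanted_rows[g] }
' input_file.txt | sort -n)

echo "$quotas"
# 1 3
# 2 6
# 3 9
```

Group 1 has a quota of 3 but only 2 rows, so the printing pass simply runs out of rows for it, which is why the final output has 2 + 6 + 9 = 17 rows.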

#!/usr/bin/awk -f
# by Dennis Williamson - 2010-12-02
# for http://stackoverflow.com/questions/4334167/selecting-first-nth-rows-by-groups-using-awk
$2 == prev {
    count += $4
    groupcount++
    array[idx++] = $0
}
$2 != prev {
    if (NR > 1) {
        for (i=0; i<count*3; i++) {
            if (i == groupcount) break
            print array[i]
        }
    }
    prev = $2
    count = $4       # start from the flag of this row; a fixed 1 overcounts when a group begins with a 0
    groupcount = 1
    split("", array) # delete the array
    idx = 0
    array[idx++] = $0
}
END {
    for (i=0; i<count*3; i++) {
        if (i == groupcount) break
        print array[i]
    }
}
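
The single-pass idea above (buffer the rows of the current group, then print a limited number of them when the group changes) can also be written with a helper function. This is only a sketch, not the author's script: the names `emit_group`, `rows`, `n`, and `ones` are introduced here, and like any single-pass version it assumes that all rows of a group are adjacent in the file.

```shell
# Recreate the sample input from the question.
cat > input_file.txt <<'EOF'
1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0
EOF

result=$(awk '
    # Print the first min(3 * ones, n) buffered rows of the group just finished.
    function emit_group(   i, limit) {
        limit = 3 * ones
        if (limit > n) limit = n
        for (i = 0; i < limit; i++) print rows[i]
    }
    # Group changed: output the previous group, then reset the buffer and the count.
    NR > 1 && $2 != prev { emit_group(); n = 0; ones = 0 }
    # Every row: remember the group, buffer the row, add its 0/1 flag to the count.
    { prev = $2; rows[n++] = $0; ones += $4 }
    # The last group has no following row to trigger output, so emit it here.
    END { emit_group() }
' input_file.txt)

echo "$result"
```

On the sample data this selects 17 rows, ending with row 18. If the groups were not contiguous, the two-pass command would be the safer choice, because it keys its counts on field 2 rather than on row order.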
