Awk selecting data rows

2023-03-22 12:29

I have to process the following datafile using awk:

YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663

I need to know for each year and domain the total visits and signups. My problem is I can't find a way to select only the first four and the last four rows, can anybody give me some kind of hint on how to achieve that?

Example output (Visits only):

VISITS
Domain1     73834
Domain2     413822
Domain3     1205541
Domain4     249758

        1995    1996    1997    1998    1999    2000
All     47197   218694  1161273 123067  99842   292882

You could match the "VISITS" and "SIGNUPS" rows and set a variable indicating what kinds of records you are processing.

An example:

BEGIN {
    FS = ":";
}
/^YEARS/ {
    for (i = 2 ; i <= NF; i++) {
        year[i] = $i;
    }
    next;
}
/^VISITS/ {
    mode = "VISITS";
    next;
}
/^SIGNUPS/ {
    mode = "SIGNUPS";
    next;
}
{
    for (i = 2; i <= NF; i++) {
        # output "VISITS"/"SIGNUPS", domain, year, value
        print mode, $1, year[i], $i;
    }
}
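
If only one section is wanted, the same mode idea can be used as a filter. The following is a minimal sketch, not a definitive solution: the helper name section_totals, the variable want, and the file name infile are made up for illustration, and the data is the question's sample with the last Domain4 visits value assumed to be 82100. It prints the per-domain total for the requested section only.

```shell
# Sample data copied from the question (the file name "infile" is arbitrary).
cat > infile <<'EOF'
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
EOF

# section_totals is a hypothetical helper: $1 = section name, $2 = data file.
section_totals() {
    awk -F: -v want="$1" '
        $1 == "YEARS" { next }              # skip the header row
        NF == 1       { mode = $1; next }   # a lone word starts a new section
        mode == want {                      # only rows of the requested section
            sum = 0
            for (i = 2; i <= NF; i++) sum += $i
            print $1, sum
        }
    ' "$2"
}

section_totals VISITS infile
```

Because each domain appears once per section, the row total can be printed as soon as the row is read, so no array is needed and the output keeps the input order.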

awk -F: 'END { out( ) }
/^YEARS/ {
  # remember the year labels and build the header line
  for ( i = 1; ++i <= NF; ) {
    y[i] = $i
    yh = yh ? yh OFS $i : $i
  }
  ny = NF; next
}
NF == 1 {
  # a section name (VISITS, SIGNUPS): flush the previous section, if any
  m && out( ); m = $1
}
{
  ym[y[1]] = "ALL:"
  for ( i = 1; ++i <= NF; ) {
    d[$1] += $i; ym[y[i]] += $i
  }
}
function out( ) {
  print m
  for ( D in d ) print D, d[D]
  printf "\n%s\n", OFS yh
  for ( i = 0; ++i <= ny; )
    printf "%s", ( ym[y[i]] ( i < ny ? OFS : RS ) )
  print x; split( x, d ); split( x, ym )
}' OFS='\t' infile

With GNU awk you could use:

delete d; delete ym

instead of:

split( x, d ); split( x, ym )
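
To see why the split form clears an array: splitting an empty string yields zero elements, and split removes whatever the target array held before. In the code above x is simply a variable that was never assigned, so it behaves as an empty string. A small check, using a literal "" in place of x and made-up keys:

```shell
# Count the elements left in d after split() has been given an empty string.
remaining=$(awk 'BEGIN {
    d["Domain1"] = 1; d["Domain2"] = 2   # two made-up entries
    split("", d)                         # portable way to empty the array
    n = 0
    for (k in d) n++
    print n
}')
echo "$remaining"
```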

When you say "select only the first four and the last four rows", I assume you mean to process the visits and signups separately:

awk -F: '
$1 == "YEARS"   {for (i=2; i<=NF; i++) {yr[i] = $i}; next}
$1 == "VISITS"  {visits = 1; signups = 0; next}
$1 == "SIGNUPS" {visits = 0; signups = 1; next}
visits { 
  for (i=2; i<=NF; i++) {
    v_d[$1] += $i     # visits by domain
    v_y[yr[i]] += $i  # visits by year
  }
}
signups {
  for (i=2; i<=NF; i++) {
    s_d[$1] += $i     # signups by domain
    s_y[yr[i]] += $i  # signups by year
  }
}
END {
  OFS=FS
  print "VISITS"
  for (d in v_d) print d, v_d[d]
  for (y in v_y) print y, v_y[y]
  print "SIGNUPS"
  for (d in s_d) print d, s_d[d]
  for (y in s_y) print y, s_y[y]
}' infile

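Given your input, this outputs the following (the order in which for (x in array) visits the keys is unspecified, so the lines within each group may come out in a different order):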

VISITS
Domain1:73834
Domain2:413822
Domain3:1205541
Domain4:249758
1999:99842
2000:292882
1995:47197
1996:218694
1997:1161273
1998:123067
SIGNUPS
Domain1:6847
Domain2:90384
Domain3:516885
Domain4:68384
1999:27795
2000:10289
1995:15277
1996:92201
1997:482988
1998:53950
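
If the years should come out in the order of the YEARS line, one option is to loop over the numeric field index in END instead of using for (y in ...). This is only a sketch under assumptions: the helper name year_totals, the variable want, and the file name infile are hypothetical, and the data is the question's sample with the last Domain4 visits value taken as 82100.

```shell
# Sample data copied from the question (the file name "infile" is arbitrary).
cat > infile <<'EOF'
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
EOF

# year_totals is a hypothetical helper: $1 = section name, $2 = data file.
year_totals() {
    awk -F: -v want="$1" '
        $1 == "YEARS" {                     # remember labels by field position
            n = NF
            for (i = 2; i <= n; i++) yr[i] = $i
            next
        }
        NF == 1      { mode = $1; next }    # section header line
        mode == want { for (i = 2; i <= NF; i++) tot[yr[i]] += $i }
        END {
            # walk the positions 2..n so the years keep their header order
            for (i = 2; i <= n; i++) print yr[i] ":" tot[yr[i]]
        }
    ' "$2"
}

year_totals VISITS infile
```

The same trick works for the domains if their names are stored in a numerically indexed array the first time each one is seen.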
