Awk selecting data rows
I have to process the following datafile using awk:
YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663
For each year and each domain, I need the total visits and the total signups. My problem is that I can't find a way to select only the first four and the last four data rows. Can anybody give me a hint on how to achieve that?
Example output (Visits only):
VISITS
Domain1 73834
Domain2 413822
Domain3 1205541
Domain4 249758
1995 1996 1997 1998 1999 2000
All 47197 218694 1161273 123067 99842 292882
You could match the "VISITS" and "SIGNUPS" rows and set a variable that records which kind of record you are currently processing.
An example:
BEGIN {
    FS = ":";
}
/^YEARS/ {
    for (i = 2; i <= NF; i++) {
        year[i] = $i;
    }
    next;
}
/^VISITS/ {
    mode = "VISITS";
    next;
}
/^SIGNUPS/ {
    mode = "SIGNUPS";
    next;
}
{
    for (i = 2; i <= NF; i++) {
        # output "VISITS"/"SIGNUPS", domain, year, value
        print mode, $1, year[i], $i;
    }
}
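The script above only prints one record per value. To get totals, the same mode variable can become part of an array key. The following is a minimal sketch of that idea rather than the answerer's own code: the names `total`, `part` and `result` are illustrative, the sample data is copied from the question, and the final `sort` is only there because awk does not guarantee any order for `for (key in array)`.

```shell
#!/bin/sh
# Sample input copied from the question.
data='YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663'

result=$(printf '%s\n' "$data" | awk -F: '
/^YEARS/            { next }              # year labels are not needed for per-domain totals
/^(VISITS|SIGNUPS)/ { mode = $1; next }   # remember which section we are in
{
    for (i = 2; i <= NF; i++)
        total[mode, $1] += $i             # composite key: section and domain
}
END {
    for (key in total) {
        split(key, part, SUBSEP)          # part[1] = section, part[2] = domain
        print part[1], part[2], total[key]
    }
}' | sort)

printf '%s\n' "$result"
```

Each output line has the form `section domain total`. Per-year totals could be collected the same way with a second array keyed on `mode, year[i]`.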
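awk -F: 'END { out( ) }                       # flush the last section
/^YEARS/ {
    for ( i = 1; ++i <= NF; ) {
        y[i] = $i                             # year label for column i
        yh = yh ? yh OFS $i : $i              # header line listing the years
    }
    ny = NF; next
}
NF == 1 {                                     # section name: VISITS or SIGNUPS
    m && out( ); m = $1                       # print the previous section, if any
}
{
    ym[y[1]] = "ALL:"                         # y[1] is never set, so this slot labels the totals row
    for ( i = 1; ++i <= NF; ) {
        d[$1] += $i; ym[y[i]] += $i           # totals per domain and per year
    }
}
function out( ) {
    print m
    for ( D in d ) print D, d[D]
    printf "\n%s\n", OFS yh
    for ( i = 0; ++i <= ny; )
        printf "%s", ( ym[y[i]] ( i < ny ? OFS : RS ) )
    print x; split( x, d ); split( x, ym )    # x is unset: print a blank line, then empty both arrays
}' OFS='\t' infile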
With GNU awk you could use:
delete d; delete ym
instead of:
split( x, d ); split( x, ym )
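As a small illustration of why the `split` form works: splitting an empty string into an array leaves that array with no elements. The sketch below uses a made-up array name, `seen`, and counts what is left after the call.

```shell
#!/bin/sh
result=$(awk 'BEGIN {
    seen["a"] = 1; seen["b"] = 2      # two elements to start with
    split("", seen)                   # splitting an empty string empties the array
    n = 0
    for (k in seen) n++               # count the remaining elements
    print n
}')

printf '%s\n' "$result"
```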
When you say "select only the first four and the last four rows", I assume you mean to process the visits and signups separately:
awk -F: '
    $1 == "YEARS"   {for (i=2; i<=NF; i++) {yr[i] = $i}; next}
    $1 == "VISITS"  {visits = 1; signups = 0; next}
    $1 == "SIGNUPS" {visits = 0; signups = 1; next}
    visits {
        for (i=2; i<=NF; i++) {
            v_d[$1] += $i      # visits by domain
            v_y[yr[i]] += $i   # visits by year
        }
    }
    signups {
        for (i=2; i<=NF; i++) {
            s_d[$1] += $i      # signups by domain
            s_y[yr[i]] += $i   # signups by year
        }
    }
    END {
        OFS=FS
        print "VISITS"
        for (d in v_d) print d, v_d[d]
        for (y in v_y) print y, v_y[y]
        print "SIGNUPS"
        for (d in s_d) print d, s_d[d]
        for (y in s_y) print y, s_y[y]
    }'
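Given your input, this outputs the following. The domains and years may come out in a different order on your system, because awk does not specify the traversal order of for (x in array):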
VISITS
Domain1:73834
Domain2:413822
Domain3:1205541
Domain4:249758
1999:99842
2000:292882
1995:47197
1996:218694
1997:1161273
1998:123067
SIGNUPS
Domain1:6847
Domain2:90384
Domain3:516885
Domain4:68384
1999:27795
2000:10289
1995:15277
1996:92201
1997:482988
1998:53950
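If the layout from the question is wanted (domains in input order, then one header line of years and one "All" line of per-year totals), one option is to index the per-year sums by column number instead of by year, and to remember the order in which domains first appear. This is only a sketch under those assumptions: the function `report` and the arrays `dom`, `by_dom` and `by_col` are names invented for this example.

```shell
#!/bin/sh
# Sample input copied from the question.
data='YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663'

result=$(printf '%s\n' "$data" | awk -F: '
function report(    i, line) {                 # i and line are local variables
    print section
    for (i = 1; i <= ndom; i++)                # domains in the order they were read
        print dom[i], by_dom[dom[i]]
    print header
    line = "All"
    for (i = 2; i <= ncol; i++)                # per-year totals in column order
        line = line OFS by_col[i]
    print line
    split("", dom); split("", by_dom); split("", by_col); ndom = 0   # reset for the next section
}
$1 == "YEARS" {
    ncol = NF; header = $2
    for (i = 3; i <= NF; i++) header = header OFS $i
    next
}
NF == 1 { if (section != "") report(); section = $1; next }   # VISITS or SIGNUPS
{
    if (!($1 in by_dom)) dom[++ndom] = $1      # remember first-seen order of domains
    for (i = 2; i <= NF; i++) { by_dom[$1] += $i; by_col[i] += $i }
}
END { if (section != "") report() }')

printf '%s\n' "$result"
```

Because both loops in `report` run over numeric indices rather than `for (x in array)`, the output order no longer depends on the awk implementation.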