Awk array iteration for multi-dimensional arrays
Awk offers associative indexing for array processing. Elements of a one-dimensional array can be iterated, e.g.:
for (idx in arr1)
    print "arr1[" idx "]=" arr1[idx]
But how is this done for a two-dimensional array? Does the kind of syntax given below work?
for(index1 in arr2)
    for(index2 in arr2)
        arr2[index1,index2]
AWK fakes multidimensional arrays by concatenating the indices with the character held in the SUBSEP variable (0x1c). You can iterate through a two-dimensional array using split() like this (based on an example in the gawk info documentation):
awk 'BEGIN { OFS=","; array[1,2]=3; array[2,3]=5; array[3,4]=8;
for (comb in array) {split(comb,sep,SUBSEP);
print sep[1], sep[2], array[sep[1],sep[2]]}}'
Output (for-in traversal order is unspecified, so the order may differ):
2,3,5
3,4,8
1,2,3
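A minimal sketch of the same split() technique, reading the cells from input instead of a BEGIN block. The shell function name dump_pairs and the "row col value" input layout are illustrative assumptions; the result is piped through sort only because for-in order is unspecified.

```shell
# Hypothetical helper: load "row col value" lines into a pseudo-2D array,
# then recover both indices from each combined key with split() on SUBSEP.
dump_pairs() {
  awk '{ cell[$1, $2] = $3 }
       END {
         for (key in cell) {
           split(key, idx, SUBSEP)    # idx[1] = row, idx[2] = col
           print idx[1], idx[2], cell[idx[1], idx[2]]
         }
       }' | sort    # for-in order is unspecified, so sort for stable output
}

printf '3 4 8\n1 2 3\n2 3 5\n' | dump_pairs
```

Each printed line is row, column and value, recovered purely from the combined key.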
You can, however, iterate over a numerically indexed array using nested for loops, provided width and height hold its dimensions:
for (i = 1; i <= width; i++)
    for (j = 1; j <= height; j++)
        print array[i, j]
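When the bounds are known, nested numeric loops visit the cells in a predictable order, unlike for (k in array). As a sketch (the function name transpose is hypothetical), this reads a whitespace-separated grid, records width and height while loading it, and prints it transposed:

```shell
# Hypothetical helper: transpose a whitespace-separated grid.
# width/height are tracked while reading, so the END loops know their bounds.
transpose() {
  awk '{
         for (j = 1; j <= NF; j++) m[NR, j] = $j
         if (NF > width) width = NF
       }
       END {
         height = NR
         for (j = 1; j <= width; j++) {
           line = ""
           for (i = 1; i <= height; i++)
             line = line (i > 1 ? " " : "") m[i, j]
           print line
         }
       }'
}

printf '1 2 3\n4 5 6\n' | transpose
```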
Another noteworthy bit of information from the GAWK manual:
To test whether a particular index sequence exists in a multidimensional array, use the same operator (in) that is used for single dimensional arrays. Write the whole sequence of indices in parentheses, separated by commas, as the left operand:
(subscript1, subscript2, ...) in array
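A small sketch of that membership test (check_cells is a hypothetical wrapper name). It also shows why the in form is preferable to simply referencing grid[i, j]: the test does not create the element, so the array still holds one entry afterwards.

```shell
# Hypothetical demo: (i, j) in arr tests for an element without creating it.
check_cells() {
  awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid) print "1,2 present"
    if (!((2, 1) in grid)) print "2,1 absent"
    n = 0
    for (k in grid) n++
    print "elements:", n      # still 1: the test above created nothing
  }'
}

check_cells
```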
Gawk 4 adds arrays of arrays. From the gawk manual:
for (i in array) {
    if (isarray(array[i])) {
        for (j in array[i]) {
            print array[i][j]
        }
    }
    else
        print array[i]
}
Also see the "Traversing Arrays of Arrays" section of the gawk manual for the following function, which walks an arbitrarily dimensioned array of arrays, including jagged ones:
function walk_array(arr, name, i)
{
    for (i in arr) {
        if (isarray(arr[i]))
            walk_array(arr[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n", name, i, arr[i])
    }
}
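A sketch of calling walk_array on a jagged array. This needs gawk (isarray, arrays of arrays and PROCINFO["sorted_in"] are gawk extensions); walk_demo and the sample data are hypothetical, and sorted_in is set only so the traversal order is deterministic.

```shell
# Requires gawk. walk_demo is a hypothetical wrapper; walk_array is the manual's function.
walk_demo() {
  gawk '
    function walk_array(arr, name,    i) {
      for (i in arr) {
        if (isarray(arr[i]))
          walk_array(arr[i], (name "[" i "]"))
        else
          printf("%s[%s] = %s\n", name, i, arr[i])
      }
    }
    BEGIN {
      PROCINFO["sorted_in"] = "@ind_str_asc"   # deterministic traversal order
      data["a"] = 1
      data["b"]["x"] = 2
      data["b"]["y"]["deep"] = 3                # jagged: depth varies per key
      walk_array(data, "data")
    }'
}
```

Calling walk_demo should print data[a] = 1, data[b][x] = 2 and data[b][y][deep] = 3, one per line.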
No, the syntax
for(index1 in arr2) for(index2 in arr2) {
    print arr2[index1][index2];
}
won't work. Awk doesn't truly support multi-dimensional arrays. What it does, if you do something like
x[1,2] = 5;
is to concatenate the two indexes (1 & 2) to make a string, separated by the value of the SUBSEP variable. If this is equal to "*", then you'd have the same effect as
x["1*2"] = 5;
The default value of SUBSEP is the non-printing character "\034", which corresponds to Ctrl+\. You can see this with the following script:
BEGIN {
    x[1,2]=5;
    x[2,4]=7;
    for (ix in x) {
        print ix;
    }
}
Running this gives:
% awk -f scriptfile | cat -v
1^\2
2^\4
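Because the combined key is just a string, you can make it readable by assigning SUBSEP yourself before creating any elements. A small sketch (subsep_demo is a hypothetical name) using ":" as the separator, which also confirms that x[1, 2] and x["1:2"] are the same element:

```shell
# Hypothetical demo: with SUBSEP = ":", x[1, 2] is stored under the key "1:2".
subsep_demo() {
  awk 'BEGIN {
    SUBSEP = ":"              # set before creating any multi-index elements
    x[1, 2] = 5
    x[2, 4] = 7
    print x["1:2"]            # same element as x[1, 2]
    if ("2:4" in x) print "found 2:4"
  }'
}

subsep_demo
```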
So, in answer to your question of how to iterate over a multi-dimensional array: just use a single for (a in b) loop, but you may need some extra work to split a into its x and y parts.
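As a sketch of that extra work, the single loop below splits each key into its x (row) and y (column) parts and keeps only the cells of one requested row. The function name row_cells and the "row col value" input layout are assumptions for illustration.

```shell
# Hypothetical helper: print "col value" for every cell in the row given as $1.
row_cells() {
  awk -v want="$1" '
    { cell[$1, $2] = $3 }
    END {
      for (k in cell) {
        split(k, part, SUBSEP)     # part[1] = row (x), part[2] = col (y)
        if (part[1] == want) print part[2], cell[k]
      }
    }' | sort -n
}

printf '1 1 a\n1 2 b\n2 1 c\n' | row_cells 1
```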
Current versions of gawk (GNU awk, the default awk on many Linux distributions and installable almost anywhere) have real multidimensional arrays (arrays of arrays):
for(b in a)
    for(c in a[b])
        print a[b][c], c , b
See also the isarray() function.
I'll provide an example of how I use this in my work processing query data. Suppose you have an extract file full of transactions by product category and customer id:
customer_id category sales
1111 parts 100.01
1212 parts 5.20
2211 screws 1.33
...etc...
It's easy to use awk to count the total number of distinct customers with a purchase:
awk 'NR>1 {a[$1]++} END {for (i in a) total++; print "customers: " total}' \
datafile.txt
However, computing the number of distinct customers with a purchase in each category suggests a two-dimensional array:
awk 'NR>1 {a[$2,$1]++}
END {for (i in a) {split(i,arr,SUBSEP); custs[arr[1]]++}
for (k in custs) printf "category: %s customers:%d\n", k, custs[k]}' \
datafile.txt
The increment custs[arr[1]]++ works because each category/customer_id pair occurs only once as an index of the associative array a, so each distinct customer is counted once per category, no matter how many purchases they made.
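To see that repeat purchases are not double-counted, here is the same command wrapped in a hypothetical shell function and run on a made-up extract in which customer 1111 buys parts twice; sort is added only to fix the output order.

```shell
# Same logic as above; count_by_category is a hypothetical wrapper name.
count_by_category() {
  awk 'NR > 1 { a[$2, $1]++ }
       END {
         for (i in a) { split(i, arr, SUBSEP); custs[arr[1]]++ }
         for (k in custs) printf "category: %s customers:%d\n", k, custs[k]
       }' | sort
}

# Illustrative data only: 1111 appears twice under "parts".
printf 'customer_id category sales\n1111 parts 100.01\n1111 parts 7.50\n1212 parts 5.20\n2211 screws 1.33\n' | count_by_category
```

Customer 1111 contributes a single key ("parts" SUBSEP "1111"), so parts reports 2 customers, not 3.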
In truth, I use GNU awk, which is faster and can do array[i][j], as D. Williamson mentioned. But I wanted to be sure I could do this in standard awk.
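For comparison, a sketch of the same per-category count using gawk's arrays of arrays. It assumes gawk is installed: seen[cat][cust] is a gawk extension, and length() applied to a subarray counts its elements, which here are the distinct customers. The function name is hypothetical.

```shell
# Requires gawk: seen[category][customer] is an array of arrays.
count_by_category_gawk() {
  gawk 'NR > 1 { seen[$2][$1] = 1 }
        END {
          PROCINFO["sorted_in"] = "@ind_str_asc"   # deterministic output order
          for (cat in seen)
            printf "category: %s customers:%d\n", cat, length(seen[cat])
        }'
}
```

Usage: count_by_category_gawk < datafile.txt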
awk(1) was originally designed, in part, to be a teaching tool for the C language, and multi-dimensional arrays have been in both C and awk(1) pretty much forever. As such, POSIX IEEE 1003.2 standardized them.
To explore the syntax and semantics, if you create the following file called "test.awk":
BEGIN {
    KEY["a"]="a";
    KEY["b"]="b";
    KEY["c"]="c";
    MULTI["a"]["test_a"]="date a";
    MULTI["b"]["test_b"]="dbte b";
    MULTI["c"]["test_c"]="dcte c";
}
END {
    for(k in KEY) {
        kk="test_" k ;
        print MULTI[k][kk]
    }
    for(q in MULTI) {
        print q
    }
    for(p in MULTI) {
        for( pp in MULTI[p] ) {
            print MULTI[p][pp]
        }
    }
}
and run it with this command:
awk -f test.awk /dev/null
you will get the following output:
date a
dbte b
dcte c
a
b
c
date a
dbte b
dcte c
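at least on Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP. Note that this requires gawk: the MULTI[k][kk] arrays-of-arrays syntax is a gawk extension, whereas POSIX awk only specifies the comma form MULTI[k, kk].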
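For comparison, a sketch of the same lookup using only the comma subscript form, which also runs under awks that lack gawk's arrays-of-arrays extension (for example mawk). The wrapper name posix_multi is hypothetical, and the keys are walked through a split() list so the output order is fixed.

```shell
# Hypothetical wrapper: MULTI[k, kk] is the POSIX equivalent of MULTI[k][kk].
posix_multi() {
  awk 'BEGIN {
    MULTI["a", "test_a"] = "date a"
    MULTI["b", "test_b"] = "dbte b"
    MULTI["c", "test_c"] = "dcte c"
    n = split("a b c", keys, " ")    # fixed key order instead of for-in
    for (i = 1; i <= n; i++) {
      k = keys[i]
      print MULTI[k, "test_" k]
    }
  }'
}

posix_multi
```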