Sort associative array with AWK
Here's my array (gawk script) :
myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11
After sort, I need the following result :
bob 5
jack 11
peter 32
john 463
When i use "asort", indices are lost. How to sort by array value without losing indices ? (I need ordered indices based on their values)
(I need to obtain this result with awk/gawk only, not shell script, perl, etc)
If my post isn't clear enough, here is an other post explaining the same issue : http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_26626841.html )
Thanks in advance
Update :
Thanks to you both, but i need to sort by values, not indices (i want ordered indices according to their values).
In other terms, i need this result :
bob 5
jack 11
peter 32
john 463
not :
bob 5
jack 11
john 463
peter 32
(I agree, my example is confusing, the chosen values are pretty bad)
From the code of Catcall, I wrote a quick implementation that works, but it's rather ugly (I concatenate keys & values before sort and split during comparison). Here's what it looks like :
function qsort(A, left, right, i, last) {
if (left >= right)
return
swap(A, left, left+int((right-left+1)*rand()))
last = left
for (i = left+1; i <= right; i++)
if (getPart(A[i], "value") < getPart(A[left], "value"))
swap(A, ++last, i)
swap(A, left, last)
qsort(A, left, last-1)
qsort(A, last+1, right)
}
function swap(A, i, j, t) {
t = A[i]; A[i] = A[j]; A[j] = t
}
function getPart(str, part) {
if (part == "key")
return substr(str, 1, index(str, "#")-1)
if (part == "value")开发者_如何转开发
return substr(str, index(str, "#")+1, length(str))+0
return
}
BEGIN { }
{ }
END {
myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11
for (key in myArray)
sortvalues[j++] = key "#" myArray[key]
qsort(sortvalues, 0, length(myArray));
for (i = 1; i <= length(myArray); i++)
print getPart(sortvalues[i], "key"), getPart(sortvalues[i], "value")
}
Of course I'm interested if you have something more clean...
Thanks for your time
Edit:
Sort by values
Oh! To sort the values, it's a bit of a kludge, but you can create a temporary array using a concatenation of the values and the indices of the original array as indices in the new array. Then you can asorti()
the temporary array and split the concatenated values back into indices and values. If you can't follow that convoluted description, the code is much easier to understand. It's also very short.
# right justify the integers into space-padded strings and cat the index
# to create the new index
for (i in myArray) tmpidx[sprintf("%12s", myArray[i]),i] = i
num = asorti(tmpidx)
j = 0
for (i=1; i<=num; i++) {
split(tmpidx[i], tmp, SUBSEP)
indices[++j] = tmp[2] # tmp[2] is the name
}
for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
Edit 2:
If you have GAWK 4, you can traverse the array by order of values without performing an explicit sort:
#!/usr/bin/awk -f
BEGIN {
myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11
PROCINFO["sorted_in"] = "@val_num_asc"
for (i in myArray) {
{print i, myArray[i]}}
}
}
There are settings for traversing by index or value, ascending or descending and other options. You can also specify a custom function.
Previous answer:
Sort by indices
If you have an AWK, such as gawk
3.1.2 or greater, which supports asorti()
:
#!/usr/bin/awk -f
BEGIN {
myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11
num = asorti(myArray, indices)
for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
}
If you don't have asorti()
:
#!/usr/bin/awk -f
BEGIN {
myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11
for (i in myArray) indices[++j] = i
num = asort(indices)
for (i=1; i<=num; i++) print i, indices[i], myArray[indices[i]]
}
Use the Unix sort command with the pipe, keeps Awk code simple and follow Unix philosophy
Create a input file with values seperated by comma
peter,32
jack,11
john,463
bob,5
Create a sort.awk file with the code
BEGIN { FS=","; }
{
myArray[$1]=$2;
}
END {
for (name in myArray)
printf ("%s,%d\n", name, myArray[name]) | "sort -t, -k2 -n"
}
Run the program, should give you the output
$ awk -f sort.awk data
bob,5
jack,11
peter,32
john,463
PROCINFO["sorted_in"] = "@val_num_desc";
Before iterating an array, use the above statement. But, it works in awk version 4.0.1. It does not work in awk version 3.1.7.
I am not sure in which intermediate version, it got introduced.
And the simple answer...
function sort_by_myArray(i1, v1, i2, v2) {
return myArray[i2] < myArray[i1];
}
BEGIN {
myArray["peter"] = 32;
myArray["bob"] = 5;
myArray["john"] = 463;
myArray["jack"] = 11;
len = length(myArray);
asorti(myArray, k, "sort_by_myArray");
# Print result.
for(n = 1; n <= len; ++n) {
print k[n], myArray[k[n]]
}
}
Use asorti:
#!/usr/bin/env -S gawk -f
{
score[$1] = $0;
array[sprintf("%3s",$2) $1] = $1;
}
END {
asorti(array, b)
for(i in b)
{
name = array[b[i]]
print score[name]
}
}
The following function works in Gawk 3.1.7 and doesn't resort to any of the workarounds described above (no external sort command, no array subscripts, etc.) It's just a basic implementation of the insertion sort algorithm adapted for associative arrays.
You pass it an associative array to sort on the values and an empty array to populate with the corresponding keys.
myArray["peter"] = 32;
myArray["bob"] = 5;
myArray["john"] = 463;
myArray["jack"] = 11;
len = resort( myArray, result );
for( i = 1; i <= len; i++ ) {
key = result[ i ];
print i ": " key " = " myArray[ key ];
}
Here is the implementation:
function resort( data, list,
key, len, i, j, v )
{
len = 0;
for( key in data ) {
list[ ++len ] = key;
}
# insertion sort algorithm adapted for
# one-based associative arrays in gawk
for( i = 2; i <= len; i++ )
{
v = list[ i ];
for( j = i - 1; j >= 1; j-- )
{
if( data[ list[ j ] ] <= data[ v ] )
break;
list[ j + 1 ] = list[ j ];
}
list[ j + 1 ] = v;
}
return len;
}
You can "reverse sort" by simply iterating the resulting array in descending order.
精彩评论