Making a dictionary ... sort of
I have the following set of values in a tab-delimited file (only part of the values is shown here; the file has 2 columns):
MXRA8 9.9074e-08
AURKAIP1 0.0000e+00
CCNL2 1.4962e-07
CCNL2 2.0536e-07
CCNL2 2.5198e-07
CCNL2 2.5311e-07
LOC148413 2.2558e-07
MRPL20 0.0000e+00
LOC441869 0.0000e+00
TMEM88B 0.0000e+00
As is evident, a value for CCNL2 occurs 4 times; what I want is that only the highest value for each repeated name in column one is extracted and put into another file.
Something like this:
MXRA8 9.9074e-08
AURKAIP1 0.0000e+00
CCNL2 2.5311e-07
LOC148413 2.2558e-07
MRPL20 0.0000e+00
LOC441869 0.0000e+00
TMEM88B 0.0000e+00
Any suggestions for a Bash shell script, or a one-liner in Perl?
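For the Perl option, here is a minimal one-liner sketch, assuming whitespace/tab-separated columns and that the set of names fits in memory (input.txt and output.txt are placeholder names, and the output order is arbitrary since it comes from a hash):
perl -ane '$max{$F[0]} = $F[1] if !exists $max{$F[0]} || $F[1] > $max{$F[0]}; END { print "$_\t$max{$_}\n" for keys %max }' input.txt > output.txt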
Judging by sort's man page, it handles floating-point values with --general-numeric-sort, but you might want to put this to the test:
sort --key=2 --general-numeric-sort --reverse input.txt | sort --key=1,1 --unique
AURKAIP1 0.0000e+00
CCNL2 2.5311e-07
LOC148413 2.2558e-07
LOC441869 0.0000e+00
MRPL20 0.0000e+00
MXRA8 9.9074e-08
TMEM88B 0.0000e+00
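One thing worth verifying: the second sort keeps only one line per name, but it is the first sort's descending order that has to survive the tie-break. Adding --stable to the second sort (a GNU sort option that disables the last-resort comparison) makes that intent explicit:
sort --key=2 --general-numeric-sort --reverse input.txt | sort --key=1,1 --unique --stable input.txt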
Try:
awk -F '\t' '!($1 in max) || $2 + 0 > max[$1] + 0 { max[$1] = $2 } END { for (name in max) printf "%s\t%s\n", name, max[name] }' tab-delimited-data-file
That should print out the maximum value for each name in the first column (the output order is arbitrary).
You could sort first and then use awk to read the file line by line, keeping only the maximum for each name. If the repeated lines are always grouped as in the sample input, the sort can be avoided.
sort file | awk -F '\t' 'NR == 1 {last = $1; max = $2; next} {if (last != $1) {printf "%s\t%s\n", last, max; last = $1; max = $2} else if (max + 0 < $2 + 0) max = $2} END {printf "%s\t%s\n", last, max}'