Making a dictionary ... sort of

I have the following set of values in a tab-delimited file with two columns (only part of the file is shown here):

MXRA8   9.9074e-08
AURKAIP1    0.0000e+00
CCNL2   1.4962e-07
CCNL2   2.0536e-07
CCNL2   2.5198e-07
CCNL2   2.5311e-07
LOC148413   2.2558e-07
MRPL20  0.0000e+00
LOC441869   0.0000e+00
TMEM88B 0.0000e+00 

As you can see, CCNL2 occurs 4 times. For any name that is repeated in column one, I want only the row with the highest value to be extracted and written to another file.

Something like this

MXRA8   9.9074e-08 
AURKAIP1    0.0000e+00 
CCNL2   2.5311e-07 
LOC148413   2.2558e-07 
MRPL20  0.0000e+00 
LOC441869   0.0000e+00 
TMEM88B 0.0000e+00 

Any suggestions for a bash shell script, or a one-liner in Perl?


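Judging by sort's man page, --general-numeric-sort handles floating-point values, including scientific notation such as 2.5311e-07 (plain --numeric-sort does not interpret the exponent), but you might want to put this to the test: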

sort --key=2 --general-numeric-sort --reverse input.txt | sort --key=1,1 --unique

AURKAIP1    0.0000e+00
CCNL2   2.5311e-07
LOC148413   2.2558e-07
LOC441869   0.0000e+00
MRPL20  0.0000e+00
MXRA8   9.9074e-08
TMEM88B 0.0000e+00
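
One way to put this to the test is to wrap the pipeline in a small function and run it on a few hand-made rows. This is only a sketch: the function name max_per_name_sort is made up for illustration, it assumes a sort that supports -g (GNU and BSD sort do), and it adds an explicit tab separator, -s, and LC_ALL=C so that the result does not depend on locale or tie-breaking.

```shell
# Hypothetical helper: print the row with the highest value for each name.
# Assumes a tab-delimited file: name <TAB> value.
max_per_name_sort() {
  tab=$(printf '\t')
  # 1st sort: largest value first, compared as general numeric (understands 2.5e-07).
  # 2nd sort: stable sort by name, -u keeps only the first row of each name,
  #           which is the row with the largest value.
  LC_ALL=C sort -t "$tab" -k2,2gr "$1" |
    LC_ALL=C sort -s -u -t "$tab" -k1,1
}
```

Usage would look like: max_per_name_sort input.txt > output.txt. Note that the result comes out sorted by name, not in the original input order.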


Try:

awk -F '\t' '{ if (!($1 in max) || $2+0 > max[$1]+0) { max[$1] = $2 } } END { for (var in max) { print var "\t" max[var] } }' tab-delimited-data-file

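That should print the maximum value for each name in the first column. Note that awk visits array keys in an unspecified order in a for ... in loop, so the output order may differ from the input order.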
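
If the output should keep the names in the order they first appear, as in the desired output in the question, a variation along these lines might work. It is a sketch under the same assumption of a tab-delimited, two-column file; the function name max_per_name_ordered and the order array are made up for illustration.

```shell
# Hypothetical helper: maximum value per name, in first-seen order.
max_per_name_ordered() {
  awk -F '\t' '
    # First time a name is seen: remember its position and its value.
    !($1 in max) { order[++n] = $1; max[$1] = $2; next }
    # Later occurrences: keep the larger value (+0 forces a numeric comparison).
    $2 + 0 > max[$1] + 0 { max[$1] = $2 }
    # Print names in the order they were first seen, with the original value text.
    END { for (i = 1; i <= n; i++) print order[i] "\t" max[order[i]] }
  ' "$1"
}
```

Because the stored value is the original field text, the numbers are printed exactly as they appear in the input.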


You could sort first and then use awk to read the file line by line keeping only the max line. If the repeated lines are always grouped as in the sample input, the sort can be avoided.

sort -k1,1 file | awk -F '\t' 'NR==1 {last = $1; max = $2} {if (last != $1) {printf "%s\t%.4e\n", last, max; last = $1; max = $2} else if (max+0 < $2+0) max = $2} END {if (NR) printf "%s\t%.4e\n", last, max}'
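
For the case where repeated names are already grouped together, the version without sort could be sketched as follows. It assumes a tab-delimited, two-column file in which all rows for a name are adjacent; the function name max_per_group is made up for illustration.

```shell
# Hypothetical helper: maximum value per run of identical names.
# Only correct when rows with the same name are adjacent in the input.
max_per_group() {
  awk -F '\t' '
    # A new name starts: emit the finished group first.
    NR > 1 && $1 != last { print last "\t" max }
    # Start tracking the new group (also handles the very first line).
    NR == 1 || $1 != last { last = $1; max = $2; next }
    # Same name as before: keep the larger value, compared numerically.
    $2 + 0 > max + 0 { max = $2 }
    # Emit the last group, unless the input was empty.
    END { if (NR > 0) print last "\t" max }
  ' "$1"
}
```

This reads the file once, keeps the input order, and holds only one name in memory at a time.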