How to process duplicate columns with conditions
I need to skip all the rows that share the same value in column one if column 2 is empty in any of them, and for the others I need to calculate the percentage of column 4 over column 3.
Input:
T75PA 2 0
T75PA kk 4 1
T240P 4 3
T240P test 3 3
T240P test2 3 1
T245P rr 8 1
T245P rr 33 1
T226PA fg 4 2
T226PA g 51 38
T226PA e 41 34
Output:
T245P rr 8 1 0.125
T245P rr 33 1 0.03030303
T226PA fg 4 2 0.5
T226PA g 51 38 0.745098039
T226PA e 41 34 0.829268293
awk '
NR==FNR {if (NF < 4) blank[$1]; next}
$1 in blank {next}
{$(NF+1) = $4/$3; print}
' datafile datafile | column -t
Since you say now that the field separator is tab:
awk '
BEGIN {OFS = FS = "\t"}
NR==FNR {if ($2 == "") blank[$1]; next}
$1 in blank {next}
{$5 = $4/$3; print}
' datafile datafile
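To see why the separator matters here, a small sketch may help. With the default FS, awk collapses runs of blanks and tabs, so an empty column simply disappears and the row looks one field short. With FS set to a tab, the empty column is kept as an empty $2. The sample row and variable names below are made up for illustration:

```shell
# A row whose second column is empty, with real tab characters between fields.
row="$(printf 'T75PA\t\t2\t0')"

# Default FS: runs of blanks and tabs collapse, so the empty field vanishes.
default_nf="$(printf '%s\n' "$row" | awk '{ print NF }')"

# FS = "\t": every tab delimits a field, so the empty column survives as $2.
tab_nf="$(printf '%s\n' "$row" | awk -F'\t' '{ print NF }')"
tab_col2_empty="$(printf '%s\n' "$row" | awk -F'\t' '{ print ($2 == "" ? "yes" : "no") }')"

echo "default FS: NF=$default_nf"
echo "tab FS:     NF=$tab_nf, column 2 empty: $tab_col2_empty"
```

That is why the first version tests NF < 4 while the tab version tests $2 == "".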
I'll assume your data is tab separated. A Perl script something like this (I haven't tested it)...
my @data;
my %counts;
my %blanks;
while( my $line = <STDIN> )
{
chomp($line);
my @rec = split( "\t", $line );
push( @data, \@rec );
$counts{$rec[0]}++;
if( $rec[1] eq '' )
{
$blanks{$rec[0]}++;
}
}
foreach my $rec ( @data )
{
if( $counts{$rec->[0]} <= 1 || !$blanks{$rec->[0]} )
{
print join( "\t", @$rec, $rec->[3] / $rec->[2] ) . "\n";
}
}
How about:
#!/usr/bin/perl
use Modern::Perl;
my $re = qr/^([A-Z0-9]+)\s+?(\S+|\s+)\s+(\d+)\s+(\d+)\s*$/;
my $skip = '';
while (<DATA>) {
chomp;
if (my @l = $_ =~ /$re/) {
if ($l[1] =~ /^\s+$/ || $skip eq $l[0]) {
$skip = $l[0];
next;
}
$skip = '';
my $r = $l[3] / $l[2];
say "$_\t$r";
}
}
__DATA__
T75PA 2 0
T75PA kk 4 1
T240P 4 3
T240P test 3 3
T240P test2 3 1
T245P rr 8 1
T245P rr 33 1
T226PA fg 4 2
T226PA g 51 38
T226PA e 41 34
output:
T245P rr 8 1 0.125
T245P rr 33 1 0.0303030303030303
T226PA fg 4 2 0.5
T226PA g 51 38 0.745098039215686
T226PA e 41 34 0.829268292682927
try:
awk 'NF < 4 {for (i in res) {split(res[i], f); if (f[1] == $1) delete res[i]}
rm[$1] = $1; next}
{if ($1 in rm) next; ratio = $4/$3; res[++n] = $0 "\t" ratio}
END {for (i = 1; i <= n; i++) if (i in res) print res[i]}' file
This skips every line with fewer than four fields, along with all other lines that share its first column. For the remaining lines the ratio is calculated, appended to the line, and saved in the array res. After the file has been processed, the entries of res are printed to stdout.
Output:
T245P rr 8 1 0.125
T245P rr 33 1 0.030303
T226PA fg 4 2 0.5
T226PA g 51 38 0.745098
T226PA e 41 34 0.829268
HTH Chris
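The reason for saving rows in an array instead of printing them right away is that a short row may turn up after valid rows of the same group, and those earlier rows then have to be dropped as well. Here is a minimal sketch of that case, assuming whitespace-separated input and a non-zero column 3 in every full row. The function name ratio_rows and the sample rows are hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: reads rows on stdin, prints kept rows with the ratio appended.
ratio_rows() {
  awk '
    NF < 4 { bad[$1]; next }                 # remember keys that have a short row
    { key[++n] = $1                          # buffer every full row, in input order
      line[n] = $0 "\t" ($4 / $3) }
    END {
      for (i = 1; i <= n; i++)               # replay the buffer in input order
        if (!(key[i] in bad)) print line[i]  # drop rows whose key was flagged
    }
  '
}

# Group A has its short row AFTER a valid row; both A rows must be dropped.
result="$(printf '%s\n' 'A x 4 2' 'A 7 1' 'B y 8 1' | ratio_rows)"
printf '%s\n' "$result"
```

Only the B row is printed, because the filtering is deferred to END, when every short row has already been seen.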