combine two lists with a join on a column
I'm trying to combine two lists, joining them by a common field suchs as ENST00000371026. I've tried the following but no luck. What is the actual way to do it?
cat> gar1.txt <<EOF
ENST00000371026 ENSG00000152763
ENST00000371023 ENSG00000152763
ENST00000395250 ENSG00000152763
ENST00000309502 ENSG00000163485
ENST00000377464 ENSG00000142599
ENST00000400908 ENSG00000142599
ENST00000337907 ENSG00000142599
ENST00000400907 ENSG00000142599
ENST00000401087 ENSG00000179571
EOF
cat> gar2.txt <<EOF
DDX11L1 ENST00000371026
DDX11L9 ENST00000309502
DDX11L1 ENST00000371026
OR4F5 ENST00000377464
DQ597235 n/a
DQ599768 n/a
LOC388312 ENST00000401087
LOC100132287 ENST00000425496
LOC100132287 ENST00000425496
EOF
join -t"\t" -2 2 -1 1 gar1.txt gar2.txt
An example output would be:
DDX11L1 ENSG00000152763
DDX11L9 ENSG00000163485
OR4F5 ENSG00000142599
LOC388312 ENSG00000179571
LOC100132开发者_如何学Python287 NONE
Give this a try:
join -a 2 -t $'\t' -2 2 -1 1 -o 2.1 1.2 <(sort gar1.txt) <(sort -k 2,2 gar2.txt) | sed '/\t$/ s/$/NONE/' | uniq
It uses Bash process substitution (<()
), but you could simply presort your files if you don't want to use that. Your version of join
needs to have the -o
option or you could use awk to process the output further. For the "n/a" lines it will print "NONE", but you could do this to eliminate them:
<(sort -k 2,2 gar2.txt | grep -v 'n/a$')
Edit:
This is the output I get without the sed
and uniq
:
DDX11L9 ENSG00000163485
DDX11L1 ENSG00000152763
DDX11L1 ENSG00000152763
OR4F5 ENSG00000142599
LOC388312 ENSG00000179571
LOC100132287
LOC100132287
DQ597235
DQ599768
And this is what I get with them:
DDX11L9 ENSG00000163485
DDX11L1 ENSG00000152763
OR4F5 ENSG00000142599
LOC388312 ENSG00000179571
LOC100132287 NONE
DQ597235 NONE
DQ599768 NONE
Adding the grep
:
DDX11L9 ENSG00000163485
DDX11L1 ENSG00000152763
OR4F5 ENSG00000142599
LOC388312 ENSG00000179571
LOC100132287 NONE
which matches your example output except for the order. If you need to preserve the order it would be possible to do.
精彩评论