How can I apply Unix's / Sed's / Perl's transliterate (tr) to only a specific column?
I have program output that looks like this (tab-delimited):
$ ./mycode somefile
0000000000000000000000000000000000 238671
0000000000000000000000000000000001 0
0000000000000000000000000000000002 0
0000000000000000000000000000000003 0
0000000000000000000000000000000010 0
0000000000000000000000000000000011 1548.81
0000000000000000000000000000000012 0
0000000000000000000000000000000013 937.306
What I want to do, on the FIRST column only, is replace 0 with A, 1 with C, 2 with G, and 3 with T. Is there a way to transliterate that output when it is piped directly from "mycode", yielding this:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 238671
...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACT 937.306
Using Perl:
C:\> ./mycode file | perl -lpe "($x,$y)=split; $x=~tr/0123/ACGT/; $_=qq{$x\t$y}"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 238671
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACA 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACC 1548.81
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACG 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACT 937.306
You can use single quotes in Bash:
$ ./mycode file | perl -lpe '($x,$y)=split; $x=~tr/0123/ACGT/; $_="$x\t$y"'
As @ysth notes in the comments, perl actually provides the command-line options -a and -F:
-a                autosplit mode with -n or -p (splits $_ into @F)
...
-F/pattern/       split() pattern for -a switch (//'s are optional)
Using those:
perl -lawnF'\t' -e '$,="\t"; $F[0] =~ y/0123/ACGT/; print @F'
It should be possible to do this with sed. Put the following in a file (you can also do it on the command line with -e; just don't forget the semicolons, or use a separate -e for each line). (EDIT: Keep in mind that, since your data is tab-delimited, the character after the first s/ should in fact be a tab, not a space; make sure your editor doesn't turn it into spaces.)
#!/usr/bin/sed -f
h
s/ .*$//
y/0123/ACGT/
G
s/\n[0-3]*//
and use
./mycode somefile | sed -f sedfile
or chmod 755 sedfile
and do
./mycode somefile | ./sedfile
The steps performed are:
- copy buffer to hold space (replacing held content from previous line, if any)
- remove trailing stuff (from first space to end of line)
- transliterate
- append contents from hold space
- remove the newline (from the append step) and all digits following it (up to the space)
Worked for me on your data at least.
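The steps above can be traced on a single line. This is only a sketch: the sample input is made up, the tab is built with printf so that no editor can turn it into spaces, and extra p and l commands are added purely to print the pattern space after each step (l shows the embedded newline and tab as escapes).

```shell
# Build a literal tab character for use inside the sed expression.
tab=$(printf '\t')

# -n suppresses automatic printing; p/l print the pattern space explicitly:
#   after the s///  -> first column only
#   after the y///  -> transliterated first column
#   after G         -> transliterated column, newline, original line
#   after last s/// -> final result
trace=$(printf '0013\t937.306\n' | sed -n \
    -e h \
    -e "s/${tab}.*\$//" -e p \
    -e 'y/0123/ACGT/' -e p \
    -e G -e l \
    -e 's/\n[0-3]*//' -e p)
printf '%s\n' "$trace"
```

The last line printed is the finished record; the earlier lines are the intermediate states of the pattern space.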
EDIT:
Ah, you wanted a one-liner...
GNU sed
sed -e "h;s/ .*$//;y/0123/ACGT/;G;s/\n[0-3]*//"
or old-school sed (no semicolons)
sed -e h -e "s/ .*$//" -e "y/0123/ACGT/" -e G -e "s/\n[0-3]*//"
@sarathi
AWK solution for this:
awk '{gsub("0","A",$1);gsub("1","C",$1);gsub("2","G",$1);gsub("3","T",$1); print $1"\t"$2}' temp.txt
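If the column or the character mapping varies, the repeated gsub calls could be folded into a loop. The following is a minimal sketch, not part of the answer above: tr_col is a hypothetical helper name, and it assumes tab-delimited input and source/replacement strings of equal length.

```shell
# tr_col COLUMN FROM TO
# Hypothetical helper: maps each character of FROM to the character at the
# same position in TO, in the given 1-based column only.
tr_col() {
    awk -F '\t' -v OFS='\t' -v col="$1" -v from="$2" -v to="$3" '
    {
        out = ""
        n = length($col)
        for (i = 1; i <= n; i++) {
            c = substr($col, i, 1)
            p = index(from, c)              # position of c in FROM, 0 if absent
            out = out (p ? substr(to, p, 1) : c)
        }
        $col = out                          # rebuilds the record using OFS
        print
    }'
}

printf '0013\t937.306\n' | tr_col 1 0123 ACGT
```

With the program from the question it would be used as ./mycode somefile | tr_col 1 0123 ACGT. Characters not listed in FROM pass through unchanged, and the other columns are not touched.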