How do I modify a text field with awk?

I want to delete text after the first colon : (inclusive), or replace it with nothing.

For example, 1:5:30 should be changed into 1. I would prefer an awk script for this job, but I do not know how to write one. Could you please give me some directions? Thanks in advance.

My data are tab-delimited, and some cells contain only a dot. The file looks like this:

1       313     .       T       C       30.11        1:5:30  .       .       .
1       316     .       A       T       30.80        1:5:30  .       0:8:28  .
1       317     .       T       A       31.40        1:5:36  .       0:8:28  .

I tried the following, but I failed with all of them:

sed 's/:*:*//g' mydatafile
sed 's/:[0-9]:[0-9]//g' mydatafile


It is a bit unclear what the desired output should be, but this is my interpretation, using sed:

$ sed 's/:.*//' input
1   313 .   T   C   30.11   1
1   316 .   A   T   30.80   1
1   317 .   T   A   31.40   1

Using awk:

$ awk -F":" '{print $1}' input
1   313 .   T   C   30.11   1
1   316 .   A   T   30.80   1
1   317 .   T   A   31.40   1

Using cut:

cut -d":" -f1 input

Using bash:

while IFS=':' read -r a b; do
    echo "$a"
done < input
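
A variation on the loop above, as a sketch only: the shell's own parameter expansion ${line%%:*} removes the longest suffix that starts at a colon, so IFS does not need to change. The function name strip_after_colon and the two sample rows are made up for illustration.

```shell
# strip_after_colon is an illustrative name, not part of the original answer.
strip_after_colon() {
    while IFS= read -r line; do
        # %%:* removes the longest trailing match of ":*",
        # i.e. everything from the first colon to the end of the line
        printf '%s\n' "${line%%:*}"
    done
}

# Hypothetical tab-delimited rows, shortened from the question's data.
out=$(printf '1\t313\t1:5:30\t.\n1\t316\t1:5:30\t0:8:28\n' | strip_after_colon)
printf '%s\n' "$out"
```

Like the loop above, this truncates the whole line at the first colon, so it matches the first interpretation only.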

Alternative interpretation using awk:

$ awk 'BEGIN {OFS="\t"} {sub(/:.*/,"",$7); print}' input
1   313 .   T   C   30.11   1   .   .   .
1   316 .   A   T   30.80   1   .   0:8:28  .
1   317 .   T   A   31.40   1   .   0:8:28  .
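
If the column is not always the seventh, the same idea can take the field number as a parameter. This is a minimal sketch: the function name strip_col, the awk variable col, and the sample rows are assumptions for illustration, and FS is set to a tab here (the one-liner above uses the default separator).

```shell
# Hypothetical sample rows, tab-delimited like the question's data.
data=$(printf '1\t313\t.\tT\tC\t30.11\t1:5:30\t.\t.\t.\n1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.\n')

# strip_col N: truncate field N at its first colon, leave other fields alone.
strip_col() {
    awk -v col="$1" 'BEGIN { FS = OFS = "\t" } { sub(/:.*/, "", $col); print }'
}

result=$(printf '%s\n' "$data" | strip_col 7)
printf '%s\n' "$result"
```

With the argument 7, the 0:8:28 in the ninth field is left untouched, as in the output above.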

3rd and hopefully last update

3rd interpretation using awk:

$ awk 'BEGIN {OFS="\t"} {for (i=1;i<=NF;i++){sub(/:.*/,"",$i)}; print}' input
1   313 .   T   C   30.11   1   .   .   .
1   316 .   A   T   30.80   1   .   0   .
1   317 .   T   A   31.40   1   .   0   .
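
The per-field loop can also be written, as an alternative sketch, as a single gsub over the whole record. The function name strip_all and the sample row are assumptions for illustration. Because the record is edited directly, the original tabs are kept without setting OFS.

```shell
strip_all() {
    # gsub with no target operates on the whole record ($0);
    # [^\t]* stops at the next tab, so neighbouring cells are untouched.
    awk '{ gsub(/:[^\t]*/, ""); print }'
}

# Hypothetical row taken from the question's second data line.
out=$(printf '1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.\n' | strip_all)
printf '%s\n' "$out"
```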


perl -p -e 's/:\d+:\d+//g' mydatafile


Try this:

sed 's/\([0-9][0-9]*\):[0-9][0-9]*:[0-9][0-9]*/\1/g' infile

or

sed 's/\([0-9]\{1,\}\):[0-9]\{1,\}:[0-9]\{1,\}/\1/g' infile

Output:

1       313     .       T       C       30.11        1  .       .       .
1       316     .       A       T       30.80        1  .       0  .
1       317     .       T       A       31.40        1  .       0  .


Here's the shortest one using sed:

sed -i.orig 's/\([0-9]*\):[^[:space:]]*/\1/g' inputfile

This keeps a copy of the original file as inputfile.orig and edits inputfile in place.
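
To try the in-place behaviour safely, a sketch like the following works in a throwaway directory. The expression here is a simplified one chosen for the illustration (delete each colon and the run of non-whitespace after it), and the sample rows are made up.

```shell
# Work in a temporary directory so no real data is touched.
tmpdir=$(mktemp -d)
printf '1\t313\t1:5:30\t.\n1\t316\t1:5:30\t0:8:28\n' > "$tmpdir/inputfile"

# -i.orig edits inputfile in place and saves the untouched copy as inputfile.orig.
sed -i.orig 's/:[^[:space:]]*//g' "$tmpdir/inputfile"

edited=$(cat "$tmpdir/inputfile")
backup=$(cat "$tmpdir/inputfile.orig")
printf '%s\n--\n%s\n' "$edited" "$backup"
rm -r "$tmpdir"
```

The part above the -- line is the edited file; the part below it is the backup, which still contains the colons.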


This should do the trick.

$ sed -e 's/:.*//' mydatafile
1       313     .       T       C       30.11        1
1       316     .       A       T       30.80        1
1       317     .       T       A       31.40        1

I think sed is a little easier than awk for this problem.

Overview of Regular Expression Syntax
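
To see why the two expressions in the question did not work, here is a small sketch that runs them and the expression above on the single sample value 1:5:30. The variable names are only for illustration.

```shell
# One sample value from the question; printf avoids depending on a data file.
sample='1:5:30'

# ':*' means "zero or more colons", so this only deletes the colons themselves.
attempt1=$(printf '%s\n' "$sample" | sed 's/:*:*//g')

# '[0-9]' matches exactly one digit, so ':5:3' is removed and the final 0 stays.
attempt2=$(printf '%s\n' "$sample" | sed 's/:[0-9]:[0-9]//g')

# ':.*' matches the first colon and everything after it on the line.
fixed=$(printf '%s\n' "$sample" | sed 's/:.*//')

printf '%s %s %s\n' "$attempt1" "$attempt2" "$fixed"
# prints: 1530 10 1
```

In a regular expression, * repeats the item before it; it is not a shell-style wildcard, which is the likely source of the confusion.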

Later . . .

I see from your comments to other answers that you want to replace each occurrence of x:y:z with x. In that case, I'd use this awk program.

$ cat test.awk
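BEGIN {
  FS = OFS = "\t";   # read and write tab-delimited fields
}
{
  for (i = 1; i <= NF; i++) {
    # keep only the part of the field before the first colon
    if (match($i, /:.*/)) {
        $i = substr($i, 1, RSTART - 1);
    }
  }
  print;             # fields are joined with OFS, with no trailing tab
}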

$ awk -f test.awk test.dat
1       313     .       T       C       30.11   1       .       .       .
1       316     .       A       T       30.80   1       .       0       .
1       317     .       T       A       31.40   1       .       0       .
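
Setting FS to a tab matters if a cell can be empty. The sketch below uses a hypothetical row with an empty second cell (the question's data has no such row) and compares the field counts.

```shell
# Hypothetical row whose second cell is empty (two tabs in a row).
row=$(printf 'a\t\tb:1')

# The default FS splits on runs of blanks and tabs, so the empty cell disappears.
default_nf=$(printf '%s\n' "$row" | awk '{ print NF }')

# FS = "\t" treats every single tab as a separator, so the empty cell is kept.
tab_nf=$(printf '%s\n' "$row" | awk 'BEGIN { FS = "\t" } { print NF }')

printf '%s %s\n' "$default_nf" "$tab_nf"
# prints: 2 3
```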