How do I modify a text field with awk?

Posted 2023-03-12 03:54

I want to delete the text from the first colon (:) onward, including the colon itself. In other words, I want to replace it with nothing.

For example, 1:5:30 should become 1. I'd prefer an awk script for this, but I don't know how to write one. Could you point me in the right direction? Thanks in advance.

My data are tab-delimited, and some cells contain just a dot (.). The file looks like this:

1       313     .       T       C       30.11        1:5:30  .       .       .
1       316     .       A       T       30.80        1:5:30  .       0:8:28  .
1       317     .       T       A       31.40        1:5:36  .       0:8:28  .
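To try the commands from the answers on genuinely tab-separated data, you can recreate the three sample rows with `printf`. It reuses its format string until every argument is consumed: 30 arguments at 10 per line gives 3 lines. The output file name `input` is an assumption that matches several of the answers. Adjust it for commands that use `mydatafile`, `infile`, and so on.

```shell
#!/bin/sh
# Recreate the three sample rows with literal tab separators.
# The file name "input" is an assumption; rename as needed.
# printf reuses the 10-field format for every 10 arguments -> 3 lines.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 313 . T C 30.11 1:5:30 . . . \
  1 316 . A T 30.80 1:5:30 . 0:8:28 . \
  1 317 . T A 31.40 1:5:36 . 0:8:28 . > input
```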

I tried the following, but neither worked:

sed 's/:*:*//g' mydatafile
sed 's/:[0-9]:[0-9]//g' mydatafile
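For reference, the two attempts fail for different reasons. In `s/:*:*//g`, each `*` applies only to the colon just before it, so the pattern matches runs of colons (or nothing), and the command simply deletes the colons. In `s/:[0-9]:[0-9]//g`, each `[0-9]` matches exactly one digit, so on `1:5:30` it removes `:5:3` and leaves a stray `0`. Here is a quick check on a single value:

```shell
#!/bin/sh
# Attempt 1: ':*' means "zero or more colons", so only the colons disappear.
printf '1:5:30\n' | sed 's/:*:*//g'          # prints 1530
# Attempt 2: '[0-9]' is a single digit, so ':5:3' is removed and '0' is left.
printf '1:5:30\n' | sed 's/:[0-9]:[0-9]//g'  # prints 10
```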

It's a bit unclear what the desired output should be, but here is my interpretation, using sed:

$ sed 's/:.*//' input
1   313 .   T   C   30.11   1
1   316 .   A   T   30.80   1
1   317 .   T   A   31.40   1

Using awk:

$ awk -F":" '{print $1}' input
1   313 .   T   C   30.11   1
1   316 .   A   T   30.80   1
1   317 .   T   A   31.40   1

Using cut:

cut -d":" -f1 input
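One detail worth knowing with `cut`: a line that contains no `:` at all is passed through intact. Adding `-s` suppresses such lines instead. A small illustration on made-up input:

```shell
#!/bin/sh
# Lines without the delimiter are printed whole by default...
printf 'a:b\nno-colon\n' | cut -d':' -f1
# ...and dropped when -s ("only delimited lines") is given.
printf 'a:b\nno-colon\n' | cut -d':' -f1 -s
```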

Using bash:

# IFS applies only to this read; -r keeps backslashes literal.
while IFS=':' read -r a _; do
    printf '%s\n' "$a"
done < input
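The loop above keeps only the text before the first colon on each line. If every tab-separated field should instead be trimmed at its own first colon, plain POSIX sh parameter expansion can do it. `${rest%%"$tab"*}` takes the next field, and `${field%%:*}` drops the first colon and everything after it. This is a minimal sketch that is not tuned for speed, since a shell loop is much slower than awk on large files. The function name `trim_fields` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: trim every tab-separated field at its first colon.
# Reads lines on stdin; empty fields are preserved.
trim_fields() {
  tab=$(printf '\t')
  while IFS= read -r line; do
    out=
    sep=
    rest=$line
    while :; do
      field=${rest%%"$tab"*}          # text up to the next tab
      out=$out$sep${field%%:*}        # keep only the part before ':'
      sep=$tab
      case $rest in
        *"$tab"*) rest=${rest#*"$tab"} ;;  # move past this field
        *) break ;;                        # that was the last field
      esac
    done
    printf '%s\n' "$out"
  done
}

printf '1\t316\t1:5:30\t0:8:28\n' | trim_fields
```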

Alternative interpretation using awk:

$ awk 'BEGIN {OFS="\t"} {sub(/:.*/,"",$7); print}' input
1   313 .   T   C   30.11   1   .   .   .
1   316 .   A   T   30.80   1   .   0:8:28  .
1   317 .   T   A   31.40   1   .   0:8:28  .
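Because the file is tab-delimited, it can be safer to set the input separator explicitly. awk's default field splitting treats any run of blanks as a single separator. An empty cell between two tabs would then shift the column numbers, and `$7` might no longer be the intended column. Here is a comparison on a made-up line with an empty middle cell, followed by the `$7` edit with tab as both input and output separator:

```shell
#!/bin/sh
# A made-up line with three cells, the middle one empty.
printf 'a\t\tb\n' | awk '{ print NF }'          # default FS: prints 2
printf 'a\t\tb\n' | awk -F'\t' '{ print NF }'   # tab FS: prints 3

# The $7 edit with an explicit tab separator on input and output.
printf '1\t313\t.\tT\tC\t30.11\t1:5:30\t.\t.\t.\n' |
  awk 'BEGIN { FS = OFS = "\t" } { sub(/:.*/, "", $7); print }'
```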

Update: a third interpretation, using awk:

$ awk 'BEGIN {OFS="\t"} {for (i=1;i<=NF;i++){sub(/:.*/,"",$i)}; print}' input
1   313 .   T   C   30.11   1   .   .   .
1   316 .   A   T   30.80   1   .   0   .
1   317 .   T   A   31.40   1   .   0   .

Using Perl:

perl -p -e 's/:\d+:\d+//g' mydatafile

Try either of these equivalent commands:

sed 's/\([0-9][0-9]*\):[0-9][0-9]*:[0-9][0-9]*/\1/g' infile

sed 's/\([0-9]\{1,\}\):[0-9]\{1,\}:[0-9]\{1,\}/\1/g' infile

Output:

1       313     .       T       C       30.11        1  .       .       .
1       316     .       A       T       30.80        1  .       0  .
1       317     .       T       A       31.40        1  .       0  .
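On seds that accept extended regular expressions via `-E` (GNU sed and BSD/macOS sed both do), you can write the same substitution without the backslash escapes. This is a sketch on one made-up tab-separated line:

```shell
#!/bin/sh
# ERE: '+' means "one or more", and groups need no backslashes.
printf '1\t1:5:30\t.\t0:8:28\n' | sed -E 's/([0-9]+):[0-9]+:[0-9]+/\1/g'
```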

Here's a short one using sed:

sed -i.orig 's/\([0-9][0-9]*\):[^[:space:]]*/\1/g' inputfile

This keeps a copy of the original file as inputfile.orig and edits inputfile in place.
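`-i` has historically been an extension rather than part of POSIX sed. GNU and BSD sed both provide it, with slightly different syntax for the backup suffix. Where portability matters, you can make the backup copy first and then have sed read the copy and write over the original. Writing into the existing file, rather than moving a temporary file over it, also keeps the file's permissions. The function name `strip_in_place` is hypothetical. The substitution keeps all leading digits before the colon and stops at the whitespace that ends a field:

```shell
#!/bin/sh
# Hypothetical helper: edit FILE "in place" without sed -i, keeping FILE.orig.
# Copy first, then read the copy and overwrite FILE (permissions are kept).
strip_in_place() {
  cp "$1" "$1.orig" &&
    sed 's/\([0-9][0-9]*\):[^[:space:]]*/\1/g' "$1.orig" > "$1"
}
```

Usage: `strip_in_place inputfile`.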

This should do the trick.

$ sed -e 's/:.*//' mydatafile
1       313     .       T       C       30.11        1
1       316     .       A       T       30.80        1
1       317     .       T       A       31.40        1

I think sed is a little easier than awk for this problem.


Later . . .

I see from your comments to other answers that you want to replace each occurrence of x:y:z with x. In that case, I'd use this awk program.

$ cat test.awk
BEGIN {
  FS = OFS = "\t";
}
{
  for (i = 1; i <= NF; i++) {
    # Cut each field at its first colon, if it has one.
    if (match($i, /:.*/)) {
        $i = substr($i, 1, RSTART - 1);
    }
  }
  print;   # rebuild the record with OFS (tab), no trailing tab
}

$ awk -f test.awk test.dat
1       313     .       T       C       30.11   1       .       .       .
1       316     .       A       T       30.80   1       .       0       .
1       317     .       T       A       31.40   1       .       0       .
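The program above trims any field that contains a colon. If only fields of the exact form `x:y:z` (three colon-separated numbers) should be rewritten, and everything else left alone, the loop can test each field first. This is a sketch that assumes the parts are unsigned integers. The function name `keep_first_part` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: replace fields that look exactly like x:y:z with x.
# Reads tab-separated lines on stdin; other fields are left untouched.
keep_first_part() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+:[0-9]+:[0-9]+$/)
        sub(/:.*/, "", $i)   # x:y:z -> x
    print
  }'
}

printf '1\t1:5:30\t12:3\ta:b:c\n' | keep_first_part
```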
