How do I modify a text field with awk?
I want to delete the text from the first colon (:) onward, inclusive, i.e. replace it with nothing.
For example, 1:5:30 should be changed into 1. I prefer an awk script to perform this job, but I do not know how to do that. Could you please give me any directions? Thanks in advance.
My data are tab-delimited, and some cells contain just a dot (.), for example:
1 313 . T C 30.11 1:5:30 . . .
1 316 . A T 30.80 1:5:30 . 0:8:28 .
1 317 . T A 31.40 1:5:36 . 0:8:28 .
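To try the answers below, the sample above can be recreated as a real tab-delimited file. This is a sketch that assumes the file is named `input` (the answers use several different names, so adjust as needed) and relies on `printf` reusing its format string for each group of ten arguments. It overwrites any existing file called `input`.

```shell
#!/bin/sh
# printf reuses its format for each group of 10 arguments, so this writes
# three rows with a tab between fields. Note: overwrites any file "input".
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 313 . T C 30.11 1:5:30 . . . \
  1 316 . A T 30.80 1:5:30 . 0:8:28 . \
  1 317 . T A 31.40 1:5:36 . 0:8:28 . > input
```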
I tried the following, but none of them worked:
sed 's/:*:*//g' mydatafile
sed 's/:[0-9]:[0-9]//g' mydatafile
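Why those two attempts fail: in `:*:*` the `*` means "zero or more of the preceding character", so the pattern only ever matches runs of colons and the command just deletes every colon. In `:[0-9]:[0-9]` each bracket matches exactly one digit, so on 1:5:30 it removes `:5:3` and leaves `10`. A minimal fix, sketched here on one sample row piped in with `printf` (for demonstration only), is to allow one or more digits in each position:

```shell
#!/bin/sh
# One sample row, tab-delimited (recreated with printf for this demo).
row=$(printf '1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.')

# Original attempt: each [0-9] matches exactly ONE digit, so ":5:3" is
# removed from "1:5:30" and "10" is left behind.
printf '%s\n' "$row" | sed 's/:[0-9]:[0-9]//g'

# Fixed: [0-9][0-9]* means "one or more digits" in a POSIX basic regex,
# so the whole ":5:30" (and ":8:28") is removed.
printf '%s\n' "$row" | sed 's/:[0-9][0-9]*:[0-9][0-9]*//g'
```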
It's a bit unclear what the desired output should be, but here is my interpretation, using sed:
$ sed 's/:.*//' input
1 313 . T C 30.11 1
1 316 . A T 30.80 1
1 317 . T A 31.40 1
Using awk:
$ awk -F":" '{print $1}' input
1 313 . T C 30.11 1
1 316 . A T 30.80 1
1 317 . T A 31.40 1
Using cut:
cut -d":" -f1 input
Using bash:
while IFS=':' read -r a _; do
  printf '%s\n' "$a"
done < input
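If you stay in the shell, a sketch that avoids changing `IFS` for splitting is to strip everything from the first colon with the `${var%%:*}` parameter expansion (`%%` removes the longest suffix matching `:*`). This is POSIX sh, so it also works in bash. The sample row is piped in with `printf` only for the demo; in practice you would redirect from your file (e.g. `< input`).

```shell
#!/bin/sh
# Demo: one tab-delimited sample row piped in; in practice use "< input".
printf '1\t313\t.\tT\tC\t30.11\t1:5:30\t.\t.\t.\n' |
while IFS= read -r line; do
  # ${line%%:*} drops the longest suffix that starts with ":",
  # i.e. everything from the first colon to the end of the line.
  printf '%s\n' "${line%%:*}"
done
```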
Alternative interpretation using awk:
$ awk 'BEGIN {OFS="\t"} {sub(/:.*/,"",$7); print}' input
1 313 . T C 30.11 1 . . .
1 316 . A T 30.80 1 . 0:8:28 .
1 317 . T A 31.40 1 . 0:8:28 .
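If the colon-separated value can sit in a different column, the field number can be passed in instead of being hard-coded. A sketch using awk's standard `-v` option; the variable name `col` is hypothetical, and setting `FS` to a tab as well keeps empty fields in place. Here column 9 is trimmed on a piped-in sample row.

```shell
#!/bin/sh
# Trim only column $col at its first colon; FS/OFS keep the data tab-delimited.
# "col" is a hypothetical variable name chosen for this sketch.
printf '1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.\n' |
  awk -v col=9 'BEGIN { FS = OFS = "\t" } { sub(/:.*/, "", $col); print }'
```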
Third (and hopefully last) update:
Third interpretation, using awk:
$ awk 'BEGIN {OFS="\t"} {for (i=1;i<=NF;i++) sub(/:.*/,"",$i); print}' input
1 313 . T C 30.11 1 . . .
1 316 . A T 30.80 1 . 0 .
1 317 . T A 31.40 1 . 0 .
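A loop-free sketch with the same effect on this data: run `gsub` once on the whole record, removing each colon plus everything up to the next tab. It assumes fields never contain a literal tab, which holds for tab-delimited data; the row is piped in with `printf` for the demo.

```shell
#!/bin/sh
# :[^\t]* matches a colon and the rest of its field (up to the next tab),
# so every "x:y:z" field becomes "x" while the tabs themselves are kept.
printf '1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.\n' |
  awk '{ gsub(/:[^\t]*/, ""); print }'
```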
perl -p -e 's/:\d+:\d+//g' mydatafile
Try this:
sed 's/\([0-9][0-9]*\):[0-9][0-9]*:[0-9][0-9]*/\1/g' infile
or
sed 's/\([0-9]\{1,\}\):[0-9]\{1,\}:[0-9]\{1,\}/\1/g' infile
Output:
1 313 . T C 30.11 1 . . .
1 316 . A T 30.80 1 . 0 .
1 317 . T A 31.40 1 . 0 .
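`\{1,\}` is the basic-regex way of saying "one or more". With extended regular expressions (`sed -E`, accepted by both GNU and BSD sed) the same substitution can be written with `+` and unescaped groups. A sketch on a piped-in sample row:

```shell
#!/bin/sh
# ([0-9]+) captures the digits before the first colon; \1 puts them back,
# and ":[0-9]+:[0-9]+" (the rest of the x:y:z triple) is dropped.
printf '1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.\n' |
  sed -E 's/([0-9]+):[0-9]+:[0-9]+/\1/g'
```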
Here's a short one, using sed:
sed -i.orig 's/\([0-9][0-9]*\):[^[:space:]]*/\1/g' inputfile
This keeps a copy of the original file as inputfile.orig and edits the file in place.
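`-i` is not part of POSIX sed, and its details differ between GNU and BSD sed. A portable sketch does the same job through a temporary file. It runs on a throwaway file, `demo.tsv` (a hypothetical name), and the substitution uses `[^[:space:]]*` so that it stops at the end of a tab-delimited field rather than running to the end of the line.

```shell
#!/bin/sh
# Throwaway demo file (hypothetical name); point this at your real file.
printf '1\t316\t.\tA\tT\t30.80\t1:5:30\t.\t0:8:28\t.\n' > demo.tsv

# Keep a backup, write the edited text to a temporary file, and replace
# the original only if every step before it succeeded.
cp demo.tsv demo.tsv.orig &&
  sed 's/:[^[:space:]]*//g' demo.tsv > demo.tsv.tmp &&
  mv demo.tsv.tmp demo.tsv
```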
This should do the trick.
$ sed -e 's/:.*//' mydatafile
1 313 . T C 30.11 1
1 316 . A T 30.80 1
1 317 . T A 31.40 1
I think sed is a little easier than awk for this problem.
Overview of Regular Expression Syntax
Later...
I see from your comments on other answers that you want to replace each occurrence of x:y:z with x. In that case, I'd use this awk program:
$ cat test.awk
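BEGIN {
    FS = OFS = "\t";    # input and output are tab-delimited
}
{
    for (i = 1; i <= NF; i++) {
        # Truncate each field at its first colon, if it has one.
        if (match($i, /:/)) {
            $i = substr($i, 1, RSTART - 1);
        }
    }
    print;    # $0 is rebuilt with OFS between fields, no trailing tab
}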
$ awk -f test.awk test.dat
1 313 . T C 30.11 1 . . .
1 316 . A T 30.80 1 . 0 .
1 317 . T A 31.40 1 . 0 .
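The same per-field trimming can also be written with awk's `split()`, keeping only the first colon-separated piece of each field. A sketch that should give the same output as the program above for this data, shown on a piped-in sample row:

```shell
#!/bin/sh
printf '1\t317\t.\tT\tA\t31.40\t1:5:36\t.\t0:8:28\t.\n' |
  awk 'BEGIN { FS = OFS = "\t" }
       {
         for (i = 1; i <= NF; i++) {
           # split() breaks the field on ":"; parts[1] is the text before
           # the first colon. Fields without a colon are left untouched.
           if (split($i, parts, ":") > 1) $i = parts[1]
         }
         print
       }'
```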