开发者

sed or awk - deleting strings between patterns

I have a CSV file with lines like this:

AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC.DDD,C-name,num1,num2,num3
EEE.FFF.GGGG,E-name,num1,num2,num3    
HHH.H-name,num1,num2,num3
...

Some lines have one identifier (like AAA); some have two (like CCC); some have three or more (like EEE). And some identifiers are not three characters. I need to remove all but the first identifier from each line of 开发者_运维技巧the line (such that the first period and anything that comes after it is deleted until the first comma is encountered), producing this:

AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH,H-name,num1,num2,num3
...

I've tried a few pattern-replace methods but am getting tripped up. Does anyone have the syntax I need?


sed 's/^\([^.]\{1,\}\)[^,]*/\1/'


Just remove everything between a dot and the first colon. For the file

$ cat foo
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC.DDD,C-name,num1,num2,num3
EEE.FFF.GGGG,E-name,num1,num2,num3    
HHH.H-name,num1,num2,num3

use this sed command:

$ sed 's/\.[^,]*//' foo
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3    
HHH,num1,num2,num3

However, it will remove an H at the last line. This seems to be a typo in your example, however.


Using perl

$ perl -pe 's/\.[A-Z.]*?,/,/' input
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3

sed

$ sed 's/\.[A-Z.]*,/,/' input
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3

and awk

$ awk '/\./{sub(/\.[A-Z.]*,/, ",", $0)}{print}' input
AAA,A-name,num1,num2,num3
BBB,B-name,num1,num2,num3
CCC,C-name,num1,num2,num3
EEE,E-name,num1,num2,num3
HHH.H-name,num1,num2,num3
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜