Replacing double quotes in CSV

I have the following problem and couldn't find a solution. My CSV file looks roughly like this:

1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A
1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B

As you can see, some double quotes appear inside the quoted fields: " used as the inch mark, and the quotes around "L".

Now I'm looking for a UNIX shell script that replaces the inch " and the quotes around "L" with two single quotes, along the lines of the following:

sed "s/$OLD/$NEW/g" "$QFILE" > "$TFILE" && mv "$TFILE" "$QFILE"

Can anyone help me?


Update (with Perl it is easy, since you get full lookbehind and lookahead support):

perl -pe 's/(?<!^)(?<!;)"(?!(;|$))/'"'"'/g' file

Output

1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
1224;"B630522 ('L' fixed bracket)";"3' width";"length: 6'";2;alternate B

Using sed, grep only

Using only grep and sed (no perl, php, python, etc.), a not-so-elegant solution is:

grep -o '[^;]*' file | sed  's/"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g' 

Output for your input file (one field per line):

1223
"B630521 ('L' fixed bracket)"
"2' width"
"length: 5'"
2
alternate A
1224
"B630522 ('L' fixed bracket)"
"3' width"
"length: 6'"
2
alternate B
  • grep -o '[^;]*' essentially splits the input on ; and prints one field per line
  • sed first replaces the first " on each line (the opening quote of a quoted field) with `
  • then it replaces a " at the end of the line with another `
  • it then replaces all remaining double quotes " with a single quote '
  • finally it turns each ` back into "
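
The pipeline above leaves every field on its own line. A possible way to get one row per line again is to glue the fields back together with paste. This is only a sketch: it assumes every row has exactly six non-empty fields, as in the sample (grep -o prints nothing for an empty field, so empty fields would shift the columns), and that grep supports -o. The name fix_and_rejoin is made up for illustration.

```shell
# Hypothetical helper (name is an assumption): reads CSV rows on stdin.
# Assumes exactly six non-empty fields per row.
fix_and_rejoin() {
  # 1. split on ';' into one field per line
  # 2. protect the outer quotes with backticks, turn inner " into ', restore outer "
  # 3. join every six lines back into one ';'-separated row
  grep -o '[^;]*' |
    sed 's/"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g' |
    paste -d';' - - - - - -
}

printf '%s\n' \
  '1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A' \
  '1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B' |
  fix_and_rejoin
```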


Maybe this is what you want:

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"

That is: find a double quote (") that follows a digit ([0-9]) and is not followed by a semicolon ([^;]), and replace it with two single quotes.
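
To see what this covers, here is the command run against the first sample row from the question. It is just a quick check, not a complete fix: the inch marks are converted, while the quotes around "L" stay as they are because no digit precedes them.

```shell
# First sample row from the question.
row='1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A'

# Same substitution as above: digit, then ", then any character except ';'.
inch_only=$(printf '%s\n' "$row" | sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g")

# The inch marks become '', the quotes around L are untouched.
printf '%s\n' "$inch_only"
```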

Edit: I can extend my command (it's becoming quite long now):

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g"

As you are using SunOS, I guess you cannot use extended regular expressions (sed -r)? That is why I did it this way. The first s command replaces every inch " with ''. The second and third s commands are identical: each replaces every " that is not a direct neighbor of a ; with a single '. The substitution has to run twice to catch the second " of e.g. "L", because only one character sits between the two quotes and that character has already been consumed by \([^;]\) in the previous match. This also turns "" into ''. If your data contains """ or """" etc., add one more (but only one more) copy of that s command.
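
A small sketch of why the same s command is needed twice. It uses a shortened row and passes the expression via -e rather than joining with ; (variable names are only for illustration). After one pass the closing quote of "L" survives, because the L was already consumed as the trailing \([^;]\) of the previous match; the second pass picks it up.

```shell
# One copy of the "quote not next to ;" substitution.
pass="s/\([^;]\)\"\([^;]\)/\1'\2/g"
row='1223;"B630521 ("L" fixed bracket)";2'

# Single pass: only the opening quote of "L" is replaced.
once=$(printf '%s\n' "$row" | sed -e "$pass")

# Two passes: the closing quote is replaced as well.
twice=$(printf '%s\n' "$row" | sed -e "$pass" -e "$pass")

printf '%s\n%s\n' "$once" "$twice"
```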


For the "L", try this:

sed "s/\"L\"/'L'/g"

For an inch mark directly before a closing quote (a digit followed by ""), you can try:

sed "s/\([0-9]\)\"\"/\1''\"/g" 

I am not sure it is the best option, but I tried it and it works. I hope this helps.
