Replacing double quotes in CSV
I have the following problem and haven't found a solution. My CSV file is structured roughly like this:
1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A
1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B
As you can see, there are some " characters (written for inch, and around "L") inside the enclosing " quotes.
Now I'm looking for a UNIX shell script to replace the " (inch) and "L" double quotes with 2 single quotes, along the lines of the following example:
sed "s/$OLD/$NEW/g" $QFILE > $TFILE && mv $TFILE $QFILE
Can anyone help me?
Update (using Perl this is easy, since you get full lookbehind and lookahead support):
perl -pe 's/(?<!^)(?<!;)"(?!(;|$))/'"'"'/g' file
Output
1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
1224;"B630522 ('L' fixed bracket)";"3' width";"length: 6'";2;alternate B
Using sed, grep only
Using only grep and sed (no perl, php, python, etc.), a not-so-elegant solution is:
grep -o '[^;]*' file | sed 's/"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g'
Output - for your input file it gives:
1223
"B630521 ('L' fixed bracket)"
"2' width"
"length: 5'"
2
alternate A
1224
"B630522 ('L' fixed bracket)"
"3' width"
"length: 6'"
2
alternate B
grep -o '[^;]*' basically splits the input on ; and prints one field per line. The sed script then works on each field:
- it first replaces the " at the start of the line with `
- then it replaces the " at the end of the line with another `
- it then replaces all remaining double quotes " with a single quote '
- finally it turns every ` back into " at the start and end
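The pipeline above prints one field per line, so the original records are lost. If the rows need to stay intact, the same idea (keep the outer quotes, change the inner ones) could be sketched with awk instead. This is an alternative technique, not the grep/sed approach itself; the function name fix_inner_quotes is hypothetical, and the sketch assumes a POSIX awk (nawk on older SunOS) and that no field contains a ; of its own.

```shell
# Hypothetical helper (not from the original answer).
# Reads CSV from the given files or from stdin, writes the fixed rows to stdout.
fix_inner_quotes() {
  awk -F';' -v OFS=';' -v sq="'" '{
    for (i = 1; i <= NF; i++) {
      n = length($i)
      # Only touch fields that start and end with a double quote.
      if (n >= 2 && substr($i, 1, 1) == "\"" && substr($i, n, 1) == "\"") {
        inner = substr($i, 2, n - 2)
        gsub(/"/, sq, inner)      # inner double quotes become single quotes
        $i = "\"" inner "\""      # put the enclosing quotes back
      }
    }
    print
  }' "$@"
}
```

For example, fix_inner_quotes file > file.new keeps each record on one line, with the fields still separated by ;.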
Maybe this is what you want:
sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"
I.e.: find a double quote (") that follows a digit ([0-9]) and is followed by a character other than a semicolon ([^;]), and replace it with two single quotes.
Edit: I can extend my command (it's becoming quite long now):
sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g"
As you are using SunOS, I guess you cannot use extended regular expressions (sed -r)? That's why I did it this way: the first s command replaces all inch " with ''; the second and the third s commands are identical. They substitute every " that is not a direct neighbor of a ; with a single '. I have to do it twice to be able to substitute the second " of e.g. "L", because there is only one character between the two " and this character has already been matched by \([^;]\). This way you would also substitute "" with ''. If you have """ or """" etc., you have to add one more (but only one more) s.
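Instead of repeating the same s command a fixed number of times, the second step can be run in a loop until nothing is left to replace, using a sed label and the t (branch if a substitution was made) command. A minimal sketch, assuming your sed accepts the label as a separate -e expression; the function name fix_quotes_loop is hypothetical:

```shell
# Hypothetical helper (not from the original answer).
# Reads CSV from the given files or from stdin, writes the fixed rows to stdout.
fix_quotes_loop() {
  # 1) inch marks: a " after a digit and before a non-; becomes ''
  # 2) loop: replace one inner " (no ; on either side) with ' per pass,
  #    and jump back to label a while a substitution still succeeds
  sed -e "s/\([0-9]\)\"\([^;]\)/\1''\2/g" \
      -e ':a' \
      -e "s/\([^;]\)\"\([^;]\)/\1'\2/" \
      -e 'ta' "$@"
}
```

Because every pass removes one double quote, the loop always terminates, and runs such as """ or """" need no extra s commands.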
For the "L" try this:
sed "s/\"L\"/'L'/g"
For inches you can try:
sed "s/\([0-9]\)\"\"/\1''\"/g"
I am not sure this is the best option, but I have tried it and it works. I hope this is helpful.