Regular expression to find and replace unescaped non-successive double quotes in a CSV file
This is an extension of a related question answered here.
I have a weekly CSV file that needs to be parsed. It looks like this:
"asdf","asdf","asdf","asdf"
But sometimes there are text fields that contain an extra pair of unescaped double quotes, like this:
"asdf","as "something" df","asdf","asdf"
From the other posts on here, I was able to put together this regex:
(?m)""(?![ \t]*(,|$))
which matches two successive double quotes, only "if they DON'T have a comma or end-of-the-line ahead of them with optionally spaces and tabs in between".
However, this only finds double quotes in succession. How do I modify it to find and replace/delete the double quotes around "something" in the file?
Thanks.
(?<!^|,)"(?!,|$)
will match a double quote that is neither preceded nor followed by a comma and is not at the start or end of a line (with multiline mode enabled, so that ^ and $ match at line boundaries).
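For anyone trying this in Python: the standard `re` module rejects a lookbehind whose alternatives differ in width, so `(?<!^|,)` has to be split into two separate lookbehinds. Here is a minimal sketch under that assumption; the name `strip_inner_quotes` is made up for illustration and is not part of the original answer.

```python
import re

# Two separate lookbehinds stand in for (?<!^|,), which Python's re module
# rejects because the alternatives have different widths.
# re.MULTILINE makes ^ and $ match at every line boundary.
INNER_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)

def strip_inner_quotes(text):
    """Delete double quotes that are not next to a comma or a line boundary."""
    return INNER_QUOTE.sub("", text)

line = '"asdf","as "something" df","asdf","asdf"'
print(strip_inner_quotes(line))
```

On the sample line from the question, this should print "asdf","as something df","asdf","asdf".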
If you need to allow whitespace around the commas or at start/end-of-line, and if your regex flavor (which you didn't specify) allows arbitrary-length lookbehind (.NET does, for example), you can use
(?<!^\s*|,\s*)"(?!\s*,|\s*$)
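If your flavor does not support arbitrary-length lookbehind (Python's standard `re` module does not), one possible workaround is to match the structural quotes explicitly and keep them, deleting every other quote. This is a sketch of a swapped-in technique, not the lookaround pattern above: it is slightly stricter, because it treats a quote as a delimiter only when it is part of a quote-comma-quote sequence or sits at a line boundary. The name `strip_inner_quotes_ws` is hypothetical.

```python
import re

# Group 1 captures "structural" quotes: a quote at the start of a line,
# a quote-comma-quote delimiter, or a quote at the end of a line, each
# with optional spaces/tabs. The final bare " matches any other quote.
QUOTES = re.compile(
    r'(^[ \t]*"|"[ \t]*,[ \t]*"|"[ \t]*$)|"',
    re.MULTILINE,
)

def strip_inner_quotes_ws(text):
    """Keep structural quotes unchanged and delete every other double quote."""
    # group(1) is None when only the bare-quote alternative matched.
    return QUOTES.sub(lambda m: m.group(1) or "", text)

line = '"asdf" , "as "something" df" ,"asdf"'
print(strip_inner_quotes_ws(line))
```

With the whitespace-padded sample above, this should print "asdf" , "as something df" ,"asdf". Using [ \t] rather than \s keeps the delimiter match from running across line breaks.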
I was using Vim to remove nested quotes in a .csv file, and this pattern worked for me:
"[^,"][^"]*"[^,]
In Vim, I used this to remove all the unescaped quotes:
:%s/\v("(,")@!)&((",)@<!")&("(\n)@!)&(^@<!")//gc
A detailed explanation:
: - start the vim command
% - scope of the command is the whole file
s - search and replace
/ - start of search pattern
\v - "very magic" mode (most punctuation acts as a regex operator without needing a backslash)
(
" - double quote
(,") - comma_quote
@! - not followed by
)
& - and
(
(",) - quote_comma
@<! - not preceded by
" - double quote
)
& - and
(
" - double quote
(\n) - line end
@! - not followed by
)
& - and
(
^ - line beginning
@<! - not preceded by
" - double quote
)
/ - end of search pattern and start of replace pattern
- replace with nothing (delete)
/ - end of replace pattern
g - apply to all the matches
c - confirm with user for every replacement
This does the job fairly quickly. The only case where it fails is when the data itself contains the "," pattern.
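The same four conditions can be approximated with Python's `re` module. This is a rough sketch, not an exact translation of the Vim command: it has no equivalent of the `c` confirmation flag, and the name `strip_unescaped_quotes` is invented for illustration.

```python
import re

# A double quote is deleted only if it is:
#   (?<!^)   not at the start of a line
#   (?<!",)  not preceded by quote_comma
#   (?!,")   not followed by comma_quote
#   (?!$)    not at the end of a line
UNESCAPED = re.compile(r'(?<!^)(?<!",)"(?!,")(?!$)', re.MULTILINE)

def strip_unescaped_quotes(text):
    """Delete quotes that are not part of a "," delimiter or a line boundary."""
    return UNESCAPED.sub("", text)

# A comma directly after an inner quote is handled, because only the
# full comma_quote sequence counts as a delimiter.
print(strip_unescaped_quotes('"asdf","as "something", df","asdf"'))

# The known failure case: a "," sequence inside the data is left untouched.
print(strip_unescaped_quotes('"a","b "," c","d"'))
```

The first call should print "asdf","as something, df","asdf", while the second line comes back unchanged, which illustrates the limitation described above.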