
Regex: How to remove a comma that is between " and "?

How can I remove a , (comma) that sits between a " (double quote) and the next " (double quote)? For example, given "a","b","c","d,d","e","f", the one comma inside a pair of quotes should be removed, so that the result is "a","b","c","dd","e","f". How can I do this with a regex in C#?

EDIT: I forgot to specify that there may be more than one comma between quotes, as in "a","b","c","d,d,d","e","f"; that regex does not work for this case. There can be any number of commas between quotes.

The string can also look like a,b,c,"d,d",e,f, in which case the result should be a,b,c,dd,e,f; for a,b,c,"d,d,d",e,f the result should be a,b,c,ddd,e,f.


Assuming the input is as simple as your examples (i.e., not full-fledged CSV data), this should do it:

string input = @"a,b,c,""d,d,d"",e,f,""g,g"",h";
Console.WriteLine(input);

string result = Regex.Replace(input,
    @",(?=[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)",
    String.Empty);
Console.WriteLine(result);

output:

a,b,c,"d,d,d",e,f,"g,g",h
a,b,c,"ddd",e,f,"gg",h

The regex matches any comma that is followed by an odd number of quotation marks.
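To make the "odd number of quotation marks" idea easier to follow, here is the same pattern spread over several lines with comments, as a minimal sketch. The helper name RemoveQuotedCommas is made up for illustration, and RegexOptions.IgnorePatternWhitespace is used only so the comments can live inside the pattern; the matching logic is meant to be unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every comma that sits inside a "..." field.
static string RemoveQuotedCommas(string input)
{
    const string pattern = @"
        ,                          # a comma ...
        (?=                        # ... followed by (lookahead, consumes nothing):
            [^""]*""               #   text up to the next quote (one quote so far)
            (?:[^""]*""[^""]*"")*  #   then any number of further quote pairs
            [^""]*$                #   then no more quotes up to the end of the input
        )";
    return Regex.Replace(input, pattern, string.Empty,
        RegexOptions.IgnorePatternWhitespace);
}

Console.WriteLine(RemoveQuotedCommas("a,b,c,\"d,d,d\",e,f,\"g,g\",h"));
// prints: a,b,c,"ddd",e,f,"gg",h
```

One quote plus any number of pairs is always an odd count, so the lookahead only succeeds for a comma that is inside an open quoted field. This relies on the quotes in the input being balanced.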


EDIT: If fields are quoted with apostrophes (') instead of quotation marks ("), the technique is exactly the same--except you don't have to escape the quotes:

string input = @"a,b,c,'d,d,d',e,f,'g,g',h";
Console.WriteLine(input);

string result = Regex.Replace(input,
    @",(?=[^']*'(?:[^']*'[^']*')*[^']*$)",
    String.Empty);
Console.WriteLine(result);

If some fields were quoted with apostrophes while others were quoted with quotation marks, a different approach would be needed.


EDIT: Probably should have mentioned this in the previous edit, but you can combine those two regexes into one regex that will handle either apostrophes or quotation marks (but not both):

@",(?=[^']*'(?:[^']*'[^']*')*[^']*$|[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)"

Actually, it will handle simple strings like 'a,a',"b,b". The problem is that there would be nothing to stop you from using one of the quote characters in a quoted field of the other type, like '9" Nails' (sic) or "Kelly's Heroes". That's taking us into full-fledged CSV territory (if not beyond), and we've already established that we're not going there. :D
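For illustration only, one possible "different approach" for mixed quoting is a small hand-written scan that remembers which quote character opened the current field. This is a sketch, not part of the regex answer above: the helper name RemoveCommasInsideQuotes is hypothetical, escaped or doubled quotes are not handled, and an apostrophe in an unquoted field would be misread as an opening quote.

```csharp
using System;
using System.Text;

// Hypothetical helper: drops commas inside '...' or "..." fields.
static string RemoveCommasInsideQuotes(string input)
{
    var sb = new StringBuilder(input.Length);
    char open = '\0'; // quote character that opened the current field; '\0' when outside

    foreach (char c in input)
    {
        if (open == '\0')
        {
            // Outside a quoted field: either quote character may open one.
            if (c == '"' || c == '\'') open = c;
        }
        else if (c == open)
        {
            // Only the same character that opened the field can close it.
            open = '\0';
        }
        else if (c == ',')
        {
            continue; // comma inside a quoted field: skip it
        }
        sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(RemoveCommasInsideQuotes("'9\" Nails, large',\"Kelly's Heroes, 1970\",x,y"));
// prints: '9" Nails large',"Kelly's Heroes 1970",x,y
```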


They're called regular expressions for a reason: they are used to process strings that meet a very specific, academic definition of what is "regular". It looks like you have some fairly typical CSV data here, and it happens that CSV strings are outside of that specific definition: CSV data is not formally "regular".

In spite of this, it can be possible to use regular expressions to handle CSV data. However, to do so you must either use certain extensions to normal regular expressions to make them Turing complete, know certain constraints about your specific CSV data that are not promised in the general case, or both. Either way, the expressions required to do this are unwieldy and difficult to manage. It's often just not a good idea, even when it's possible.

A much better (and usually faster) solution is to use a dedicated CSV parser. There are two good ones hosted at CodeProject (FastCSV and Linq-to-CSV), there is one (actually several) built into the .NET Framework (Microsoft.VisualBasic.FileIO.TextFieldParser), and I have one here on Stack Overflow. Any of these will perform better and just plain work better than a solution based on regular expressions.
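As a rough sketch of the parser route, the following uses TextFieldParser from the Microsoft.VisualBasic.FileIO namespace to split one line into fields, strip the commas inside each field, and join the fields again. The helper name StripCommasInFields is made up for illustration. It assumes the project references the Microsoft.VisualBasic assembly and that each input is a single CSV line. The rebuilt line is written without the surrounding quotes, which matches the second form of output asked for in the question.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: parses one CSV line and removes commas inside each field.
static string StripCommasInFields(string line)
{
    using var parser = new TextFieldParser(new StringReader(line));
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // treat "d,d,d" as a single field

    // ReadFields returns the fields of the next line with the enclosing quotes removed.
    string[] fields = parser.ReadFields() ?? Array.Empty<string>();
    return string.Join(",", fields.Select(f => f.Replace(",", "")));
}

Console.WriteLine(StripCommasInFields("a,b,c,\"d,d,d\",e,f"));
// prints: a,b,c,ddd,e,f
```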

Note here that I'm not arguing it can't be done. Most regular expression engines today have the necessary extensions to make this possible, and most people parsing CSV data know enough about the data they're handling to constrain it appropriately. I am arguing that, compared to a dedicated parser (which is likely built into whichever platform you're using), the regex approach is slower to execute, harder to implement, harder to maintain, and more error-prone, and is therefore not in your best interests.


var input = "\"a\",\"b\",\"c\",\"d,d\",\"e\",\"f\"";
var regex = new Regex("(\"\\w+),(\\w+\")");
var output = regex.Replace(input,"$1$2");
Console.WriteLine(output);

You'd need to evaluate whether or not \w is what you want to use.


You can use this:

var result = Regex.Replace(yourString, "([a-z]),", "$1");

Sorry, after seeing your edits, regular expressions are not appropriate for this.


This should be very simple using Regex.Replace and a callback:

string pattern = @"
""      # open quotes
[^""]*  # some not quotes
""      # closing quotes
";
data = Regex.Replace(data, pattern, m => m.Value.Replace(",", ""),
    RegexOptions.IgnorePatternWhitespace);

You can even make a slight modification to allow escaped quotes (here I use \"; the comments explain how to use "" instead):

string pattern = @"
\\.     # escaped character (the alternative is """")
|
(?<Quotes>
    ""              # open quotes
    (?:\\.|[^""])*  # some not quotes or escaped characters
                      # the alternative is (?:""""|[^""])*
    ""              # closing quotes
)
";
data = Regex.Replace(data, pattern,
            m => m.Groups["Quotes"].Success ? m.Value.Replace(",", "") : m.Value,
            RegexOptions.IgnorePatternWhitespace);

If you need single quotes instead, replace every "" in the pattern with a single '.
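For reference, here is a self-contained sketch of the same callback idea that can be run against the inputs from the question. The helper name StripCommasInQuotedFields and its dropQuotes parameter are assumptions added for illustration; dropQuotes covers the variant in the question where the quotes themselves should disappear from the result. Escaped quotes are not handled here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes commas inside "..." fields via a match callback.
static string StripCommasInQuotedFields(string data, bool dropQuotes = false)
{
    // Group 1 captures the text between an opening quote and the next quote.
    return Regex.Replace(data, "\"([^\"]*)\"", m =>
    {
        string inner = m.Groups[1].Value.Replace(",", "");
        return dropQuotes ? inner : "\"" + inner + "\"";
    });
}

Console.WriteLine(StripCommasInQuotedFields("\"a\",\"b\",\"c\",\"d,d\",\"e\",\"f\""));
// prints: "a","b","c","dd","e","f"
Console.WriteLine(StripCommasInQuotedFields("a,b,c,\"d,d,d\",e,f", dropQuotes: true));
// prints: a,b,c,ddd,e,f
```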


Something like the following, perhaps?

"(,)"
