
Using Regular Expressions for Pattern Finding with Replace

I have a string in the following format in a comma-delimited file:

someText, "Text with, delimiter", moreText, "Text Again"

What I need to do is write a method that looks through the string and replaces any comma inside quoted text with a dollar sign ($).

After the method runs, the string should be:

someText, "Text with$ delimiter", moreText, "Text Again"

I'm not very good with regex, but I would like to know how to use a regular expression to search for a pattern (a comma between quotes) and then replace that comma with a dollar sign.


Personally, I'd avoid regexes here. Assuming there are no nested quotation marks, this is quite simple to write as a for loop, which I think will be more efficient:

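    // Assumes someText holds the input line and quotes are never nested or escaped.
    var inQuotes = false;
    var sb = new StringBuilder(someText.Length); // requires using System.Text;

    for (var i = 0; i < someText.Length; ++i)
    {
        if (someText[i] == '"')
        {
            inQuotes = !inQuotes; // toggle on every quote character
        }

        if (inQuotes && someText[i] == ',')
        {
            sb.Append('$');
        }
        else
        {
            sb.Append(someText[i]);
        }
    }

    var result = sb.ToString();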
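The same loop can be wrapped into a reusable method. This is a minimal sketch that assumes quotes are never nested or escaped; `QuoteAwareReplacer` and `ReplaceCommasInQuotes` are hypothetical names, not part of any library.

```csharp
using System.Text;

// Hypothetical helper that wraps the loop above into a reusable method.
public static class QuoteAwareReplacer
{
    // Replaces every comma that sits between an opening and a closing
    // double quote with '$'. Assumes quotes are never nested or escaped.
    public static string ReplaceCommasInQuotes(string text)
    {
        var inQuotes = false;
        var sb = new StringBuilder(text.Length);

        foreach (var c in text)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes; // entering or leaving a quoted section
            }

            // Only commas inside a quoted section are swapped out.
            sb.Append(inQuotes && c == ',' ? '$' : c);
        }

        return sb.ToString();
    }
}
```

Because the quote character itself is appended unchanged, the output keeps the original quoting and spacing; only the commas between quotes change.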


This is the kind of problem where regex breaks down; do this instead:

    var sb = new StringBuilder(str);

    var insideQuotes = false;

    for (var i = 0; i < sb.Length; i++)
    {
        switch (sb[i])
        {
            case '"':
                insideQuotes = !insideQuotes; // entering or leaving a quoted field
                break;
            case ',':
                if (insideQuotes)
                    sb[i] = '$'; // overwrite the comma in place
                break;
        }
    }

    str = sb.ToString();

You can also use a CSV parser to parse the string and write it again with replaced columns.
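The answer doesn't name a particular parser. As a rough stand-in (not a real CSV library), here is a hand-rolled sketch of the parse, rewrite, write-back idea. `FieldRewriter`, `SplitFields` and `ReplaceCommasInFields` are hypothetical names, and the sketch assumes quotes are never escaped or nested.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Minimal stand-in for a real CSV parser (hypothetical helper, not a library API).
// Assumes double-quoted fields with no escaped or nested quotes.
public static class FieldRewriter
{
    // Splits a line on commas that are outside double quotes.
    // Fields are returned verbatim, including their quotes and leading spaces.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;

        foreach (var c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
            }

            if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString()); // a delimiter ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing delimiter
        return fields;
    }

    // Parses the line, rewrites each field, and joins the fields back together.
    public static string ReplaceCommasInFields(string line) =>
        string.Join(",", SplitFields(line).Select(f => f.Replace(',', '$')));
}
```

Because the split only happens on commas outside quotes, any comma still present inside a field must have been quoted, so replacing it per field is safe. A real CSV library would also handle escaped quotes and multi-line fields, which this sketch does not.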


Here's how to do it with Regex.Replace:

        string output = Regex.Replace(
            input,
            "\".*?\"",
            m => m.ToString().Replace(',', '$'));
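A self-contained version of that one-liner, wrapped in a hypothetical `QuotedRegexReplacer` class. The lazy `.*?` stops at the nearest closing quote, so two quoted fields become two separate matches, and the lambda (a `MatchEvaluator`) rewrites only the text of each match. Like the snippet above, it assumes quotes are never escaped.

```csharp
using System.Text.RegularExpressions;

public static class QuotedRegexReplacer
{
    // Lazily matches each "..." section (no escaped quotes assumed) and
    // replaces the commas inside that match only; text outside quotes is untouched.
    public static string Replace(string input) =>
        Regex.Replace(input, "\".*?\"", m => m.Value.Replace(',', '$'));
}
```

`m.Value` returns the same text as `m.ToString()`; either works here.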

Of course, if you want to ignore escaped double quotes, it gets more complicated, especially when the escape character can itself be escaped.

Assuming the escape character is \, you want to match only quotation marks that are preceded by an even number (including zero) of escape characters. The following pattern does that:

string pattern = @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";
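A sketch that plugs this escape-aware pattern into the same `MatchEvaluator` approach. The class name `EscapeAwareReplacer` is hypothetical, and it assumes backslash is the only escape character.

```csharp
using System.Text.RegularExpressions;

public static class EscapeAwareReplacer
{
    // The pattern above: a quote only counts as a delimiter when it is
    // preceded by an even number (including zero) of backslashes.
    private const string Pattern =
        @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";

    // Commas inside each properly delimited quoted section become '$';
    // escaped quotes (\") inside a section do not end the match.
    public static string Replace(string input) =>
        Regex.Replace(input, Pattern, m => m.Value.Replace(',', '$'));
}
```

For the input `a, "say \"hi, there\", ok", b`, the escaped quotes are skipped, so both commas inside the outer quotes are replaced and the result is `a, "say \"hi$ there\"$ ok", b`.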

At this point, you might prefer to abandon regular expressions ;)

UPDATE:

In reply to your comment, it is easy to make the operation configurable for different quotation marks, delimiters and placeholders.

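        string quote = "\"";
        string delimiter = ",";
        string placeholder = "$";

        // Escape the quote so characters such as '|' or '*' are matched literally.
        string output = Regex.Replace(
            input,
            Regex.Escape(quote) + ".*?" + Regex.Escape(quote),
            m => m.ToString().Replace(delimiter, placeholder));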
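One way to package the configurable version as a reusable method, under the hypothetical name `ConfigurableReplacer.Replace`. The quote is passed through `Regex.Escape` so that a regex metacharacter such as `|` is matched literally; `delimiter` and `placeholder` go through plain `string.Replace`, so they need no escaping.

```csharp
using System.Text.RegularExpressions;

public static class ConfigurableReplacer
{
    // quote, delimiter and placeholder mirror the variables in the update above.
    // Assumes the quote string is never escaped inside a quoted section.
    public static string Replace(string input, string quote, string delimiter, string placeholder)
    {
        var q = Regex.Escape(quote); // treat the quote as literal text, not regex syntax
        return Regex.Replace(
            input,
            q + ".*?" + q,
            m => m.Value.Replace(delimiter, placeholder)); // literal replacement, no $-substitution
    }
}
```

For example, `ConfigurableReplacer.Replace("a;|b;c|;d", "|", ";", "#")` returns `a;|b#c|;d`. Because the evaluator's return value is used literally, a placeholder of `$` is inserted as-is.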


If you'd like to go the regex route, here's what you're looking for:

var result = Regex.Replace( text, "(\"[^,]*),([^,]*\")", "$1$$$2" );

The problem with this regex is that it handles only one comma per quoted field, so it won't catch "this, has, two commas".
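To see the limitation concretely, here is the pattern from this answer wrapped in a hypothetical `SingleCommaRegex` helper. In the replacement string `$1$$$2`, `$1` and `$2` are the captured groups and `$$` is a literal dollar sign.

```csharp
using System.Text.RegularExpressions;

public static class SingleCommaRegex
{
    // A quote, then non-commas, exactly one comma, then non-commas and a quote.
    // "$1$$$2" = group 1, a literal '$', group 2.
    public static string Replace(string text) =>
        Regex.Replace(text, "(\"[^,]*),([^,]*\")", "$1$$$2");
}
```

On the sample line it produces the expected result. On `x, "this, has, two commas", y` it changes nothing, and on `"a", "b"` it produces `"a"$ "b"`: because `[^,]*` can also consume quote characters, a match can span the comma between two separate fields.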

See it working at http://refiddle.com/1ab


Can you give this a try: "[\w ]*,[\w ]*" (double quotes included)? Be careful with the replacement, though: a plain replacement string replaces the whole match, including the text enclosed in the double quotes, so only the comma within each match should be swapped.
