Splitting a comma-separated string, ignoring commas in quotes, but allowing strings with one double quotation mark

I have searched through several posts on Stack Overflow on how to split a string on a comma delimiter while ignoring commas inside quotes (see: How do I split a string into an array by comma but ignore commas inside double quotes?). I am trying to achieve similar results, but I also need to allow for a string that contains a single, unmatched double quote.

I.e., I need "test05, \"test, 05\", test\", test 05" to split into

  • test05
  • "test, 05"
  • test"
  • test 05

I tried a similar method to one mentioned here:

Regex for splitting a string using space when not surrounded by single or double quotes

It uses Matcher instead of split(). However, that specific example splits on spaces, not on commas. I've tried to adjust the pattern to account for commas instead, but have not had any luck.

String str = "test05, \"test, 05\", test\", test 05";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|([^,]+?)),++").matcher(str);

for (int i = 0; i < len; i++)
{
    m.region(i, len);

    if (m.lookingAt())
    {
        String s = m.group(1);

        if ((s.startsWith("\"") && s.endsWith("\"")))
        {
            s = s.substring(1, s.length() - 1);
        }

        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}


You have reached the point where regular expressions break down.

I would recommend that you write a simple splitter instead which handles your special cases as you wish. Test Driven Development is great for doing this.

It looks, however, like you are trying to parse CSV lines. Have you considered using a CSV-library for this?
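
A hand-written splitter along those lines might look like the sketch below. It is only one possible reading of the requirements: the class name SimpleSplitter and the rule it applies are assumptions, not something stated in the question. The rule is that a field counts as quoted only if it starts with a double quote and a later double quote is followed (after optional spaces) by a comma or the end of the line; any other double quote is kept as an ordinary character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written for illustration only.
class SimpleSplitter {

    // Splits a line on commas, keeping quoted fields (quotes included) intact.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            // Skip spaces in front of the field.
            while (i < n && line.charAt(i) == ' ') {
                i++;
            }
            int close = (i < n && line.charAt(i) == '"') ? closingQuote(line, i) : -1;
            if (close >= 0) {
                // Quoted field: keep the surrounding quotes, as in the expected output.
                fields.add(line.substring(i, close + 1));
                int comma = line.indexOf(',', close + 1);
                i = (comma < 0 ? n : comma) + 1;
            } else {
                // Plain field: everything up to the next comma; a stray quote is literal.
                int comma = line.indexOf(',', i);
                int stop = comma < 0 ? n : comma;
                fields.add(line.substring(i, stop).trim());
                i = stop + 1;
            }
        }
        return fields;
    }

    // Returns the index of a quote that closes the field opened at 'open',
    // i.e. one followed only by spaces and then a comma or end of line; -1 if none.
    static int closingQuote(String line, int open) {
        int close = line.indexOf('"', open + 1);
        while (close >= 0) {
            int j = close + 1;
            while (j < line.length() && line.charAt(j) == ' ') {
                j++;
            }
            if (j == line.length() || line.charAt(j) == ',') {
                return close;
            }
            close = line.indexOf('"', close + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Prints: test05 / "test, 05" / test" / test 05 (one per line)
        for (String field : split("test05, \"test, 05\", test\", test 05")) {
            System.out.println(field);
        }
    }
}
```

With this rule the unmatched quote in test" never opens a quoted field, because that field does not start with a quote. Inputs where a lone quote starts a field and a later quote happens to precede a comma remain ambiguous, which is where test cases help pin down the behaviour you want.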


I've had similar issues with this, and I found no good .NET solution, so I went DIY.

In my application I'm parsing a CSV, so my split delimiter is ",". I suppose this method only works where the delimiter is a single character.

So, I've written a function that ignores commas within double quotes. It does this by converting the input string into a character array and parsing it character by character.

// Requires: using System.Collections.Generic; using System.Text;
public static string[] Splitter_IgnoreQuotes(string stringToSplit)
    {
        char[] CharsOfData = stringToSplit.ToCharArray();
        // Collect fields in a list so the number of fields need not be known up front.
        List<string> dataList = new List<string>();
        StringBuilder currentField = new StringBuilder();
        bool DoubleQuotesJustSeen = false;
        foreach (char theChar in CharsOfData)
        {
            if (theChar == '"')
            {
                // A double quote toggles quoted mode and is not copied to the output.
                DoubleQuotesJustSeen = !DoubleQuotesJustSeen;
            }
            else if (theChar == ',' && !DoubleQuotesJustSeen)
            {
                // A comma outside quotes ends the current field.
                // You could make ',' a parameter; I'm working with a csv.
                dataList.Add(currentField.ToString());
                currentField.Clear();
            }
            else
            {
                // Any other character, including a comma inside quotes, is kept.
                currentField.Append(theChar);
            }
        }
        // Add the last field, which has no trailing comma.
        dataList.Add(currentField.ToString());
        return dataList.ToArray();
    }

This function also strips every double quote (") from the input, which suits my application: the quotes are present in my input but not needed.


Unless you really need to DIY, you should consider the Apache Commons class org.apache.commons.csv.CSVParser

http://commons.apache.org/sandbox/csv/apidocs/org/apache/commons/csv/CSVParser.html


Split against this pattern:

(?<=\"?),(?!\")|(?<!\"),(?=\")

so it will be:

String[] splitArray = subjectString.split("(?<=\"?),(?!\")|(?<!\"),(?=\")");

UPD: given the recent changes to the question, it's better not to use a bare split. First separate the text inside quotes from the text outside quotes, then do a simple split(",") on the latter. Use a simple for loop that counts the quotes you have seen while saving the characters you read into a StringBuffer. Save characters into the StringBuffer until you meet a quote, then put its contents into an array of strings that were not in quotes. Then create a new StringBuffer and save the following characters into it; when you meet the second quote, stop and put that StringBuffer's contents into an array of strings that were in quotes. Repeat until the end of the string. You will then have two arrays: one with strings that were in quotes, the other with strings that were not. Finally, split every element of the second array on commas.
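
The loop described in the update could be sketched roughly as follows. The class name QuoteSegmenter and its fields are hypothetical, StringBuilder stands in for StringBuffer, and two further assumptions are made that the answer does not spell out: empty pieces left over after splitting are dropped, and any text after an unmatched quote is treated as unquoted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written for illustration only.
class QuoteSegmenter {
    // Strings that were inside double quotes (quotes removed).
    final List<String> quoted = new ArrayList<>();
    // Strings that were outside quotes, already split on commas and trimmed.
    final List<String> unquoted = new ArrayList<>();

    QuoteSegmenter(String text) {
        List<String> outside = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (char c : text.toCharArray()) {
            if (c == '"') {
                // Each quote ends the current segment and flips the state.
                (insideQuotes ? quoted : outside).add(current.toString());
                current.setLength(0);
                insideQuotes = !insideQuotes;
            } else {
                current.append(c);
            }
        }
        // Assumption: trailing text counts as unquoted, even after an unmatched quote.
        outside.add(current.toString());

        // Split only the unquoted segments on commas, dropping empty pieces.
        for (String segment : outside) {
            for (String piece : segment.split(",")) {
                String trimmed = piece.trim();
                if (!trimmed.isEmpty()) {
                    unquoted.add(trimmed);
                }
            }
        }
    }

    public static void main(String[] args) {
        QuoteSegmenter s = new QuoteSegmenter("a, \"b, c\", d");
        System.out.println(s.quoted);   // [b, c]
        System.out.println(s.unquoted); // [a, d]
    }
}
```

Because every quote flips the state, this sketch only behaves sensibly when quotes are balanced, and keeping two separate lists loses the original order of the fields.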


Try this:

import java.util.regex.*;

public class Main {
  public static void main(String[] args) throws Exception {

    String text = "test05, \"test, 05\", test\", test 05";

    Pattern p = Pattern.compile(
        "(?x)          # enable comments                                      \n" +
        "(\"[^\"]*\")  # quoted data, and store in group #1                   \n" +
        "|             # OR                                                   \n" +
        "([^,]+)       # one or more chars other than ',', and store it in #2 \n" +
        "|             # OR                                                   \n" +
        "\\s*,\\s*     # a ',' optionally surrounded by space-chars           \n"
    );

    Matcher m = p.matcher(text);

    while (m.find()) {
      // get the match
      String matched = m.group().trim();

      // only print the match if it's group #1 or #2
      if(m.group(1) != null || m.group(2) != null) {
        System.out.println(matched);
      }
    }
  }
}

For test05, "test, 05", test", test 05 it produces:

test05
"test, 05"
test"
test 05

and for test05, "test 05", test", test 05 it produces:

test05
"test 05"
test"
test 05