C# - Regular Expression to split string on spaces, unless a double quote is encountered
This thread is very similar to what I want: Regular Expression to split on spaces unless in quotes
But I need a few extra rules that I can't figure out. The regex from that thread does split on spaces unless they're inside double quotes; however, it also splits on punctuation. I need anything inside the double quotes to remain a single token.
For example:
/Update setting0 value="new value" /Save
should return
/Update
setting0
value=
new value (I don't care whether it trims the quotes off or not)
/Save

/Import "C:\path\file.xml" "C:\path_2\file_2.xml" /Exit
should return
/Import
C:\path\file.xml (I don't care whether it trims the quotes off or not)
C:\path_2\file_2.xml
/Exit

I ended up using this expression from the thread above:
(?<=")\w[\w\s]*(?=")|\w+|"[\w\s]*"
Could someone please help me tweak it? Thanks!
I haven't tried it in C#, only in VBA in Excel, but it might be helpful. I also changed the double quotes to single quotes. Anyway, here is the regexp:
Text:
/Update setting0 value='new value' /Save
Regexp:
('{1}(\w|\s|:|\\|\.)+'{1}|\w)+
Result:
Update
setting0
value
'new value'
Save
Text:
/Import 'C:\path\file.xml' 'C:\path_2\file_2.xml' /Exit
Result:
Import
'C:\path\file.xml'
'C:\path_2\file_2.xml'
Exit
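For anyone who wants to try the same pattern in C#, here is a minimal sketch with the quotes switched back to double quotes (the class and method names `QuotedTokenRegex` and `Split` are hypothetical). Like the VBA version, it drops the leading `/`, because `/` is neither a word character nor a quote. Also note that a quoted section may only contain word characters, whitespace, `:`, `\` and `.`, so something like "my-file.xml" would not be kept together.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedTokenRegex
{
    // The VBA answer's pattern with ' switched back to " (in a verbatim string, "" is one quote).
    // '{1} in the original means exactly one quote, so the {1} quantifier is dropped here.
    // A token is one or more pieces, each either a quoted run of word characters,
    // whitespace, ':', '\' or '.', or a single word character.
    private static readonly Regex Token = new Regex(@"(""(\w|\s|:|\\|\.)+""|\w)+");

    // Every match of the pattern is one token; everything else is skipped.
    public static string[] Split(string line) =>
        Token.Matches(line).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Applied to the two sample lines from the question, this should give the same tokens as the VBA results above, with the double quotes kept on the quoted parts.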
This is a problem that cannot, in general, be solved using regular expressions. Instead, you can write a simple parser that reads the line one character at a time and, whenever it encounters a space outside quotes, adds the current substring to a list:
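using System.Collections.Generic;

public static class QuoteAwareSplitter
{
    public static string[] ParseLine(string line)
    {
        var insideQuotes = false;
        var parts = new List<string>();
        var j = 0; // start index of the current part
        for (var i = 0; i < line.Length; i++)
        {
            switch (line[i])
            {
                case '"':
                    // Toggle the quote state; the quote characters stay in the part.
                    insideQuotes = !insideQuotes;
                    break;
                case ' ':
                    if (!insideQuotes)
                    {
                        // Skip the empty parts that consecutive spaces would produce.
                        if (i > j)
                            parts.Add(line.Substring(j, i - j));
                        j = i + 1;
                    }
                    break;
            }
        }
        // The last part is not followed by a space, so add it after the loop.
        if (j < line.Length)
            parts.Add(line.Substring(j));
        return parts.ToArray();
    }
}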
Note, however, that this won't handle escaped quotes inside quoted text, and the quote characters themselves are kept in the parts.
This one works if there is an even number of double quotes and no escaped quotes:
^
\s*
(?:
(?:
([^\s"]+)
|
"([^"]*)"
)
\s*
)+
$
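The pattern above is laid out with insignificant whitespace, so in .NET it has to be used with `RegexOptions.IgnorePatternWhitespace`. Because the tokens are captured by a repeated group, they have to be read from `Group.Captures` (.NET keeps one capture per repetition) rather than from `Group.Value`, which only holds the last one. A minimal sketch, assuming a hypothetical `WholeLineRegex.Split` helper that returns `null` when the line does not match:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeLineRegex
{
    // The pattern from the answer above; "" is one quote in a verbatim string,
    // and # starts a comment because of IgnorePatternWhitespace.
    private const string Pattern = @"
        ^
        \s*
        (?:
            (?:
                ([^\s""]+)      # group 1: an unquoted token
                |
                ""([^""]*)""    # group 2: the text between a pair of quotes
            )
            \s*
        )+
        $";

    // Returns null when the line does not match (e.g. an odd number of quotes or a blank line).
    public static string[] Split(string line)
    {
        var m = Regex.Match(line, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!m.Success)
            return null;
        // Merge the captures of both groups and sort by position to restore token order.
        return m.Groups[1].Captures.Cast<Capture>()
            .Concat(m.Groups[2].Captures.Cast<Capture>())
            .OrderBy(c => c.Index)
            .Select(c => c.Value)
            .ToArray();
    }
}
```

One thing to keep in mind: since unquoted tokens may follow each other with no whitespace in between, a long line that ultimately fails to match may backtrack a lot before the match is rejected.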
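using System;
using System.Text.RegularExpressions;
// \G: each match starts where the previous one ended; runs of spaces match without group 1.
var matches = Regex.Matches("/Update setting0 value=\"new value\" /Save", "\\G(?:(\"[^\"]*\"?|[^ \"]+)|[ ]+)");
foreach (Match match in matches) {
    foreach (Capture capture in match.Groups[1].Captures) {
        Console.WriteLine(capture);
    }
}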
If you don't want the quotes (so "new value" becomes new value):
var matches = Regex.Matches("/Update setting0 value=\"new value\" /Save", "\\G(?:\"(?<1>[^\"]*)\"?|(?<1>[^ \"]+)|[ ]+)");
The ? after the second \" is there to catch unclosed quotes.
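Wrapped into a reusable method, the quote-stripping pattern might look like the following sketch (the names `QuotedSplitter` and `SplitArgs` are hypothetical). Runs of spaces are matched by the third alternative, which has no group 1, so they contribute nothing to the result:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedSplitter
{
    // Pattern from the answer above: both token alternatives capture into group 1,
    // so the surrounding quotes are dropped, and \"? tolerates an unclosed quote.
    private static readonly Regex Token =
        new Regex("\\G(?:\"(?<1>[^\"]*)\"?|(?<1>[^ \"]+)|[ ]+)");

    public static string[] SplitArgs(string line)
    {
        var parts = new List<string>();
        foreach (Match match in Token.Matches(line))
        {
            // Space-only matches have no group 1 captures and are skipped here.
            foreach (Capture capture in match.Groups[1].Captures)
                parts.Add(capture.Value);
        }
        return parts.ToArray();
    }
}
```

Every character is either a quote, a space, or part of an unquoted run, so each match starts exactly where the previous one ended and the \G anchor never causes text to be skipped.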
This is just my modified version of what eulerfx posted. This one:
- Should produce the results requested in the original question (so it is "on topic").
- Doesn't include the quotes in the results.
- Doesn't include whitespace-only results.
- Splits on any whitespace that is not inside quotes.
- Handles a missing end quote by adding whatever is left over after the loop.
- Trims results, unless they are inside quotes.
I mostly made this for parsing the last two parts of each line of an IMAP LIST response.
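using System;
using System.Collections.Generic;

public static class LineParser
{
    public static string[] ParseLine(string line)
    {
        var insideQuotes = false;
        var start = -1; // start index of the current part, or -1 if no part is open
        var parts = new List<string>();
        for (var i = 0; i < line.Length; i++)
        {
            if (Char.IsWhiteSpace(line[i]))
            {
                // Whitespace ends the current part only outside quotes. Inside quotes it is
                // kept, except leading whitespace: start is only set at the first
                // non-whitespace character after the opening quote.
                if (!insideQuotes && start != -1)
                {
                    parts.Add(line.Substring(start, i - start));
                    start = -1;
                }
            }
            else if (line[i] == '"')
            {
                // An opening or closing quote also ends the current part;
                // the quote character itself is never included.
                if (start != -1)
                {
                    parts.Add(line.Substring(start, i - start));
                    start = -1;
                }
                insideQuotes = !insideQuotes;
            }
            else if (start == -1)
            {
                start = i; // first character of a new part
            }
        }
        // Whatever is left over (including an unclosed quoted part) is the last part.
        if (start != -1)
            parts.Add(line.Substring(start));
        return parts.ToArray();
    }
}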