Replace whitespace outside quotes using regular expression
Using C#, I need to prepare a search text for searching in a SQL Server database using the LIKE command by replacing all whitespace outside quotes with a % character. Example:
Input:
my "search text"
Output:
%my%search text%
Any help would be appreciated. I can handle input strings with an odd number of quotes before replacing the text.
Instead of using a regex, use a simple state machine: loop over each character, tracking whether you are inside or outside quotes, and replace spaces only while you are outside.
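The state machine described above could be sketched roughly as follows. The class and method names (`LikePatternBuilder`, `ToLikePattern`) are made up for illustration. The sketch assumes quotes are never escaped, that the quote characters should be dropped from the result, and that runs of unquoted whitespace should collapse into a single `%`, as in the example output in the question.

```csharp
using System.Text;

public static class LikePatternBuilder
{
    // Hypothetical helper: replaces whitespace outside double quotes with '%',
    // drops the quote characters, and wraps the result in '%'.
    // Assumes no escaped quotes in the input.
    public static string ToLikePattern(string input)
    {
        var sb = new StringBuilder("%");
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                // Toggle the state; the quote itself is not emitted.
                inQuotes = !inQuotes;
            }
            else if (!inQuotes && char.IsWhiteSpace(c))
            {
                // Unquoted whitespace becomes '%'; avoid emitting "%%".
                if (sb[sb.Length - 1] != '%') sb.Append('%');
            }
            else
            {
                // Everything else, including quoted whitespace, is kept as is.
                sb.Append(c);
            }
        }

        // Trailing wildcard, unless the pattern already ends with one.
        if (sb[sb.Length - 1] != '%') sb.Append('%');
        return sb.ToString();
    }
}
```

Calling `LikePatternBuilder.ToLikePattern("my \"search text\"")` yields `%my%search text%`. Characters that are wildcards in LIKE (such as `%` or `_` typed by the user) are not escaped here; that would need separate handling.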
If you have to use a regex, you can do it if you are sure that all quotes are correctly balanced, and if there are no escaped quotes (\") in the string (it is possible to account for those, too, but it makes the regex even more complicated).
resultString = Regex.Replace(subjectString,
    @"[\ ]             # Match a space (brackets for legibility)
    (?=                # Assert that the string after the current position matches...
      [^""]*           # any non-quote characters
      (?:              # followed by...
        ""[^""]*       # one quote, followed by 0+ non-quotes
        ""[^""]*       # a second quote and 0+ non-quotes
      )*               # any number of times, ensuring an even number of quotes
      $                # until the end of the string
    )                  # End of lookahead",
    "%", RegexOptions.IgnorePatternWhitespace);
This examines the remainder of the string to assert that an even number of quotes follows the current space character. The advantage of lookahead (thanks Alan Moore!) is that it's more portable than lookbehind: most regex flavors, with the exception of .NET and a few others, don't support indefinite repetition inside lookbehind assertions. It may well be faster, too.
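The lookahead regex only swaps the unquoted spaces. To get all the way to the output shown in the question, the quotes still have to be removed and the surrounding `%` added. A minimal sketch of that, using the same lookahead written on one line; the class and method names are invented for illustration, and balanced, unescaped quotes are assumed:

```csharp
using System.Text.RegularExpressions;

public static class QuoteAwareReplace
{
    // The lookahead pattern from this answer, written on a single line so that
    // RegexOptions.IgnorePatternWhitespace is not needed.
    private static readonly Regex UnquotedSpace =
        new Regex(@" (?=[^""]*(?:""[^""]*""[^""]*)*$)");

    // Hypothetical helper: assumes balanced quotes and no escaped quotes.
    public static string ToLikePattern(string input)
    {
        // Step 1: replace spaces that are followed by an even number of quotes.
        string replaced = UnquotedSpace.Replace(input, "%");

        // Step 2: drop the quote characters and wrap the result in '%'.
        return "%" + replaced.Replace("\"", "") + "%";
    }
}
```

For the input `my "search text"`, step 1 gives `my%"search text"` and the method returns `%my%search text%`. Note that this pattern matches only the space character; tabs and other whitespace would need `\s` in place of the literal space.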
The original solution involving lookbehind is as follows:
resultString = Regex.Replace(subjectString,
    @"(?<=             # Assert that the string up to the current position matches...
      ^                # from the start of the string
      [^""]*           # any non-quote characters
      (?:              # followed by...
        ""[^""]*       # one quote, followed by 0+ non-quotes
        ""[^""]*       # a second quote and 0+ non-quotes
      )*               # any number of times, ensuring an even number of quotes
    )                  # End of lookbehind
    [ ]                # Match a space (brackets for legibility)",
    "%", RegexOptions.IgnorePatternWhitespace);
If the double quotes are not escaped in some fashion, then the following is another possibility. It is possibly not as efficient as some methods (and certainly not as cool as Tim's regex), but it might be reasonably understandable when the next guy looks at the code. It splits the string on double quotes and then loops through the pieces. Pieces at even indices (0, 2, ...) are the parts outside quotes; pieces at odd indices are the parts inside quotes. Note that this version keeps the quote characters in the output.
// Requires: using System.Text;
string value = "\"first\" some text \"other in quotes\" out of them \"in them\"";
string[] sets = value.Split('"');
StringBuilder newvalue = new StringBuilder("%");
for (int i = 0; i < sets.Length; i++) {
    if (i % 2 == 0) {
        // even indices are outside quotes
        newvalue.Append(sets[i].Replace(' ', '%'));
    } else {
        // odd indices are inside quotes
        newvalue.Append("\"" + sets[i] + "\"");
    }
}
// final %
newvalue.Append("%");
It looks like you also want to remove the quotation marks and add a % to the beginning and end of the search string. Try this:
string s0 = @"my ""search text""";
Regex re = new Regex(@"(?x)
(?:
(?<term>[^\s""]+)
|
""(?<term>[^""]+)""
)
(?:\s+|$)");
string s1 = @"%" + re.Replace(s0, @"${term}%");
Console.WriteLine(s1);
output:
%my%search text%
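I would have done something like this:
private static string RemoveUnquotedWhiteSpaces(string text)
{
    string result = String.Empty;
    var parts = text.Split('"');
    for (int i = 0; i < parts.Length; i++)
    {
        // Even indices are outside quotes: strip the spaces there.
        // (Use "%" as the replacement instead of "" to build the LIKE pattern.)
        if (i % 2 == 0) result += Regex.Replace(parts[i], " ", "");
        // Odd indices are inside quotes: keep them unchanged, quotes included.
        else result += String.Format("\"{0}\"", parts[i]);
    }
    return result;
}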