C# Parsing Text Within Quotes
I'm developing a simple little search mechanism and I want to allow the user to search for chunks of text with spaces. For example, a user can search for the name of a person:
Name: John Smith
I then "John Smith".Split(' ')
into an array of two elements, {"John","Smith"}
. I then return all of the records that match "John" AND "Smith" first followed by records that match either "John" OR "Smith."
I then return no records for no matches. This isn't a complicated scenario and I have this part working.
I'd now like to be able to allow the user to ONLY return records that match "John Smith"
I'd like to use a basic quote syntax for searching. So if a user wants to search for "John Smith" OR Pocahontas they would enter: "John Smith" Pocahontas. The order of terms is absolutely irrelevant; "John Smith" does not receive priority over Pocahontas because he comes first in the list.
I have two main trains of thought on how I should parse the input.
A) Using regular expression then parsing stuff (IndexOf, Split)
B) Using only the parsing methods
I think a logical point of action would be to find the stuff in quotes; then remove it from the original string and insert it into a separate list. Then all the stuff left over from the original string could be split on the space and inserted into that separate list. If there is either 1 quote or an odd number, it is simply removed from the list.
How do I find matches the from within regex? I know about r开发者_StackOverflow中文版egex.Replace, but how would I iterate through the matches and insert them into a list. I know there is some neat way to do this using the MatchEvaluator delegate and linq, but I know basically nothing about regex in c#.
EDIT: Came back to this tab withou refreshing and didn't realize this question was already answered... accepted answer is better.
I think pulling out the stuff in quotes first with regex is a good idea. Maybe something like this:
String sampleInput = "\"John Smith\" Pocahontas Bambi \"Jane Doe\" Aladin";
//Create regex pattern
Regex regex = new Regex("\"([^\".]+)\"");
List<string> searches = new List<string>();
//Loop through all matches from regex
foreach (Match match in regex.Matches(sampleInput))
{
//add the match value for the 2nd group to the list
//(1st group is the entire match)
//(2nd group is the first parenthesis group in the defined regex pattern
// which in this case is the text inside the quotes)
searches.Add(match.Groups[1].Value);
}
//remove the matches from the input
sampleInput = regex.Replace(sampleInput, String.Empty);
//split the remaining input and add the result to our searches list
searches.AddRange(sampleInput.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries));
I needed the same functionality as Shawn but I didn't want to use regex. Here is a simple solution that I came up with uses Split() instead of regex for anyone else needing this functionality.
This works because the Split method, by default, will create empty entries in the array for consecutive search values in the source string. If we split on the quote character then the result is an array where the even indexed entries are individual words and the odd indexed entries will be the quotes phrases.
Example:
“John Smith” Pocahontas
Results in
item(0) = (empty string)
item(1) = John Smith
item(2) = Pocahontas
And
1 2 “3 4” 5 “6 7” “8 9”
Results in
item(0) = 1 2
item(1) = 3 4
item(2) = 5
item(3) = 6 7
item(4) = (empty string)
item(5) = 8 9
Note that an unmatched quote will result in a phrase from the last quote to the end of the input string.
public static List<string> QueryToTerms(string query)
{
List<string> Result = new List<string>();
// split on the quote token
string[] QuoteTerms = query.Split('"');
// switch to denote if the current loop is processing words or a phrase
bool WordTerms = true;
foreach (string Item in QuoteTerms)
{
if (!string.IsNullOrWhiteSpace(Item))
if (WordTerms)
{
// Item contains words. parse them and ignore empty entries.
string[] WTerms = Item.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string WTerm in WTerms)
Result.Add(WTerm);
}
else
// Item is a phrase.
Result.Add(Item);
// Alternate between words and phrases.
WordTerms = !WordTerms;
}
return Result;
}
Use a regex like this:
string input = "\"John Smith\" Pocahontas";
Regex rx = new Regex(@"(?<="")[^""]+(?="")|[^\s""]\S*");
for (Match match = rx.Match(input); match.Success; match = match.NextMatch()) {
// use match.Value here, it contains the string to be searched
}
精彩评论