Can anyone recommend a method to perform the following string operation using C#?
Suppose I have a string:
"my event happened in New York on Broadway in 1976"
I have many such strings, but the locations and dates vary. For example:
"my event happened in Boston on 2nd Street in 1998" "my event happened in Ann Arbor on Washtenaw in 1968"
So the general form is: "my event happened in X on Y in Z".
I would like to parse the string to extract X, Y and Z.
I could use Split with the sentinel words "in" and "on" to delimit the tokens I want, but this seems clunky. On the other hand, using a full parser/lexer like Grammatica seems heavyweight.
Recommendations would be gratefully accepted.
Is there a "simple" parser lexer for C#?
KISS applies here. Just do the String.Split solution, or use String.IndexOf to find the "in" and "on" (frankly, String.Split is the simplest). You don't need anything more complicated for such a simple "grammar"; note in particular that regex is overkill here.
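A minimal sketch of the String.Split approach, for illustration only. The class and method names (EventParser, TryParse) are hypothetical, and it assumes the sentinel words always appear surrounded by single spaces and never occur as separate words inside a location name:

```csharp
using System;

public static class EventParser
{
    // Splits "... in X on Y in Z" on the sentinel words. The separators are
    // padded with spaces so that "in" or "on" inside a longer word
    // (e.g. "Boston") is not treated as a delimiter.
    public static bool TryParse(string text, out string city, out string street, out string year)
    {
        city = street = year = string.Empty;

        var parts = text.Split(new[] { " in ", " on " }, StringSplitOptions.None);

        // Expect exactly: intro, X, Y, Z. Anything else does not fit the form.
        if (parts.Length != 4) return false;

        city = parts[1];
        street = parts[2];
        year = parts[3];
        return true;
    }
}
```

Checking the number of parts is what keeps this from silently returning wrong fields: a name such as "Stratford on Avon" would yield five parts and be rejected rather than misparsed.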
Try using regex pattern matching. Here's an MSDN link that should be pretty helpful: http://support.microsoft.com/kb/308252
An example might help. Note that a regex solution gives you scope to accept more variants as and when you see them. I reject the idea that RegEx is overkill, by the way. I'm no expert, but it's so easy to do stuff like this that I wonder why it's not used more frequently.
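// Requires: using System; using System.Text.RegularExpressions;
var regEx = new Regex(
    "(?<intro>.+) in (?<city>.+) on (?<locality>.+) in (?<eventDate>.+)"
);
var match = regEx.Match("My event happens in Baltimore on Main Street in 1876.");
if (!match.Success) return; // the input does not fit the expected form
// Print each named capture group. Note that eventDate captures "1876."
// here, including the trailing period.
foreach (var group in new[] {"intro", "city", "locality", "eventDate"})
{
    Console.WriteLine(group + ":" + match.Groups[group]);
}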
Finally, if performance is a real worry (though ignore this if it isn't), look here for optimisation tips.
If you are sure that the string is always going to be in that format, then you can do as you have already figured out: split on the word "in" and then on "on".
To be sure, you could then look up the extracted words in a database of city names, and check the year, to validate the result.
If the string may not always be in that format, you can instead search the whole string for words, match them against a database of city names and years, and check them for validity.
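The validation step could be sketched roughly as follows. Everything here is an assumption for illustration: the EventValidator class, the in-memory set standing in for a real database of city names, and the year range treated as plausible.

```csharp
using System;
using System.Collections.Generic;

public static class EventValidator
{
    // Hypothetical stand-in for a database of city names.
    private static readonly HashSet<string> KnownCities =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "New York", "Boston", "Ann Arbor"
        };

    // Case-insensitive lookup of an extracted city name.
    public static bool IsKnownCity(string city)
    {
        return KnownCities.Contains(city.Trim());
    }

    // Accepts an integer year between 1000 and the current year.
    // The lower bound is an arbitrary assumption; adjust to your data.
    public static bool IsPlausibleYear(string text)
    {
        return int.TryParse(text.Trim(), out int year)
            && year >= 1000
            && year <= DateTime.Now.Year;
    }
}
```

In practice the set would be loaded from whatever data source holds the city names; the lookup logic stays the same.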