Need help with a Regex for parsing human typed times

I'm really new to regex and have been working hard at this, but the problem has gone beyond simple. I understand how to create the Regex object in .NET, but I'm not sure how to use it for my specific purpose once I have a pattern.

Regex regex = new Regex("(at ){0,1}[0-9]{1,2}(:[0-9]{2}){0,1}(?:[ap]m?){0,1}");

I need to be able to take a sentence like "Dinner will be at 9pm at your favorite restaurant" and get the values { "Dinner will be at your favorite restaurant", "9pm" } (removing "at " if it exists).

Complete(?) test cases:

"Dinner at 9pm"            { "Dinner", "9pm" }
"Dinner at9pm"             { "Dinner", "9pm" }
"Dinner 9pm"               { "Dinner", "9pm" }
"Dinner 9p"                { "Dinner", "9pm" }
"Dinner 9a"                { "Dinner", "9am" }
"Dinner 9pZ"               { "Dinner 9pZ", "" }
"Dinner 9aZ"               { "Dinner 9aZ", "" }
"Dinner at 9"              { "Dinner", "9" }
"Dinner at 9:15pm"         { "Dinner", "9:15pm" }
"Dinner at 9:15"           { "Dinner", "9:15" }
"Dinner at9:15"            { "Dinner", "9:15" }
"Dinner at 9pm in Seattle" { "Dinner in Seattle", "9pm" }
"Dinner at9pmin Seattle"   { "Dinner in Seattle", "9pm" }
"Dinner at9in Seattle"     { "Dinner in Seattle", "9" }
"Dinner 9in Seattle"       { "Dinner 9in Seattle", "" }
"9pm Dinner"               { "Dinner", "9pm" }
"The 9pm Dinner was good"  { "The Dinner was good", "9pm" }
"Dinner at 9pmpm"          { "Dinner pm", "9pm" }
"Dinner at 9:15pmpm"       { "Dinner pm", "9:15pm" }

(Just for further clarification: a number without a ":" or "am/pm" must be preceded by "at" unless it is the first number listed. "am" and "pm" require either an ending in "M" or " ".)

Beyond the test cases, I don't understand the syntax needed to get back the values I need using the regex object (list in the brackets above).


A regex for doing this would be complicated and it also wouldn't return the results in the expected order in cases such as "9pm Dinner". If you're willing to spend a little time, it might be simpler to write a basic recursive-descent parser. Each word in the input would form a token, and you can easily come up with rules based on your requirements. For example:

event: "Dinner" time |
       "Dinner" location |
       "Dinner" time location |
       "Dinner" location time

time:  "at" number ":" number "am"/"pm"
       /* etc. */

You then write a small function for each non-terminal (event, time, location etc.) that will do its part and return the result.
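
To make the "one small function per non-terminal" idea concrete, here is a minimal C# sketch. It is not a full solution: the names (EventParser, ParseEvent, TryParseTime, TryParseClock) are made up for illustration, tokens are split on spaces only (so glued input such as "at9pm" is not handled), only the first time is extracted, and a bare number is accepted only after "at". The simplified grammar it assumes is written in the comments.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: EventParser and its methods are hypothetical names.
static class EventParser
{
    // event: (word | time)*   Returns the text without the first time, plus that time.
    public static (string Text, string Time) ParseEvent(string input)
    {
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var words = new List<string>();
        string time = "";
        int pos = 0;
        while (pos < tokens.Length)
        {
            // Only the first time is extracted; later tokens stay in the text.
            if (time == "" && TryParseTime(tokens, ref pos, out time)) continue;
            words.Add(tokens[pos++]);
        }
        return (string.Join(" ", words), time);
    }

    // time: ["at"] clock   Consumes tokens (advances pos) only on success.
    static bool TryParseTime(string[] tokens, ref int pos, out string time)
    {
        time = "";
        int p = pos;
        bool sawAt = tokens[p].Equals("at", StringComparison.OrdinalIgnoreCase);
        if (sawAt) p++;
        if (p >= tokens.Length || !TryParseClock(tokens[p], sawAt, out string parsed))
            return false;
        time = parsed;
        pos = p + 1;
        return true;
    }

    // clock: number [":" number] [("a" | "p") ["m"]]
    static bool TryParseClock(string word, bool sawAt, out string time)
    {
        time = "";
        int i = 0;
        string hours = ReadDigits(word, ref i, 2);
        if (hours == "") return false;

        string minutes = "";
        if (i < word.Length && word[i] == ':')
        {
            i++;
            minutes = ReadDigits(word, ref i, 2);
            if (minutes.Length != 2) return false;
        }

        string suffix = "";
        if (i < word.Length)
        {
            char c = char.ToLowerInvariant(word[i]);
            if (c == 'a' || c == 'p')
            {
                suffix = c + "m"; // normalise "9p" to "9pm"
                i++;
                if (i < word.Length && char.ToLowerInvariant(word[i]) == 'm') i++;
            }
        }

        if (i != word.Length) return false; // trailing junk such as "9pZ"
        if (minutes == "" && suffix == "" && !sawAt) return false; // bare number needs "at"

        time = hours + (minutes == "" ? "" : ":" + minutes) + suffix;
        return true;
    }

    // Reads up to max ASCII digits starting at index i.
    static string ReadDigits(string s, ref int i, int max)
    {
        int start = i;
        while (i < s.Length && i - start < max && s[i] >= '0' && s[i] <= '9') i++;
        return s.Substring(start, i - start);
    }
}
```

With this sketch, EventParser.ParseEvent("Dinner at 9pm in Seattle") gives ("Dinner in Seattle", "9pm"), and "Dinner 9pZ" comes back unchanged with an empty time. Cases like "Dinner at 9pmpm" would need extra rules in TryParseClock.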

As you see, your requirements already bring up so many possibilities that a regex would only make it extremely confusing, if at all possible.
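
On the narrower question of how to read values back out of a Regex object: in .NET you call Match, check Success, and read captures through Groups. The sketch below shows only that API usage. The pattern is a simplified one chosen for illustration (not the pattern from the question), RegexDemo and Extract are hypothetical names, and it deliberately ignores several of the listed rules: it accepts a bare number without "at", does not normalise "9p" to "9pm", and does not handle glued input such as "at9pm".

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch only: RegexDemo and Extract are hypothetical names.
static class RegexDemo
{
    // Simplified pattern, assumed for this example; it does not cover every test case.
    // The named group "time" captures the time without the optional leading "at ".
    static readonly Regex TimePattern = new Regex(
        @"\b(?:at\s+)?(?<time>\d{1,2}(?::\d{2})?(?:[ap]m?)?)\b\s*",
        RegexOptions.IgnoreCase);

    public static (string Text, string Time) Extract(string input)
    {
        Match m = TimePattern.Match(input);       // first match only
        if (!m.Success) return (input, "");       // no time found: text unchanged

        string time = m.Groups["time"].Value;     // captured text of the named group
        // m.Index and m.Length describe the whole match, including "at " and trailing spaces.
        string text = input.Remove(m.Index, m.Length).Trim();
        return (text, time);
    }
}
```

For example, RegexDemo.Extract("Dinner will be at 9pm at your favorite restaurant") gives ("Dinner will be at your favorite restaurant", "9pm"), while "Dinner 9pZ" produces no match and is returned as-is with an empty time.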
