Python regular expressions assigning to named groups
When you use variables (is that the correct word?) in Python regular expressions like this: "blah (?P<value>\w+)" ("value" would be the variable), how can you make the variable's value be the text after "blah " up to the end of the line or up to a certain character, without paying any attention to the actual content of the variable? For example, this is pseudo-code for what I want:
>>> import re
>>> p = re.compile("say (?P<value>continue_until_text_after_assignment_is_recognized) endsay")
>>> m = p.match("say Hello hi yo endsay")
>>> m.group('value')
'Hello hi yo'
Note: The title is probably not understandable. That is because I didn't know how to say it. Sorry if I caused any confusion.
For that you'd want a regular expression of
"say (?P<value>.+) endsay"
The period matches any character (except a newline, by default), and the plus sign indicates that it should be repeated one or more times, so .+ means any sequence of one or more characters. When you put endsay at the end, the regular expression engine will make sure that whatever it matches does in fact end with that string.
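As a quick check, here is a minimal sketch that runs this pattern against the input from the question. The variable names p, m and value are just for illustration, and the raw-string prefix is a habit rather than a requirement for this particular pattern.

```python
import re

# Greedy capture: the group takes everything between "say " and " endsay".
p = re.compile(r"say (?P<value>.+) endsay")

m = p.match("say Hello hi yo endsay")

# match() returns None when the pattern does not match, so guard the lookup.
value = m.group("value") if m else None
print(value)
```

With the sample line from the question, this should give back the text between the two keywords.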
You need to specify what you want to match if the text is, for example,
say hello there and endsay but some more endsay
If you want to match the whole "hello there and endsay but some more" substring, @David's answer is correct. Otherwise, to match just "hello there and", the pattern needs to be:
say (?P<value>.+?) endsay
with a question mark after the plus sign to make it non-greedy (by default it's greedy, gobbling up all it possibly can while allowing an overall match; non-greedy means it gobbles as little as possible, again while allowing an overall match).
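To see the difference side by side, here is a small sketch that applies both patterns to the example text above. The names text, greedy and lazy are only for illustration.

```python
import re

text = "say hello there and endsay but some more endsay"

# Greedy: .+ grabs as much as it can while still leaving a final " endsay".
greedy = re.compile(r"say (?P<value>.+) endsay")

# Non-greedy: .+? grabs as little as it can, so it stops at the first " endsay".
lazy = re.compile(r"say (?P<value>.+?) endsay")

greedy_value = greedy.match(text).group("value")
lazy_value = lazy.match(text).group("value")

print(repr(greedy_value))
print(repr(lazy_value))
```

The greedy version runs up to the last endsay on the line, while the non-greedy one stops at the first.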