Regex: How to exclude characters from a match?
I'm trying to parse the following string, similar to how Google treats search operators:
type1:words in key1 type2:word in key2 type3:key3
To produce groups as key-value pairs, e.g.
type1 -> words in key1
type2 -> word in key2
type3 -> key3
This is what I've got so far, but the end of the match overlaps with the next pair, so I only get the first group.
([\w\^]+):(.*?) \w+:
type1 -> words in key1
I have a feeling this should be done with backreferences, but my attempts so far have failed. What's the right approach?
(\w+):([^:]*)(?=\s|$)
works on all your sample data.
(\w+)    # Match a keyword
:        # Match :
([^:]*)  # Match as many non-colon characters as possible
(?=      # Lookahead assertion: backtrack to
  \s     # the closest whitespace character
  |      # or
  $      # don't backtrack at all if we're at the end of the string
)        # End of lookahead
Example Python program:
>>> import re
>>> r = re.compile(r"(\w+):([^:]*)(?=\s|$)")
>>> test = "type1:words in key1 type2:word in key2 type3:key3 type4:yet another key"
>>> for match in r.finditer(test):
... print("{} -> {}".format(match.group(1), match.group(2)))
...
type1 -> words in key1
type2 -> word in key2
type3 -> key3
type4 -> yet another key
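The annotated pattern can also be compiled as-is with re.VERBOSE, which keeps the comments next to the pieces they describe. This is only a sketch: the names PAIR and parse_pairs are made up for illustration, and it assumes that values never contain a colon, since [^:]* stops at the first one.

```python
import re

# Same pattern as above; re.VERBOSE ignores the whitespace and "#" comments.
PAIR = re.compile(r"""
    (\w+)      # keyword before the colon
    :          # literal colon
    ([^:]*)    # value: as many non-colon characters as possible
    (?=\s|$)   # back off to the nearest whitespace, or stop at end of string
""", re.VERBOSE)


def parse_pairs(text):
    """Hypothetical helper: map each keyword to its value."""
    return {m.group(1): m.group(2) for m in PAIR.finditer(text)}


pairs = parse_pairs("type1:words in key1 type2:word in key2 type3:key3")
print(pairs)
```

Collecting the groups into a dict assumes each keyword appears only once; use a list of tuples instead if keys can repeat.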
To avoid eating the beginning of the next part, make the trailing \w+: part of your regex non-consuming. This is called a lookahead:
(?=re) matches re via zero-width positive lookahead (without consuming it)
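So your regex should look like this (the space is moved into an alternation so that the last pair, which has no space after it, still matches):
([\w\^]+):(.*?)(?: (?=\w+:)|$)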
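To see the difference, here is a small Python sketch that runs the consuming pattern from the question next to a lookahead variant. The variant puts the separating space inside an alternation with $, which is one possible way to accept the final pair even though nothing follows it. The variable names are only illustrative.

```python
import re

text = "type1:words in key1 type2:word in key2 type3:key3"

# Consuming version from the question: the trailing " \w+:" swallows " type2:",
# so the next search starts at "word in key2 ..." and finds no further pair.
consuming = re.compile(r"([\w\^]+):(.*?) \w+:")
consumed = [m.groups() for m in consuming.finditer(text)]

# Lookahead version: only the separating space is consumed; the next "key:"
# is checked by (?=\w+:) but left in place for the following match.
lookahead = re.compile(r"([\w\^]+):(.*?)(?: (?=\w+:)|$)")
pairs = [m.groups() for m in lookahead.finditer(text)]

print(consumed)  # [('type1', 'words in key1')]
print(pairs)     # all three pairs
```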
It might be easier to split the input on the pattern
\s(?=\w+:\w)
Or, although it would reverse the order of the matches, you can evaluate from right to left and match
\w+:\w.*?
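The split approach can be sketched in Python as follows. It assumes every chunk has the form key:value, and it uses str.partition so that only the first colon in a chunk acts as the separator. Variable names are illustrative.

```python
import re

text = "type1:words in key1 type2:word in key2 type3:key3"

# Split on a whitespace character only where a "key:" followed by a word
# character begins; the lookahead keeps the key itself in the next chunk.
chunks = re.split(r"\s(?=\w+:\w)", text)

# Each chunk is "key:value"; partition on the first colon only.
pairs = {}
for chunk in chunks:
    key, _, value = chunk.partition(":")
    pairs[key] = value

print(chunks)
print(pairs)
```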
My attempt in PHP:
preg_match_all( '/([\w\^]+?):(.+?)\s?(?=\w+:|$)/', 'type1:words in key1 type2:word in key2 type3:key3', $matches );
var_dump( $matches );
Results:
array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(20) "type1:words in key1 "
    [1]=>
    string(19) "type2:word in key2 "
    [2]=>
    string(10) "type3:key3"
  }
  [1]=>
  array(3) {
    [0]=>
    string(5) "type1"
    [1]=>
    string(5) "type2"
    [2]=>
    string(5) "type3"
  }
  [2]=>
  array(3) {
    [0]=>
    string(13) "words in key1"
    [1]=>
    string(12) "word in key2"
    [2]=>
    string(4) "key3"
  }
}