
Regex: How to exclude characters from a match?

I'm trying to parse the following string, similar to how Google treats search operators:

type1:words in key1 type2:word in key2 type3:key3

The goal is to produce groups as key-value pairs, e.g.

type1 -> words in key1 
type2 -> word in key2 
type3 -> key3

This is what I've got so far, but the end of the match overlaps with the next pair, so I only get the first group.

([\w\^]+):(.*?) \w+: 

type1 -> words in key1 

I have a feeling this should be done with backreferences, but my attempts so far have failed. What's the right approach?


The following regex (shown in verbose form) works on all your sample data:

(\w+)    # Match a keyword
:        # Match :
([^:]*)  # Match as many non-colon characters as possible
(?=      # Lookahead assertion: backtrack to
 \s      # the closest space
|        # or
 $       # don't backtrack at all if we're at the end of the string
)        # End of lookahead

Example Python program:

>>> import re
>>> r = re.compile(r"(\w+):([^:]*)(?=\s|$)")
>>> test = "type1:words in key1 type2:word in key2 type3:key3 type4:yet another key"
>>> for match in r.finditer(test):
...     print("{} -> {}".format(match.group(1), match.group(2)))
type1 -> words in key1
type2 -> word in key2
type3 -> key3
type4 -> yet another key

To avoid eating the beginning of the next part, make the last \w+: part of your regex non-consuming. This is called lookahead:

(?=re) matches re via zero-width positive lookahead (without consuming it)

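So your regex should look like this (the space goes inside the lookahead, otherwise the last pair, which has no trailing space, would not match):

([\w\^]+):(.*?)(?= \w+:|$)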
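
For illustration, here is a minimal Python sketch of the lookahead approach. It makes one assumption: the separating space is placed inside the lookahead, so it is neither consumed nor captured and the final pair still matches at the end of the string. The names PAIR and parse_pairs are made up for this example.

```python
import re

# Key, colon, then a lazy value that stops right before " nextkey:"
# or at the end of the string. The lookahead consumes nothing, so the
# next key is still available for the following match.
PAIR = re.compile(r"([\w\^]+):(.*?)(?= \w+:|$)")

def parse_pairs(text):
    """Return the (key, value) tuples found in text, in order."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(text)]

pairs = parse_pairs("type1:words in key1 type2:word in key2 type3:key3")
for key, value in pairs:
    print("{} -> {}".format(key, value))
```

With the sample string from the question, this should yield all three pairs, with no trailing space in the values.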

It might be easier to split the input on the pattern
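
A rough Python sketch of the splitting idea follows. The split pattern used here, whitespace followed by a "key:" token, is an assumption chosen for illustration and is not necessarily the pattern this answer had in mind. The name split_pairs is likewise made up.

```python
import re

def split_pairs(text):
    """Split text into chunks at each "key:" boundary, then split each chunk at its first colon."""
    # Assumed pattern: one or more whitespace characters that are
    # immediately followed by a word and a colon (the next key).
    chunks = re.split(r"\s+(?=\w+:)", text)
    pairs = []
    for chunk in chunks:
        # partition splits at the first colon only
        key, _, value = chunk.partition(":")
        pairs.append((key, value))
    return pairs

result = split_pairs("type1:words in key1 type2:word in key2 type3:key3")
for key, value in result:
    print("{} -> {}".format(key, value))
```

Splitting keeps the pattern short because only the boundary between pairs has to be described, not the pairs themselves.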


Or, although it would reverse the order of the matches, you can evaluate from right to left and match


My attempt in PHP:

preg_match_all( '/([\w\^]+?):(.+?)\s?(?=\w+:|$)/', 'type1:words in key1 type2:word in key2 type3:key3', $matches );
var_dump( $matches );


array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(20) "type1:words in key1 "
    [1]=>
    string(19) "type2:word in key2 "
    [2]=>
    string(10) "type3:key3"
  }
  [1]=>
  array(3) {
    [0]=>
    string(5) "type1"
    [1]=>
    string(5) "type2"
    [2]=>
    string(5) "type3"
  }
  [2]=>
  array(3) {
    [0]=>
    string(13) "words in key1"
    [1]=>
    string(12) "word in key2"
    [2]=>
    string(4) "key3"
  }
}





