
A regex to match between delimiters except when there is a colon that is not between double quotes?

This one is a little bit complicated and I'm not sure if it can be done.

The regex needs to match everything between a , (comma) or [] (square brackets). It must not match if there is a : (colon). And now the tricky part: if the : is between " " it can match.

I managed to create a regex that does everything except the last part: (?<=[[,])[^:]+?(?=[],])

So this is what it needs to match.

[ ItemName:Data, More Data, With a number "as: " item name]

I'm going to keep testing. Let's see if someone solves it.


It sounds like you're trying to specify a language that's really too complicated to parse using only regular expressions. Here's a pattern that matches what you've described, but it probably won't work perfectly. It doesn't use lookbehinds, so you need to select the first match group to get the contents.

/[\[,](("[^"\]]*"|[^:\[])*?)[\]\,]/
/[\[,]          # Opening bracket or comma.
 ((             # Group 1: the contents. Each piece is either
   "[^"\]]*"    # a quoted string (no quotes or closing brackets inside)...
  |[^:\[]       # or one character that is not a colon or opening bracket...
  )*?           # repeated as few times as possible.
 )
 [\]\,]/x       # Closing bracket or comma.

An example usage in Python:

import re

pattern = re.compile(r"""[\[,](("[^"\]]*"|[^:\[])*?)[\]\,]""")

for match in pattern.finditer('[1 2 3] [4 5] [6 : 7], "8 : 9", '):
    # Group 1 holds the contents between the delimiters.
    print(match.group(1))

Producing output:

1 2 3
4 5
 "8 : 9"

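One limitation worth noting: the pattern above consumes the closing comma, so that comma cannot also open the next item, and back-to-back items get skipped. A possible variant (a sketch, not something from the original answer) checks the closing delimiter with a lookahead instead of consuming it. The function name items and the strip() call are my own additions for illustration.

```python
import re

# Variant of the pattern above: the closing delimiter is only checked with a
# lookahead, not consumed, so the same comma can open the next item.
ITEM = re.compile(r'[\[,]((?:"[^"\]]*"|[^:\[])*?)(?=[\],])')

def items(text):
    """Return the stripped contents of every colon-free item in text."""
    return [m.group(1).strip() for m in ITEM.finditer(text)]

sample = '[ ItemName:Data, More Data, With a number "as: " item name]'
print(items(sample))
# prints ['More Data', 'With a number "as: " item name']
```

On the sample from the question, the consuming pattern would return only the More Data item, because the comma after it is used up as a closing delimiter. The lookahead variant also picks up the last item.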

I have a fair amount of practical experience with (Perl) regexps, so let me share it. If you are handling complex cases like this, it is almost always best to do it step by step, unless you are in special circumstances (for example, when speed of execution is crucial).

So in this case I would simply do it in two steps. First split the data into chunks, i.e. something like (depending on your language)

split(/[][,]/)

and then accept or remove the individual parts. In this case, just remove the parts that match this expression

/^([^"]*:.*|.*:[^"]*)$/

i.e. parts which include a colon that is not surrounded by double quotes.
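For anyone who wants to try the two steps in Python rather than Perl, here is a rough sketch. The names REJECT and split_and_filter are made up for this example, and the reject pattern uses [^"]* in its second alternative, which is my assumption about what the expression was meant to say.

```python
import re

# Step 2 filter: a colon with no quote before it, or with no quote after it.
# Assumption: the second alternative was meant to end in [^"]*.
REJECT = re.compile(r'^([^"]*:.*|.*:[^"]*)$')

def split_and_filter(text):
    """Split on [, ] and , then drop empty parts and parts REJECT matches."""
    parts = (p.strip() for p in re.split(r'[\[\],]', text))  # step 1
    return [p for p in parts if p and not REJECT.match(p)]   # step 2

sample = '[ ItemName:Data, More Data, With a number "as: " item name]'
print(split_and_filter(sample))
# prints ['More Data', 'With a number "as: " item name']
```

Note that the split is not quote-aware, so a comma or bracket inside double quotes would still cut an item in two.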

Clearly this does not solve all cases, such as With a number "as: " : "item" name. But I agree with Jeremy that if you are trying to implement a language with complicated syntax, it might not be the right thing to just throw a few regexps at it without deeper analysis (i.e. deciding exactly what it should accept in weird cases like [ 1:1, 2":"2,3":":3,4":":":"4,5":":"5], ...) and without using an appropriate approach to solve it (a recursive syntax parser).
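To give an idea of what a non-regex approach might look like, here is a minimal character scanner that tracks whether it is inside double quotes. It is only a sketch under assumptions the question never settles: every double quote toggles the quoted state, delimiters inside quotes are treated as literal text, and anything after the last delimiter is ignored. The name parse_items is hypothetical.

```python
def parse_items(text):
    """Return items between [, ] and , that have no colon outside quotes."""
    items = []
    current = []         # characters of the item being read
    in_quotes = False    # True while between a pair of double quotes
    bare_colon = False   # True once a colon is seen outside quotes
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch in '[],' and not in_quotes:
            # A delimiter outside quotes ends the current item.
            item = ''.join(current).strip()
            if item and not bare_colon:
                items.append(item)
            current, bare_colon = [], False
        else:
            if ch == ':' and not in_quotes:
                bare_colon = True
            current.append(ch)
    return items

print(parse_items('[ ItemName:Data, More Data, With a number "as: " item name]'))
# prints ['More Data', 'With a number "as: " item name']
```

Unlike the split-and-filter regexps, this rejects With a number "as: " : "item" name, because the second colon is seen while outside quotes.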
