Getting text between quotes using regular expression

I'm having some issues with a regular expression I'm creating.

I need a regex to match against the following examples and then sub match on the first quoted string:

Input strings

("Lorem ipsum dolor sit amet, consectetur adipiscing elit.")

('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ')

('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 'arg1', "arg2")

Must sub match

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Regex so far:

\((["'])([^"']+)\1,?.*\)

The regex does a sub match on the text between the first set of quotes and returns the sub match displayed above.

This almost works, but if the quoted string itself contains quotes, the sub match stops at the first one. See below:

Failing input strings

("Lorem ipsum dolor \"sit\" amet, consectetur adipiscing elit.")

Only sub matches: Lorem ipsum dolor

("Lorem ipsum dolor 'sit' amet, consectetur adipiscing elit.")

The entire match fails.
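The two failures can be reproduced with a short script. This is only a sketch, assuming Python's `re` module (the question does not say which regex engine is used); `first_arg` and the shortened sample strings are hypothetical names introduced for illustration.

```python
import re

# The pattern from the question: the class [^"']+ stops at ANY quote character,
# whether or not it is escaped or of the other kind.
ORIGINAL = re.compile(r"""\((["'])([^"']+)\1,?.*\)""")

def first_arg(call: str):
    """Return the text captured by group 2, or None when nothing matches."""
    m = ORIGINAL.search(call)
    return m.group(2) if m else None

plain = '("Lorem ipsum dolor sit amet.")'
escaped = r'("Lorem ipsum dolor \"sit\" amet.")'
mixed = '''("Lorem ipsum dolor 'sit' amet.")'''

print(first_arg(plain))    # the full text between the quotes
print(first_arg(escaped))  # cut off at the first escaped quote
print(first_arg(mixed))    # None: the single quote blocks the match
```

In this engine the truncated sub match still carries the backslash that preceded the escaped quote, because the character class excludes quotes but not backslashes.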

Notes

The input strings are actually php code function calls. I'm writing a script that will scan .php source files for a specific function and grab the text from the first parameter.


Try this regular expression:

\(\s*(?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*')(?:\s*,\s*(?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*'))*\s*\)

Some explanation:

  • \(\s* matches the opening parenthesis and optional whitespace.
  • (?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*') is to match any quoted string allowing the quote character only when escaped with \.
  • (?:\s*,\s*(?:"(?:[^"\\]+|\\.)*"|'(?:[^'\\]+|\\.)*'))* describes zero or more further quoted strings, each preceded by a comma that may itself be preceded and followed by whitespace.
  • \s*\) matches the closing parenthesis with optional whitespace.
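To pull out the first argument rather than just validate the call, the quoted-string part described above can be wrapped in capturing groups. The following is a minimal sketch, assuming Python's `re` module; the function name `translate`, the helpers `first_string_args` and `unescape_quotes`, and the sample PHP lines are all hypothetical and not taken from the question.

```python
import re

# Body of a quoted string: any character that is neither the quote nor a backslash,
# or a backslash followed by any character. The inner "+" is dropped here to keep
# backtracking cheap on unterminated strings; the set of matched strings is the same.
DOUBLE = r'"((?:[^"\\]|\\.)*)"'
SINGLE = r"'((?:[^'\\]|\\.)*)'"

def first_string_args(source: str, func_name: str) -> list[str]:
    """Return the raw first string argument of every call to func_name."""
    pattern = re.compile(
        r"\b" + re.escape(func_name) + r"\s*\(\s*(?:" + DOUBLE + "|" + SINGLE + ")",
        re.DOTALL,
    )
    # Exactly one of the two groups participates in each match.
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in pattern.finditer(source)]

def unescape_quotes(raw: str) -> str:
    # Simplification: only a backslash before a quote or another backslash is removed.
    return re.sub(r"""\\(["'\\])""", r"\1", raw)

# Hypothetical PHP input; "translate" is an assumed function name.
php_source = r'''
echo translate("Lorem ipsum dolor \"sit\" amet.");
echo translate( 'Lorem ipsum dolor "sit" amet. ', 'arg1', "arg2");
'''

results = first_string_args(php_source, "translate")
for raw in results:
    print(unescape_quotes(raw))
```

Only the first argument is captured here, so any later arguments are simply ignored instead of being validated as in the full expression.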


Make sure not to match a quote when it is escaped (has a backslash before it):

/\((["'])(.*?)(?<!\\)\1,?.*?\)/
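One way to express "a quote that has no backslash before it" is a negative lookbehind. The sketch below assumes Python's `re` module and a hypothetical helper `first_arg_lookbehind`; it is an illustrative variant of the idea, with shortened sample strings.

```python
import re

# Lazily consume text until the opening quote character reappears
# without a backslash directly in front of it.
LOOKBEHIND = re.compile(r"""\((["'])(.*?)(?<!\\)\1,?.*?\)""")

def first_arg_lookbehind(call: str):
    """Return the first quoted argument (escapes left in place), or None."""
    m = LOOKBEHIND.search(call)
    return m.group(2) if m else None

with_escapes = r'("Lorem \"sit\" amet.")'
other_quotes = '''("Lorem 'sit' amet.")'''
several_args = "('Lorem amet. ', 'arg1', \"arg2\")"
# Known limitation: a string that ends in an escaped backslash is misread,
# because the closing quote looks escaped to the lookbehind.
trailing_backslash = r'("a\\")'

for sample in (with_escapes, other_quotes, several_args, trailing_backslash):
    print(repr(first_arg_lookbehind(sample)))
```

The lookbehind only inspects a single preceding character, which is why the escaped-backslash case fails; an alternation that consumes backslash pairs avoids that at the cost of a longer expression.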