Regex that ignores double-quoted sections

I'm trying to write a regular expression (via AutoHotkey's RegExReplace function) that will enforce variable casing in exported VBA code, as a preprocessing step before version control. So if I want all case-insensitive occurrences of 'firstName' to be changed to match that exact casing, then the following line:

If FirstName = "" Then MsgBox "Please enter FirstName"

would be translated into:

If firstName = "" Then MsgBox "Please enter FirstName"


If your tool/editor supports lookaheads, you could try:

(?im)FirstName(?=([^"]*"[^"]*")*[^"]*$)

which means:

(?im)        # enable case-insensitive matching and multi-line mode ($ matches at every line end)
F            # match the character 'F' or 'f'
i            # match the character 'i' or 'I'
r            # match the character 'r' or 'R'
s            # match the character 's' or 'S'
t            # match the character 't' or 'T'
N            # match the character 'N' or 'n'
a            # match the character 'a' or 'A'
m            # match the character 'm' or 'M'
e            # match the character 'e' or 'E'
(?=          # start positive look ahead
  (          #   start capture group 1
    [^"]*    #     match any character except '"' and repeat it zero or more times
    "        #     match the character '"'
    [^"]*    #     match any character except '"' and repeat it zero or more times
    "        #     match the character '"'
  )*         #   end capture group 1 and repeat it zero or more times
  [^"]*      #   match any character except '"' and repeat it zero or more times
  $          #   match the end of a line
)            # end positive look ahead

In plain English: it matches the string 'FirstName' (case-insensitively) only if it is followed by zero or an even number of double quotes up to the end of the line.

Note that it will fail if your line ends with a comment that has a quote in it!
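To try the pattern outside AutoHotkey, here is a minimal sketch in Python, whose `re` module accepts the same lookahead syntax (AutoHotkey's RegExReplace is PCRE-based). The helper name `enforce_case` and the fixed target casing `firstName` are illustrative assumptions, not part of the answer. The sketch inherits the same limitation with quotes inside trailing comments.

```python
import re

# The lookahead pattern from this answer. re.IGNORECASE mirrors (?i) and
# re.MULTILINE mirrors (?m), so "$" can match at the end of every line.
QUOTE_AWARE = re.compile(
    r'FirstName(?=([^"]*"[^"]*")*[^"]*$)',
    re.IGNORECASE | re.MULTILINE,
)

def enforce_case(code: str, canonical: str = "firstName") -> str:
    """Rewrite occurrences followed by an even number of quotes (i.e. outside literals)."""
    return QUOTE_AWARE.sub(canonical, code)

line = 'If FirstName = "" Then MsgBox "Please enter FirstName"'
print(enforce_case(line))
# If firstName = "" Then MsgBox "Please enter FirstName"
```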


Regular expressions aren't context-sensitive, so this would be very hard to do with one.


If you always expect FirstName to appear at the end of the quoted text, with the closing " right after it, then you can use a negative lookahead to avoid matching such occurrences: FirstName(?!")

Otherwise, if you can't guarantee this placement of the closing quote, using a regex for this won't be ideal.

Alternatively, you could focus on the = sign and match only occurrences that come before it. In this case a positive lookahead comes in handy: FirstName(?=.*?=)
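The sketch below runs the two heuristics side by side in Python, with `re.IGNORECASE` standing in for case-insensitive matching. The second input line, `MsgBox "FirstName is required"`, is a made-up example. It shows what happens when the closing quote does not directly follow the name.

```python
import re

# Skip FirstName when the very next character is a closing quote.
NOT_BEFORE_QUOTE = re.compile(r'FirstName(?!")', re.IGNORECASE)
# Only match when an "=" follows later on the same line ("." excludes newlines).
BEFORE_EQUALS = re.compile(r'FirstName(?=.*?=)', re.IGNORECASE)

line = 'If FirstName = "" Then MsgBox "Please enter FirstName"'
print(NOT_BEFORE_QUOTE.sub("firstName", line))  # If firstName = "" Then MsgBox "Please enter FirstName"
print(BEFORE_EQUALS.sub("firstName", line))     # If firstName = "" Then MsgBox "Please enter FirstName"

# When the name is not the last word inside the quotes, (?!") no longer protects it.
other = 'MsgBox "FirstName is required"'
print(NOT_BEFORE_QUOTE.sub("firstName", other))  # MsgBox "firstName is required"
print(BEFORE_EQUALS.sub("firstName", other))     # MsgBox "FirstName is required"
```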


Regular expressions on their own don't do anything; they just accept (match) strings. A regex like

[fF][iI][rR][sS][tT][nN][aA][mM][Ee]

will accept the string 'firstname' in any mix of upper and lower case. You then write a replacement operation in your chosen language to replace the recognised string with 'firstName'. Your chosen implementation of regular expressions may also offer case-insensitive matching, which would simplify the regex.
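As a small illustration in Python, the character-class regex and a plain pattern compiled with `re.IGNORECASE` accept the same strings. A bare replacement with either one rewrites every occurrence on the line, including the one inside the string literal.

```python
import re

line = 'If FirstName = "" Then MsgBox "Please enter FirstName"'

# Spelling out both cases of every letter...
explicit = re.compile(r'[fF][iI][rR][sS][tT][nN][aA][mM][Ee]')
# ...accepts the same strings as a plain pattern with the case-insensitive flag.
flagged = re.compile(r'firstname', re.IGNORECASE)

print(explicit.sub('firstName', line))
print(flagged.sub('firstName', line))
# Both print: If firstName = "" Then MsgBox "Please enter firstName"
```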

The problem you have is NOT modifying the case of FirstName when it is in the wrong position in your expression, i.e. how do you change the first occurrence of FirstName in your example but not the second? In sed it's easy, because by default it replaces only the first match of a regex on each line. In VBA I haven't a clue.
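Python has no sed-style default here, but the `count` argument of `sub` gives the same first-match-only behaviour once the text is handled line by line. This is a minimal sketch. The helper name `first_match_per_line` is hypothetical, and the approach only works if the occurrence to fix is always the first one on its line.

```python
import re

NAME = re.compile(r'firstname', re.IGNORECASE)

def first_match_per_line(code: str, canonical: str = "firstName") -> str:
    """Mimic sed's default s/// behaviour: replace only the first match on each line."""
    return "\n".join(NAME.sub(canonical, ln, count=1) for ln in code.split("\n"))

line = 'If FirstName = "" Then MsgBox "Please enter FirstName"'
print(first_match_per_line(line))  # If firstName = "" Then MsgBox "Please enter FirstName"
```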

Is your rule:

  • transform case for only the first match;
  • transform case only to the left of the first = sign in a string;
  • transform case only when match is not inside "";

?
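The second rule needs no clever regex: split each line at its first = and fix only the left-hand part. Below is a Python sketch. `fix_left_of_equals` is a hypothetical name. Leaving a line unchanged when it has no = at all is an assumption, since the question does not say what should happen then.

```python
import re

NAME = re.compile(r'firstname', re.IGNORECASE)

def fix_left_of_equals(line: str, canonical: str = "firstName") -> str:
    """Apply the casing fix only to the text before the first '=' on the line."""
    left, sep, right = line.partition("=")
    if not sep:
        # Assumption: with no '=' there is nothing "to the left", so leave it alone.
        return line
    return NAME.sub(canonical, left) + sep + right

print(fix_left_of_equals('If FirstName = "" Then MsgBox "Please enter FirstName"'))
```

Note that this rule also leaves a name on the right-hand side untouched, e.g. `x = FirstName`.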

If it's the third, you may have problems if quoted strings can be nested. Regexes can't really cope with arbitrarily deep nesting of brackets (whatever character is used as the bracket), though some implementations have ways around this limitation. However, if you find yourself trying to write a regex that matches a string inside a particular number of matching brackets, you can be certain that you are using the wrong tool.

EDIT: for the third case, modify my regex to

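^([^"]*)[fF][iI][rR][sS][tT][nN][aA][mM][Ee]

which should match an occurrence of firstname that has no " anywhere before it on the line (applied one line at a time; group 1 captures the preceding text so the replacement can put it back).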
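A single ^-anchored match can rewrite at most one occurrence per line. This Python sketch therefore matches the quote-free start of each line (everything before the first double quote) and fixes every occurrence inside it with a callback. It is a variant of the "no quote before it" idea, not the answer's exact regex, and `fix_before_first_quote` is a hypothetical name.

```python
import re

# The quote-free start of each line: from the line start up to (not including)
# the first double quote. re.MULTILINE makes ^ match after every newline, and
# excluding \n keeps each match on a single line.
LEADING_UNQUOTED = re.compile(r'^[^"\n]*', re.MULTILINE)
NAME = re.compile(r'firstname', re.IGNORECASE)

def fix_before_first_quote(code: str, canonical: str = "firstName") -> str:
    """Fix every occurrence that has no double quote before it on its line."""
    return LEADING_UNQUOTED.sub(lambda m: NAME.sub(canonical, m.group(0)), code)

code = 'If FirstName = "" Then MsgBox "Please enter FirstName"\nFIRSTNAME = FirstName & "x"'
print(fix_before_first_quote(code))
```

Unlike the even-quote lookahead in the first answer, this leaves a name that comes after a closing quote unchanged, e.g. `MsgBox "Hi " & FirstName`.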
