Regexp match any uppercase characters except a particular string
I want to match all lines that contain any uppercase characters, while ignoring the string A_.
To complicate things further, I also want to ignore everything after a different string, e.g. an opening comment marker (/*).
Here are examples of what should and shouldn't match:
Matches:
- fooBar
- foo Bar foo
- A_fooBar
- fooBar /* Comment */
Non-matches (A_ should not trigger a match):
- A_foobar
- foo A_bar
- foobar
- foo bar foo bar
- foobar /* Comment */
thanks :)
This should (also?) do it:
(?!A_)[A-Z](?!((?!/\*).)*\*/)
A short explanation:
(?!A_)[A-Z] # if no 'A_' can be seen, match any uppercase letter
(?! # start negative look ahead
((?!/\*).) # if no '/*' can be seen, match any character (except line breaks)
* # match zero or more of the previous match
\*/ # match '*/'
) # end negative look ahead
So, in plain English:
Match any uppercase letter unless it is the 'A' of 'A_', and also reject it if a '*/' can be seen ahead without first encountering a '/*' (that is, when the letter sits inside a comment).
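To check that reading against the examples from the question, here is a minimal sketch in Python. The pattern is copied unchanged from the answer; the names PATTERN, SHOULD_MATCH, SHOULD_NOT_MATCH and line_matches are only for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(r"(?!A_)[A-Z](?!((?!/\*).)*\*/)")

# Example lines taken from the question.
SHOULD_MATCH = ["fooBar", "foo Bar foo", "A_fooBar", "fooBar /* Comment */"]
SHOULD_NOT_MATCH = [
    "A_foobar",
    "foo A_bar",
    "foobar",
    "foo bar foo bar",
    "foobar /* Comment */",
]


def line_matches(line: str) -> bool:
    """Return True if the line has an uppercase letter the pattern accepts."""
    return PATTERN.search(line) is not None


for line in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(f"{line!r}: {line_matches(line)}")
```

The first four lines print True and the remaining five print False.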
My answer:
/([B-Z]|A[^_]|A$)/
I would remove the comment at an earlier stage, if at all possible.
Test:
#!perl
use warnings;
use strict;

my @matches = (
    "fooBar",
    "foo Bar foo",
    "A_fooBar",
    "fooBar /* Comment */");
my @nomatches = (
    "A_foobar",
    "foo A_bar",
    "foobar",
    "foo bar foo bar",
    "foobar /* Comment */");

my $regex = qr/([B-Z]|A[^_]|A$)/;

for my $m (@matches) {
    $m =~ s:/\*.*$::;
    die "FAIL $m" unless $m =~ $regex;
}
for my $m (@nomatches) {
    $m =~ s:/\*.*$::;
    die "FAIL $m" unless $m !~ $regex;
}
Try it: http://codepad.org/EJhWtqkP
Try:
(?<!A_)[a-zA-Z]+
(?<!...)
is called a negative lookbehind.
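The two assertions are easy to mix up: (?<!X) inspects the text just before the current position (lookbehind), while (?!X) inspects the text starting at the current position (lookahead). A small Python sketch of the difference, using a made-up sample string:

```python
import re

sample = "A_Bar"  # hypothetical input, chosen only to show the difference

# Negative lookbehind: skip an uppercase letter only if "A_" ends right before it.
lookbehind = re.findall(r"(?<!A_)[A-Z]", sample)

# Negative lookahead: skip an uppercase letter only if "A_" starts at it.
lookahead = re.findall(r"(?!A_)[A-Z]", sample)

print(lookbehind)  # ['A']
print(lookahead)   # ['B']
```

With the lookbehind the leading A is still reported and the B after A_ is skipped; with the lookahead it is the other way around.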
As for your specific problem, it's kind of cheating but try:
^([#\.]|(?<!A_))[A-Za-z]{2,}
I get:
fooBar => fooBar
foo Bar foo => foo
A_fooBar (no match)
fooBar /* Comment */ => fooBar
A_foobar (no match)
foo A_bar => foo
foobar => foobar
foo bar foo bar => foo
foobar /* Comment */ => foobar
Does it have to be a single regex? In Perl, you could do something like:
if ($string =~ /[A-Z]/ && $string !~ /A_/)
It's not as cool as a single expression with lookbehind, but it's probably easier to read and maintain.
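A multi-step version of the same idea can be sketched in Python. This is a variation rather than a translation of the condition above: it removes every A_ and anything from /* onwards before testing for uppercase, so that a line like A_fooBar still counts as a match. The function name and its parameters are hypothetical, and it assumes a comment always runs to the end of the line.

```python
import re


def has_uppercase_outside_marker(
    line: str, marker: str = "A_", comment_open: str = "/*"
) -> bool:
    """Check for uppercase letters after dropping the comment and the marker."""
    # Step 1: keep only the text before the comment opener (if any).
    code = line.split(comment_open, 1)[0]
    # Step 2: drop every occurrence of the marker that should be ignored.
    code = code.replace(marker, "")
    # Step 3: a plain character class is enough for what is left.
    return re.search(r"[A-Z]", code) is not None


print(has_uppercase_outside_marker("A_fooBar"))              # True
print(has_uppercase_outside_marker("foo A_bar"))             # False
print(has_uppercase_outside_marker("foobar /* Comment */"))  # False
```

Each step is trivial on its own, which is the readability argument made above.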
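This one does it, although the comment handling isn't very robust. (It assumes that a comment is always at the end of the line, and the lookbehind contains .*, so it needs a flavor that allows variable-length lookbehind, such as .NET.)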
.*((A(?!_)|([B-Z]))(?<!/\*.*)).*\r\n
Try this:
^(?:[^A-Z/]|A_|/(?!\*))*+[A-Z]
This will work in any flavor that supports possessive quantifiers, e.g. PowerGrep, Java and PHP. The .NET flavor doesn't, but it does support atomic groups:
^(?>(?:[^A-Z/]|A_|/(?!\*))*)[A-Z]
If neither of those features is available, you can use another lookahead to prevent it from matching the A of A_ after backtracking:
^(?:[^A-Z/]|A_|/(?!\*))*(?!A_)[A-Z]
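The lookahead-only variant uses nothing beyond ordinary groups and lookaheads, so it can be tried in most flavors. A minimal Python sketch that runs it over the examples from the question (the names PORTABLE, examples and matched are illustrative; the possessive and atomic variants are not exercised here):

```python
import re

# Lookahead-only variant, copied verbatim from the answer above.
PORTABLE = re.compile(r"^(?:[^A-Z/]|A_|/(?!\*))*(?!A_)[A-Z]")

# Example lines from the question, mapped to the expected outcome.
examples = {
    "fooBar": True,
    "foo Bar foo": True,
    "A_fooBar": True,
    "fooBar /* Comment */": True,
    "A_foobar": False,
    "foo A_bar": False,
    "foobar": False,
    "foo bar foo bar": False,
    "foobar /* Comment */": False,
}

for line, expected in examples.items():
    matched = PORTABLE.search(line) is not None
    status = "ok" if matched == expected else "MISMATCH"
    print(f"{status}: {line!r} -> {matched}")
```

The prefix group consumes everything that is not an uppercase letter or a comment opener, treating A_ as a unit, so the final [A-Z] can only land on a letter outside A_ and before any /*.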