Regex question: Match sequence only n times on a random place
I have a regex question. Take for example:
- ...AAABZBZBCCCDDD...
- ...BZBZBDDDBZBZBCCC...
I am looking for a regular expression that matches lines in which BZBZB occurs exactly n times. So, if I wanted to match the sequence only once, I should get only the first line as output. The string occurs at random places in the text, and the regex should be compatible with grep or egrep. Thanks in advance.
grep '\(.*BZBZB\)\{5\}'
matches lines in which BZBZB appears 5 times, but it also matches lines in which it appears more than 5 times, because grep checks whether any substring of a line matches. Because grep's basic and extended regular expressions have no way to negate a string (only individual characters), this cannot be done with a single command unless, for example, you know that the characters used in the string to be matched are not used elsewhere in the line.
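For that special case a single expression can be enough. The following is only a sketch: it assumes that the characters B and Z never occur in a line outside of BZBZB itself, and the function name exactly_n_simple is made up for illustration.

```shell
# Hypothetical helper: print the lines of stdin that contain BZBZB exactly
# $1 times. ASSUMPTION: B and Z never occur in a line outside of BZBZB.
exactly_n_simple() {
    n=$1
    # Anchor both ends and allow only non-B, non-Z characters around matches.
    grep "^[^BZ]*\(BZBZB[^BZ]*\)\{$n\}\$"
}
```

For example, `exactly_n_simple 1 < temp.txt` would keep only the lines with a single BZBZB. If the assumption does not hold, lines with stray B or Z characters are silently dropped.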
However, you can do this in two grep commands:
cat temp.txt | grep '\(.*BZBZB\)\{5\}' | grep -v '\(.*BZBZB\)\{6\}'
will return lines in which BZBZB appears exactly 5 times. (Basically, it's doing a positive check for 5 or more times and then a negative check for six or more times.)
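The same two-step filter can be parameterised on n. This is a minimal sketch, with the sequence BZBZB hard-coded and the function name exactly_n chosen only for illustration:

```shell
# Hypothetical helper: print the lines of stdin in which BZBZB occurs
# exactly $1 times (non-overlapping occurrences).
exactly_n() {
    n=$1
    # Keep lines with at least n occurrences, then drop those with n+1 or more.
    grep "\(.*BZBZB\)\{$n\}" | grep -v "\(.*BZBZB\)\{$((n + 1))\}"
}
```

With the two lines from the question as input, `exactly_n 1` should print only the first line and `exactly_n 2` only the second.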
From the grep man page:
-m NUM, --max-count=NUM Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned to just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.
So we need two grep expressions:
grep -e "BZ" -o
grep -e "BZ" -m n
The first one prints every occurrence of "BZ" in the input, each on its own line and without the surrounding text. The second one reads those lines and stops after n of them have matched, so the pipeline prints at most n occurrences.
$ echo "ABZABZABX" | grep -e "BZ" -o | grep -e "BZ" -m 1
BZ
Hopefully that is what you needed.
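The pipeline above caps the number of matches printed. To select whole lines by their match count instead, one option is to count the -o output for each line separately. This is a rough sketch, not a tuned solution: the function name lines_with_n_matches is made up, -o is a GNU/BSD grep extension rather than POSIX, and running two greps per line will be slow on large inputs.

```shell
# Hypothetical helper: print the lines of stdin in which BZBZB occurs
# exactly $1 times, counting non-overlapping matches as grep -o reports them.
lines_with_n_matches() {
    n=$1
    while IFS= read -r line; do
        # grep -o prints each match on its own line; grep -c counts those lines.
        # "|| true" hides grep's non-zero exit status when there are no matches.
        count=$(printf '%s\n' "$line" | grep -o 'BZBZB' | grep -c 'BZBZB' || true)
        if [ "$count" -eq "$n" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

It could be used as `lines_with_n_matches 1 < temp.txt`.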
It's ugly, but if your grep supports lookahead assertions, this should work:
/^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/
Edit: The {5} above is the n from the original question. GNU grep appears to support Perl-style assertions through its -P option.
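As a sketch of how that could look on the command line, assuming a GNU grep built with PCRE support so that -P is available (the function name exactly_n_pcre is made up for illustration):

```shell
# Hypothetical helper: print the lines of stdin in which BZBZB occurs
# exactly $1 times. ASSUMPTION: grep supports -P (GNU grep with PCRE).
exactly_n_pcre() {
    n=$1
    # Single quotes keep the shell from touching the regex; only $n is spliced in.
    grep -P '^(((?!BZBZB).)*BZBZB){'"$n"'}((?!BZBZB).)*$'
}
```

For example, `exactly_n_pcre 5 < temp.txt` would apply the expression above with n = 5.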
Perl sample
use strict;
use warnings;

# Test lines with varying numbers of BZBZB occurrences.
my @strary = (
    'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done',
    'BZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB BZBZB BZBZB',
    'BZBZBBZBZBBZBZBBZBZBBZBZB 1',
    'BZBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB 2',
);

# Keep only the lines in which BZBZB occurs exactly 5 times.
my @result = grep /^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/, @strary;

for (@result) {
    print "Found: '$_'\n";
}
Output
Found: 'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done'
Found: 'BZBZBBZBZBBZBZBBZBZBBZBZB 1'