Regex question: Match a sequence exactly n times at random places

2023-02-03 16:25

I have a regex question. Take, for example:

...AAABZBZBCCCDDD...
...BZBZBDDDBZBZBCCC...

I am looking for a regular expression that matches BZBZB exactly n times in a line. So, if I wanted to match the sequence only once, I should get only the first line as output.

The string occurs at random places in the text, and the regex should be compatible with grep or egrep.

Thanks in advance.

grep '\(.*BZBZB\)\{5\}' selects lines containing BZBZB 5 times, but it also selects lines where it appears more than 5 times, because grep prints a line if any substring of it matches. Since grep's regular expressions have no way to negate a string (only single characters, with bracket expressions such as [^B]), this cannot be done with a single command unless, for example, you know that the characters used in the string to be matched are not used elsewhere.
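As a sketch of the exception mentioned above, assume (hypothetically) that the letter B never occurs in a line except as part of BZBZB. A negated bracket expression can then stand in for the missing string negation, and a single grep suffices. The function name `exactly_n_single` is made up for illustration, and the regex is only correct under that assumption.

```shell
# Hypothetical helper: print lines containing BZBZB exactly N times.
# ASSUMPTION: the letter B appears nowhere except inside BZBZB, so [^B]*
# can safely skip the text between occurrences.
# Usage: exactly_n_single N [FILE...]   (reads stdin if no FILE is given)
exactly_n_single() {
  n=$1
  shift
  # ^ non-B text, then N x (BZBZB followed by non-B text), then end of line
  grep -e '^[^B]*\(BZBZB[^B]*\)\{'"$n"'\}$' "$@"
}
```

For the question's two sample lines, `exactly_n_single 1` keeps only the first line. If B can appear elsewhere in the text, this shortcut breaks and the two-command approach is needed.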

However, you can do it with two grep commands:

grep '\(.*BZBZB\)\{5\}' temp.txt | grep -v '\(.*BZBZB\)\{6\}'

returns the lines in which BZBZB appears exactly 5 times: the first grep is a positive check for 5 or more occurrences, and the second (with -v) is a negative check that discards lines with 6 or more.
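The same two-pass idea can be wrapped in a small shell function that takes n as a parameter and builds the "n or more" and "n+1 or more" patterns from it. This is a sketch: the name `exactly_n` is hypothetical, and it assumes PATTERN contains no BRE metacharacters (true for BZBZB), because it is pasted into the regex unescaped. Like the commands above, it counts non-overlapping occurrences.

```shell
# Hypothetical helper: print lines containing PATTERN exactly N times.
# Usage: exactly_n N PATTERN [FILE...]   (reads stdin if no FILE is given)
# ASSUMPTION: PATTERN has no BRE metacharacters, since it is inserted as-is.
exactly_n() {
  n=$1
  pat=$2
  shift 2
  # Inside double quotes "\\" yields one backslash, so these become
  # \(.*PATTERN\)\{N\} and \(.*PATTERN\)\{N+1\} as basic regular expressions.
  at_least_n="\\(.*${pat}\\)\\{${n}\\}"
  at_least_n_plus_1="\\(.*${pat}\\)\\{$((n + 1))\\}"
  # Keep lines with >= N occurrences, then drop those with >= N+1.
  grep -e "$at_least_n" "$@" | grep -v -e "$at_least_n_plus_1"
}
```

For the question's two sample lines, `exactly_n 1 BZBZB` keeps only the first line and `exactly_n 2 BZBZB` keeps only the second.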

From the grep man page:

   -m NUM, --max-count=NUM
    Stop  reading  a file after NUM matching lines.  If the input is
    standard input from a regular file, and NUM matching  lines  are
    output,  grep  ensures  that the standard input is positioned to
    just after the last matching line before exiting, regardless  of
    the  presence of trailing context lines.  This enables a calling
    process to resume a search.  When grep stops after NUM  matching
    lines,  it  outputs  any trailing context lines.  When the -c or
    --count option is also  used,  grep  does  not  output  a  count
    greater  than NUM.  When the -v or --invert-match option is also
    used, grep stops after outputting NUM non-matching lines.

So two grep commands can be chained:

grep -e "BZ" -o
grep -e "BZ" -m n

The first one prints every instance of "BZ" in the input string on its own line, without the surrounding text. The second one reads those lines and stops after n matching lines have been found.

$ echo "ABZABZABX" | grep -e "BZ" -o | grep -e "BZ" -m 1
BZ

Hopefully that is what you needed.
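The -o behaviour described in this answer (one match per output line) can also be used to count occurrences line by line, which selects lines with exactly n matches instead of stopping after the first n. The sketch below swaps -m for a count with wc -l; the name `count_filter` is hypothetical. It runs one grep per input line, so it will be slow on large files.

```shell
# Hypothetical helper: print input lines in which `grep -o` finds PATTERN exactly N times.
# Usage: count_filter N PATTERN < FILE
count_filter() {
  n=$1
  pat=$2
  while IFS= read -r line; do
    # grep -o prints each non-overlapping match on its own line;
    # wc -l counts them (tr strips the padding some wc implementations add).
    c=$(printf '%s\n' "$line" | grep -o -e "$pat" | wc -l | tr -d ' ')
    if [ "$c" -eq "$n" ]; then
      printf '%s\n' "$line"
    fi
  done
}
```

For example, `printf '%s\n' ABZABZABX ABX XBZX | count_filter 1 BZ` should print only `XBZX`, since the other two lines contain BZ twice and zero times.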

It's ugly, but if your grep supports lookahead assertions, this should work:

/^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/

Edit: The {5} above is the n from the question. It looks like GNU grep supports Perl-style assertions via its -P option.

Perl sample

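use strict;
use warnings;

# Test strings: BZBZB occurs 5, 10, 5 and 6 times (non-overlapping), respectively.
my @strary = (
  'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done',
  'BZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB BZBZB  BZBZB',
  'BZBZBBZBZBBZBZBBZBZBBZBZB 1',
  'BZBZBZBBZBZBBZBZBBZBZBBZBZBBZBZB 2',
);

# Each of the 5 groups skips characters that do not start a BZBZB, then
# consumes one BZBZB; the tail may not contain another BZBZB.
my @result = grep { /^(((?!BZBZB).)*BZBZB){5}((?!BZBZB).)*$/ } @strary;

for (@result) {
   print "Found: '$_'\n";
}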

Output

Found: 'this is BZBZB BZBZB BZBZB and 4 BZBZB then 5 BZBZB and done'
Found: 'BZBZBBZBZBBZBZBBZBZBBZBZB 1'
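If your grep is GNU grep built with PCRE support (an assumption; -P is not available in every grep), the lookahead expression above can be used directly from the shell, with n as a parameter. The function name `exactly_n_pcre` is hypothetical, and BZBZB is hard-coded as in the answer.

```shell
# Hypothetical helper: print lines containing BZBZB exactly N times, using the
# lookahead regex from the answer. Requires a grep that supports -P (PCRE).
# Usage: exactly_n_pcre N [FILE...]   (reads stdin if no FILE is given)
exactly_n_pcre() {
  n=$1
  shift
  # Single quotes keep (?!...) and $ literal; only $n is spliced in.
  grep -P '^(((?!BZBZB).)*BZBZB){'"$n"'}((?!BZBZB).)*$' "$@"
}
```

Fed the four strings from the Perl sample with n = 5, this should print the same two lines the Perl script reports.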
