
RegEx: Count chars

I'm writing a PHP script that searches for particular headlines inside a DokuWiki document.

My current pattern looks like this:

$pattern = "/.*=+ ". $header ." =+([^=]+)/m";
preg_match($pattern, $art->text, $m);
if (!empty($m[1])) {
   $art->text = $m[1];
} else {
   $art->text = "";
}

A sample document:

====== TestHeader ======
Testtext

===== Header2 =====
Testtext2

==== Header3 ====
Testtext3

====== Header4 ======
Testtext4

When searching for TestHeader, my current result is:

====== TestHeader ======
Testtext

I would like the pattern to return:

====== TestHeader ======
Testtext

===== Header2 =====
Testtext2

==== Header3 ====
Testtext3

Or in other words: I would like to match all headers which are surrounded by fewer = than the header I was searching for.

Is something like this possible with regular expressions?

Thanks in advance!


As I'm not a great PHP coder, I don't know whether PHP offers any special extensions to "normal" regexps that would allow what you want. Without such extensions, regexps can't solve your problem.

There is some formal language theory behind that, in case you are interested: regexps can only recognise so-called "regular languages" (see the corresponding Wikipedia article). Without diving into the theory too much, the intuition is that regular expressions can't "count" things (at least not in the sense of comparing two counts within the match). To restate the Wikipedia example: no regular expression matches exactly the strings that consist of N a's followed by N b's, for arbitrary N.

Of course this is no mathematical proof that what you are looking for is impossible, but it should give you a feeling for what regular expressions can and can't do. HTH
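For what it's worth, here is a small sketch of the "N a's followed by N b's" example. It assumes PHP's preg_* functions are backed by PCRE, whose recursive subpattern call (?1) is one of those extensions beyond classic regular expressions. The function name isBalancedAb is made up for illustration.

```php
<?php
// Assumption: preg_match() uses PCRE, which supports the recursive
// subpattern call (?1). Classic regular expressions have no such feature.
function isBalancedAb(string $s): bool // hypothetical helper name
{
    // Group 1: one "a", optionally group 1 again (recursion), then one "b".
    return preg_match('/^(a(?1)?b)$/', $s) === 1;
}

var_dump(isBalancedAb('aaabbb')); // bool(true):  3 a's, 3 b's
var_dump(isBalancedAb('aabbb'));  // bool(false): 2 a's, 3 b's
```

So the theory holds for regular expressions in the strict sense, while the PCRE dialect can compare counts in simple cases like this one.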


You can do it in a couple steps:

  • Use the code you've got to find the header you're looking for.
  • Count the ='s in that header.
  • Search for all headers with that many or fewer ='s

Suppose you knew you were looking for $n or fewer = characters in the header:

$pattern = "/.*={1,$n} ". $header ." ={1,$n}([^=]+)/m";

Although you'd have to use two regular expressions and do a little processing in between, it should be pretty quick, and the second regular expression would be limited to headers with at most $n = characters, which is what you're asking for.
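The steps above could be sketched roughly as follows. This is only one possible reading of the approach, not a tested DokuWiki parser: instead of matching each lower-level header separately, it looks for the next headline with at least as many = characters and cuts the section there. The function name extractSection and the assumption that every headline sits on its own line in the form "=== Title ===" are mine.

```php
<?php
// Hypothetical helper: returns the section that starts at $header, including
// any following sub-sections whose headlines use fewer '=' characters.
function extractSection(string $text, string $header): string
{
    // Step 1: find the headline and capture its leading run of '='.
    $find = '/^(=+) ' . preg_quote($header, '/') . ' =+[ \t\r]*$/m';
    if (!preg_match($find, $text, $m, PREG_OFFSET_CAPTURE)) {
        return '';
    }
    $start = $m[0][1];

    // Step 2: count the '=' characters of that headline.
    $n = strlen($m[1][0]);

    // Step 3: the section ends at the next headline with $n or more '='.
    $next = '/^={' . $n . ',} .+ =+[ \t\r]*$/m';
    $searchFrom = $start + strlen($m[0][0]);
    if (preg_match($next, $text, $e, PREG_OFFSET_CAPTURE, $searchFrom)) {
        return rtrim(substr($text, $start, $e[0][1] - $start));
    }

    // No such headline follows: the section runs to the end of the text.
    return rtrim(substr($text, $start));
}

$doc = "====== TestHeader ======\nTesttext\n\n===== Header2 =====\nTesttext2\n\n"
     . "==== Header3 ====\nTesttext3\n\n====== Header4 ======\nTesttext4\n";

// Prints the TestHeader section including Header2 and Header3, but not Header4.
echo extractSection($doc, 'TestHeader'), "\n";
```

In the original code this would replace the preg_match block, e.g. $art->text = extractSection($art->text, $header);. Note that preg_quote keeps special characters in the header text from being interpreted as regex syntax.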
