Parse for square brackets with regular expressions

I've always had a difficult time with regular expressions. I've searched for help with this, but I can't quite find what I'm looking for.

I have blocks of text that follow this pattern:

[php] ... any type of code sample here [/php]

I need to:

  • check for the square brackets, which can contain any one of 20-30 programming language names (php, ruby, etc.).
  • grab all the code between the opening and closing tags.

I have worked out the following regular expression:

#\[([a-z]+)\]([^\[/]*)\[/([a-z]+)\]#i

This matches most cases pretty well. However, it breaks when the code sample contains square brackets. How do I modify it so that any character between the opening and closing tags is matched for later use?


This is the regex you want. It also requires the opening and closing tags to agree, so a php block can only be closed by a [/php] tag.

/\[(\w+)\](.*?)\[\/\1\]/s

Or, if you want to match only specific tags, you could use:

$langs = array('php', 'python', ...); 

$langs = implode('|', array_map('preg_quote', $langs));

preg_match_all('/\[(' . $langs . ')\](.*?)\[\/\1\]/s', $str, $matches);
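As a rough, runnable sketch of the whitelist approach above: the language list, the sample text, and the $blocks variable are made up for illustration (the real list is elided in the snippet above). preg_quote is given the '/' delimiter explicitly so that a name containing a slash could not break the pattern.

```php
<?php
// Hypothetical whitelist; substitute the real 20-30 language names.
$langs = array('php', 'ruby', 'js');

// Escape each name for use inside a '/'-delimited pattern, then join with '|'.
$alternation = implode('|', array_map(function ($lang) {
    return preg_quote($lang, '/');
}, $langs));

// Group 1 = language, group 2 = code. The 's' flag lets '.' match newlines.
$pattern = '/\[(' . $alternation . ')\](.*?)\[\/\1\]/s';

// Sample input (an assumption): the code contains square brackets,
// and [foo] is not in the whitelist.
$str = '[php]$a = $b[0];[/php] text [ruby]x = arr[1][/ruby] [foo]ignored[/foo]';

preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

// Collect language/code pairs for later use.
$blocks = array();
foreach ($matches as $m) {
    $blocks[] = array('lang' => $m[1], 'code' => $m[2]);
}

print_r($blocks);
```

With PREG_SET_ORDER each element of $matches holds one complete match, which tends to be easier to loop over than the default grouping by capture group.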


The following will work:

\[([a-z]+)\].*\[/\1\]

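If you want to remove the greediness, so that each block is matched separately instead of everything from the first opening tag to the last closing tag, you can do: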

\[([a-z]+)\].*?\[/\1\]

All you have to do is check that the opening and closing tags contain the same text (in this case, the same programming language). You do that with \1, which matches whatever capture group 1, ([a-z]+), matched earlier.
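The difference between the greedy and non-greedy forms can be seen in a small sketch. The sample string and variable names are assumptions for illustration; the patterns are the two above, wrapped in # delimiters with the s flag so that . also matches newlines.

```php
<?php
// Two separate php blocks, each containing square brackets (made-up sample).
$str = '[php]echo $a[0];[/php] and [php]echo $b[1];[/php]';

// Greedy: .* runs to the LAST [/php], so both blocks merge into one match.
preg_match_all('#\[([a-z]+)\].*\[/\1\]#s', $str, $greedy);

// Non-greedy: .*? stops at the FIRST [/php] after each opening tag.
preg_match_all('#\[([a-z]+)\].*?\[/\1\]#s', $str, $lazy);

$greedyCount = count($greedy[0]);
$lazyCount = count($lazy[0]);

echo "greedy: $greedyCount, non-greedy: $lazyCount\n";
```

For text with several code blocks, the non-greedy form is usually the one you want.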


Why don't you use something like below:

\[php\].*?\[/php\]

I don't understand why you want to use [a-z]+ for the tags; there should only be php or a limited number of other tags. Just keep it simple.

Actually you can use:

\[(php)\].*?\[/(\1)\]

so that the closing tag has to match the opening tag. Otherwise you will pair up unrelated opening and closing tags. Add other languages with alternation, e.g. php|js.


Use a backreference to refer to a match already made in the regular expression:

\[(\w+)\].*?\[/\1\]
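Note that this pattern does not capture the code itself. A minimal sketch that adds a second group around .*? and returns the pairs might look like the following; the function name extractCodeBlocks and the sample input are hypothetical.

```php
<?php
/**
 * Return an array of array('lang' => ..., 'code' => ...) entries for every
 * [tag]...[/tag] block whose closing tag matches its opening tag.
 * (Hypothetical helper, for illustration only.)
 */
function extractCodeBlocks($text)
{
    // Group 1 = tag name, group 2 = everything up to the matching closing tag.
    $pattern = '#\[(\w+)\](.*?)\[/\1\]#s';

    $blocks = array();
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $blocks[] = array('lang' => $m[1], 'code' => $m[2]);
        }
    }
    return $blocks;
}

// The second block has mismatched tags, so the backreference rejects it.
$result = extractCodeBlocks('[js]var x = a[0];[/js] [php]oops[/ruby]');

print_r($result);
```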