PHP regex optimize

I've got a regular expression that matches everything between the angle brackets of <anything>, and I'm using this:

'@<([\w]+)>@'

today, but I believe there might be a better way to do it?

/ Tobias


By the way, \w doesn't match everything as you said; it only matches [a-zA-Z0-9_]. Assuming you meant "everything" loosely and \w is what you want, you don't need the square brackets around the \w. Otherwise it's fine.
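
To illustrate the point about the brackets, here is a small sketch comparing the two patterns. The sample string and variable names are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample input, not from the original question.
$subject = 'abcd<xyz>ab<c_1>d<not a word>';

// Pattern from the question, with a character class around \w.
preg_match_all('@<([\w]+)>@', $subject, $with_brackets);

// Same pattern without the redundant square brackets.
preg_match_all('@<(\w+)>@', $subject, $without_brackets);

// Both capture the same names. "<not a word>" is skipped because
// \w does not match spaces.
var_dump($with_brackets[1] === $without_brackets[1]); // bool(true)
print_r($without_brackets[1]);                        // xyz, c_1
```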


If "anything" is "anything except a > char", then you can use:

@<([^>]+)>@

Testing will show if this performs better or worse.

Also, are you sure that you need to optimize? Does your original regex do what it should?
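
A rough sketch of how one might check both things, assuming a made-up sample string: first what each pattern actually captures, then a crude timing loop. The helper name time_pattern and the run count are hypothetical, and the timings will vary with the input and the PHP version.

```php
<?php
// Hypothetical sample input, chosen to show where the two patterns differ.
$subject = 'abcd<xyz>ab<two words>d';

preg_match_all('@<(\w+)>@', $subject, $word_only);
preg_match_all('@<([^>]+)>@', $subject, $not_gt);

print_r($word_only[1]); // xyz
print_r($not_gt[1]);    // xyz, two words

// Crude timing helper (hypothetical name): runs the pattern repeatedly
// and returns the elapsed wall-clock time in seconds.
function time_pattern(string $pattern, string $subject, int $runs = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        preg_match_all($pattern, $subject, $unused);
    }
    return microtime(true) - $start;
}

printf("\\w+   : %.4f s\n", time_pattern('@<(\w+)>@', $subject));
printf("[^>]+ : %.4f s\n", time_pattern('@<([^>]+)>@', $subject));
```

Note that the two patterns are not interchangeable: [^>]+ also accepts spaces and punctuation inside the brackets, so pick the one that matches the intended input before comparing speed.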


You'd be better off using PHP string functions for this task. It should be faster and is not too complex.

For example:

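$string = "abcd<xyz>ab<c>d";

$curr_offset = 0;
$matches = array();

// Find the first '<'.
$opening_tag_pos = strpos($string, '<', $curr_offset);

while ($opening_tag_pos !== false)
{
    $curr_offset = $opening_tag_pos;
    $closing_tag_pos = strpos($string, '>', $curr_offset);

    // Stop on an unclosed '<'; otherwise the loop would never terminate.
    if ($closing_tag_pos === false)
    {
        break;
    }

    // Take the text between '<' and '>'.
    $matches[] = substr($string, $opening_tag_pos + 1, $closing_tag_pos - $opening_tag_pos - 1);

    // Continue searching from the '>' onwards.
    $curr_offset = $closing_tag_pos;
    $opening_tag_pos = strpos($string, '<', $curr_offset);
}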

/*
     $matches = Array ( [0] => xyz [1] => c ) 
*/

Of course, if you are trying to parse HTML or XML, use an XHTML parser instead.
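
As a minimal sketch of the parser route, assuming PHP's bundled DOM extension is available, something like the following lists the element names in a fragment. The sample markup and variable names are made up for illustration.

```php
<?php
// Hypothetical HTML fragment for illustration.
$html = '<p>Hello <b>world</b></p>';

$doc = new DOMDocument();
// The flags ask libxml not to add implied <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Collect the name of every element in the document.
$names = array();
foreach ($doc->getElementsByTagName('*') as $element) {
    $names[] = $element->nodeName;
}

print_r($names);
```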


That looks alright. What's not optimal about it?

You may also want to consider something other than regex if you're trying to parse HTML; see "RegEx match open tags except XHTML self-contained tags".
