
PHP regex that finds PHP errors

I want a PHP regex that can find PHP errors on a page, so that when I crawl a site I can list the errors each page displays.

Currently I have the following code:

preg_match('/<b>.+<\/b>:.+ in <b>\/.+<\/b> on line <b>[0-9]+<\/b><br( \/)?>/msi',$html,$errors);

It can show if errors occurred, but it will not list them! I get the full html page in the array ($errors[0])

Could anybody help?

EDIT: So I have a page with for example the following HTML-source, from which I want to extract the PHP errors:

<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: The session id contains invalid characters, valid characters are only a-z, A-Z and 0-9 in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />
<br />
<b>Warning</b>:  session_start() [<a href='function.session-start'>function.session-start</a>]: Cannot send session cache limiter - headers already sent (output started at /home/.../public_html/articlescript/init.php:127) in <b>/home/.../public_html/articlescript/init.php</b> on line <b>127</b><br />
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <title>...


Since – well, you know – you shouldn’t use regular expressions to parse HTML, try this using PHP’s DOM library:

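// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($str);
$messages = array();
foreach ($doc->getElementsByTagName('b') as $elem) {
    // Only B elements whose text is an error level start a message.
    if (in_array($elem->textContent, array('Error', 'Warning', 'Notice'))) {
        $buffer = $elem->textContent;
        // Append following siblings until the next BR element.
        // nodeName is used because localName is null on text nodes.
        while ($elem->nextSibling !== null && strtolower($elem->nextSibling->nodeName) !== 'br') {
            $elem = $elem->nextSibling;
            $buffer .= $elem->textContent;
        }
        $messages[] = $buffer;
    }
}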

This searches for B elements whose content is one of “Error”, “Warning”, or “Notice” and collects the text from there up to the next BR element. The initial call to libxml_use_internal_errors keeps libxml's own parsing errors from being reported.


Forgive my language but it's quite foolish to attempt to parse HTML with regular expressions, especially potentially-malformed HTML. Use an HTML parsing library instead.

For HTML parsing and validation in HTML, I would refer to this answer; also check out the tidy extension.


Remember to escape your \ in strings.

preg_match_all('#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is',$string,$errors);

This code on ideone
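
In case the link goes stale, here is a minimal sketch of how that pattern could be applied. The sample input is a shortened version of the markup in the question (the elided /home/.../ path is kept as-is), and the $errors array layout with its 'level', 'message', 'file' and 'line' keys is just an assumption for illustration:

```php
<?php
// Shortened sample adapted from the question; the path stays elided as in the original.
$html = <<<'HTML'
<b>Warning</b>:  session_start(): The session id contains invalid characters in <b>/home/.../init.php</b> on line <b>127</b><br />
<br />
<b>Warning</b>:  session_start(): Cannot send session cache limiter in <b>/home/.../init.php</b> on line <b>127</b><br />
<!DOCTYPE html>
<html><head><title>...</title></head></html>
HTML;

$pattern = '#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is';

// PREG_SET_ORDER groups the captures per match rather than per group.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$errors = array();
foreach ($matches as $m) {
    $errors[] = array(
        'level'   => $m[1],
        'message' => trim(strip_tags($m[2])), // drop any inline <a> tags
        'file'    => $m[3],
        'line'    => (int) $m[4],
    );
}

foreach ($errors as $e) {
    printf("%s: %s (%s:%d)\n", $e['level'], $e['message'], $e['file'], $e['line']);
}
```

With PREG_SET_ORDER each entry of $matches is one complete error, which tends to be easier to loop over than the default per-group layout.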


Put parentheses () around the parts of the regex that you want stored in $errors.
You'll also want to use preg_match_all() rather than preg_match().
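
One likely reason the original call returned one huge match is that .+ is greedy, and with the s modifier it also runs across newlines. A small sketch comparing greedy and lazy quantifiers, using two made-up Notice lines (the file names /a.php and /b.php are hypothetical):

```php
<?php
// Two made-up error lines followed by ordinary page markup.
$html = "<b>Notice</b>: first in <b>/a.php</b> on line <b>1</b><br />\n"
      . "<b>Notice</b>: second in <b>/b.php</b> on line <b>2</b><br />\n"
      . "<p>page body</p>";

// Same pattern twice: once with greedy .+ and once with lazy .+?
$greedy = '#<b>(.+)</b>:(.+) in <b>(.+)</b> on line <b>([0-9]+)</b><br(?: /)?>#is';
$lazy   = '#<b>(.+?)</b>:(.+?) in <b>(.+?)</b> on line <b>([0-9]+)</b><br(?: /)?>#is';

$greedyCount = preg_match_all($greedy, $html, $g, PREG_SET_ORDER);
$lazyCount   = preg_match_all($lazy, $html, $l, PREG_SET_ORDER);

echo "greedy matches: $greedyCount\n";
echo "lazy matches: $lazyCount\n";
```

The greedy version swallows both error lines into a single match, while the lazy version stops at the first possible closing tag and reports each error separately.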


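If this is your own website, you can either set the error-logging level and parse your log files (the easier option), or check your scripts from the command line with php -l. Note that php -l only checks syntax, so it reports parse errors but not runtime warnings or notices.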
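
For the log-file route, a rough sketch of what the parsing could look like. It assumes errors are written to a file (the log_errors and error_log ini settings) in the usual "[timestamp] PHP Level:  message in file on line N" layout; the sample lines, paths and array keys below are made up for illustration, and your log format may differ:

```php
<?php
// Hypothetical log excerpt; in practice the lines would come from something like file($logPath).
$lines = array(
    '[12-Mar-2024 10:15:30 UTC] PHP Warning:  session_start(): Cannot send session cache limiter in /var/www/init.php on line 127',
    '[12-Mar-2024 10:15:31 UTC] PHP Notice:  Undefined variable: foo in /var/www/index.php on line 12',
    'some unrelated line',
);

// Timestamp, level, message, file and line number; assumes paths without spaces.
$pattern = '/^\[(.+?)\] PHP ([A-Za-z ]+):\s+(.+) in (\S+) on line (\d+)$/';

$entries = array();
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $entries[] = array(
            'time'    => $m[1],
            'level'   => $m[2],
            'message' => $m[3],
            'file'    => $m[4],
            'line'    => (int) $m[5],
        );
    }
}

foreach ($entries as $e) {
    printf("%s in %s:%d\n", $e['level'], $e['file'], $e['line']);
}
```

Because a log line is plain text rather than HTML, a regex is a reasonable fit here, unlike scraping the rendered page.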
