开发者

RegEx: Match Non-Escaped Double Quoted Strings

I'm writing a Ruby code box with syntax highlighting (for 开发者_开发百科Ruby) written in php for my website, I can get it to color instance variables, comments, symbols and global variables so far but I have encountered a problem when using the following regex to match double quoted strings, here is my code:

<?php
    function codebox($code, $name="", $highlighted_line = -1)
    {   

        echo '<table class="code_table">';
        echo '<tr>';
        echo '<td class="code_table_header"></td>';
        echo '<td class="code_table_name">$name</td>';
        echo '<td class="code_table_header"><a href="" class="copy_to_clipboard_link">copy to clipboard</a></td>';
    echo '</tr>';

        $oddity = 'even';
        $line_number = 1;
        foreach(preg_split('/(\r?\n)/', $code) as $line)
        {
            echo '<tr>';
            if($line_number % 10 == 0)
            {
                echo '<td class="line_number" style="font-weight:bold;">' . $line_number . '</td>';
            } else {
                echo '<td class="line_number">' . $line_number . '</td>';
            }
            if($line_number == $highlighted_line)
            {
                echo '<td class="selected_code_cell" colspan="2">' . syntax_highlight($line) . '</td>';
            } else {
                echo '<td class="' . $oddity . '_code_cell" colspan="2">' . syntax_highlight($line) . '</td>';
            }
            echo '</tr>';
            $line_number += 1;
            if($oddity == 'even')
            {
                $oddity = 'odd';
            } else {
                $oddity = 'even';
            };
        };
    };
    function syntax_highlight($code)
    {
        // Make it so html doesn't bodge up
        $code = htmlentities($code);

        // Replace tabs with 4 none blocking spaces
        $code = str_replace('   ', '&nbsp;&nbsp;&nbsp;&nbsp;', $code);

        //instance variables
        $code = preg_replace('/\B(\@\w*\S)/', '<span style="color:lime;">$1</span>', $code);

        //global variables
        $code = preg_replace('/\B(\$\w*\S)/', '<span style="font-weight:bolder;color:#00b0f0;">$1</span>', $code);

        //symbols
        $code = preg_replace('/\B(\:\w*\S)/', '<span style="color:yellow;">$1</span>', $code);

        //strings (double quote)
        $code = preg_replace('/"(?:\.|(\\\")|[^\""\n])*"/', '<span style="font-style:italic;color:#FF5A00;">$1</span>', $code);

        //strings (single quote)
        //$code = preg_replace('/\'(?:\.|(\\\')|[^\'\'\n])*\'/', '<span style="font-style:italic;color:#FF5A00;">$1</span>', $code);

        return $code;
    };
?>

For some reason, the double quoted string breaks the other ones and no syntax highlighting is performed, does anyone know why? Thanks in advance, ell.


Don’t try to parse an irregular language like Ruby with regular expressions. Try to find a proper parser for Ruby instead that returns an array of the used language tokens.


What Gumbo said. You cannot make this work correctly with regular expressions alone. But you might try this:

 preg_match("/'([^'\n\\]|\\'|\\[^'])+'/", ...

Or maybe you have better luck with an assertion (?<![\\]) right before the quote.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜