
Match literal string

I have this web page where users can add smilies to their comments. And I want to limit the number of smilies per comment. The "system" works but I have some problems with the regex part. I have my smilies defined in a config file like so:

$config['Smilies'] = Array (
    // irrelevant stuff
    'smilies' => Array (
        ':)' => 'smile.gif',
        ':(' => 'sad.gif',
        // some more smilies
        's:10' => 'worship.gif',
        's:11' => 'zip.gif',
        's:12' => 'heart.gif',
        // some more smilies
        's:1' => 'dry.gif',
        's:2' => 'lol.gif',
        's:3' => 'lollol.gif',
        // some more smilies
    )
);

And then, when I validate the comment (to see how many smilies it contains), I loop through this array and match each smilie against the content of the comment. The regex is used like this:

foreach ( $this->config['smilies'] as $smilie => $smilieImage )
{
    $matches = Array ();
    Preg_Match_All ( '/' . Preg_Quote ( $smilie ) . '/i', $Content, $matches );

    $numOfFoundSmilies += Count ( $matches[0] );
}

The problem is that if I enter "s:10" into the comment, the above code finds two matches: "s:10" and "s:1". My knowledge of regular expressions is very poor, and I can't figure this one out.
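A minimal, self-contained reproduction of the double count, assuming a cut-down smilies array with only three of the s: codes (the variable names mirror the question but the data is illustrative):

```php
<?php
// Cut-down version of the smilies config from the question
$smilies = [
    's:10' => 'worship.gif',
    's:1'  => 'dry.gif',
    's:2'  => 'lol.gif',
];

$Content = 'Great post s:10';
$numOfFoundSmilies = 0;

foreach ($smilies as $smilie => $smilieImage) {
    // Each code is searched independently, so "s:1" also matches
    // the first three characters of "s:10"
    $numOfFoundSmilies += preg_match_all('/' . preg_quote($smilie, '/') . '/i', $Content);
}

echo $numOfFoundSmilies, "\n"; // 2, although the comment holds one smilie
```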


Your code counts, for each smile code, how many times that code appears in the post, so 's:10' counts both as 's:10' and 's:1'.

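A solution is to look for all smile codes at once, so that every piece of the post counts towards only one smile code. This can be done by combining all codes into a single regex. The longer codes must come first in the alternation: PCRE tries the alternatives from left to right and takes the first one that matches, so with s:1|s:10 the text s:10 would be matched as s:1.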

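$codes = array_keys($this->config['smilies']);
// Longest codes first, so "s:10" is tried before "s:1" at the same position
usort($codes, function ($a, $b) { return strlen($b) - strlen($a); });
// Escape each code, including the "/" delimiter
$escCodes = array_map(function ($c) { return preg_quote($c, '/'); }, $codes);
$regex = '/' . implode('|', $escCodes) . '/i';

preg_match_all($regex, $Content, $matches);

// $matches[0] holds the full matches; count($matches) would only count groups
$found = count($matches[0]);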
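Because the alternation is ordered, the sort matters. A small sketch (the helper name countSmilies is hypothetical) that sorts the codes longest first before building the pattern, followed by a check of what the unsorted pattern does:

```php
<?php
// Hypothetical helper: count smilie codes in $content with one combined regex
function countSmilies(string $content, array $codes): int
{
    // Longest codes first, so "s:10" wins over its prefix "s:1"
    usort($codes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $escaped = array_map(function ($code) {
        return preg_quote($code, '/');
    }, $codes);
    return preg_match_all('/' . implode('|', $escaped) . '/i', $content);
}

$codes = ['s:1', 's:10', ':)'];
echo countSmilies('s:10 and s:1 :)', $codes), "\n"; // 3

// Without sorting, "s:1" is tried first and consumes the start of "s:10"
$n = preg_match_all('/s:1|s:10/', 's:10', $m);
echo $n, ' ', $m[0][0], "\n"; // 1 s:1
```

The count is still 1 in the unsorted case, but the match is attributed to the wrong code, which matters as soon as you replace codes with images.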


Quantifiers in regular expressions are greedy by default (at least in PCRE). Usually you can get around this with a lazy quantifier:

/a+/ # selects the whole string from "aaaaaaa"

/a+?/ # selects only "a"
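The two patterns can be checked directly with preg_match:

```php
<?php
// Greedy: + takes as many "a"s as it can
preg_match('/a+/', 'aaaaaaa', $greedy);
// Lazy: +? stops after the first "a" that satisfies the pattern
preg_match('/a+?/', 'aaaaaaa', $lazy);

echo $greedy[0], "\n"; // aaaaaaa
echo $lazy[0], "\n";   // a
```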

In your case this doesn't help much, since there is no place to put a question mark. One option is to re-order your search array and replace each match as soon as it is found: search for s:10 first and s:1 second, and use preg_replace() instead of matching. That way, the second search no longer finds what the first one already matched.
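A sketch of the reorder-and-replace idea, assuming each found code may be replaced by a NUL-byte placeholder that never occurs in a smilie code (the function name countAndStrip is hypothetical). preg_replace() reports the number of replacements through its fifth argument:

```php
<?php
// Hypothetical helper: count codes longest-first, blanking each hit so that
// shorter codes such as "s:1" cannot re-match inside "s:10"
function countAndStrip(string $content, array $codes): int
{
    usort($codes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $total = 0;
    foreach ($codes as $code) {
        $pattern = '/' . preg_quote($code, '/') . '/i';
        // "\x00" (a NUL byte) is assumed never to appear in a smilie code;
        // a placeholder avoids gluing neighbouring text into a new code
        $content = preg_replace($pattern, "\x00", $content, -1, $count);
        $total += $count;
    }
    return $total;
}

echo countAndStrip('s:10 s:1 s:10', ['s:1', 's:10']), "\n"; // 3
```

Replacing with an empty string instead could create new matches: removing "s:10" from "ss:10:1" leaves "s:1".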

Another possibility: split your search array in two. If you know that the codes in one part always have the structure 's:' plus digits, you could use this regexp in the second loop:

Preg_Match_All ( '/' . Preg_Quote ( $smilie ) . '(?![0-9])/i', $Content, $matches );

Here (?![0-9]) is a negative lookahead: it asserts that the next character is not a digit, without consuming it (it also succeeds at the end of the string).
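A quick comparison on a sample comment made up for illustration; the last s:1 sits at the very end of the string and is still counted, since a negative lookahead also succeeds at end of input:

```php
<?php
$Content = 's:10 s:1 s:1';

// Plain pattern: also matches the "s:1" prefix of "s:10"
$plain = preg_match_all('/' . preg_quote('s:1', '/') . '/i', $Content);
// Negative lookahead: "s:1" must not be followed by a digit
$guarded = preg_match_all('/' . preg_quote('s:1', '/') . '(?![0-9])/i', $Content);

echo $plain, ' ', $guarded, "\n"; // 3 2
```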

A third option: if you only allow (that is, convert) smileys at certain places, you could use this:

Preg_Match_All ( '/\b' . Preg_Quote ( $smilie ) . '\b/i', $Content, $matches );

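\b is a "word boundary": the position between a word character (letter, digit, underscore) and a non-word character or the start/end of the string. The drawback is that smileys glued to surrounding text (like "abcs:1xyz") will not be found. Also, \b only works this way for codes that start and end with word characters; for a code like :) it does the opposite and matches only when the code is attached to letters or digits on both sides.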
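The behaviour of \b is easy to check; the sample strings below are made up for illustration:

```php
<?php
// \b works for codes that start and end with word characters...
$a = preg_match_all('/\bs:1\b/', 'a s:1 and s:10');  // 1: "s:10" is skipped
$b = preg_match_all('/\bs:1\b/', 'my texts:1');      // 0: no boundary between "t" and "s"
// ...but around punctuation-only codes it demands adjacent word characters
$c = preg_match_all('/\b' . preg_quote(':)', '/') . '\b/', 'nice :) post'); // 0
$d = preg_match_all('/\b' . preg_quote(':)', '/') . '\b/', 'nice:)post');   // 1

echo "$a $b $c $d\n"; // 1 0 0 1
```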


I'd expect this code to be faster than a regex:

// $count receives the total number of replacements made
$replaced = str_replace(array_keys($config['Smilies']['smilies']),
                        array_values($config['Smilies']['smilies']),
                        $message, $count);

This would not reliably solve the issue with s:1 and s:10, though (str_replace() processes the array in order, so it only works while s:10 is listed before s:1). I'd suggest a clearer delimited notation, e.g. :s10: instead of s:10. Then it is no longer an issue.
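With delimited codes the plain str_replace() approach no longer depends on ordering, because :s1: is not a substring of :s10:. A sketch, assuming hypothetical :s1: and :s10: codes in place of s:1 and s:10:

```php
<?php
// Hypothetical delimited codes; note that :s1: is deliberately listed first
$smilies = [
    ':s1:'  => 'dry.gif',
    ':s10:' => 'worship.gif',
];

$message = 'Hello :s10: and :s1:';
// The fourth argument receives the total number of replacements
$replaced = str_replace(array_keys($smilies), array_values($smilies), $message, $count);

echo $replaced, "\n"; // Hello worship.gif and dry.gif
echo $count, "\n";    // 2
```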

In addition, I'd suggest not using numeric identifiers for this anyway. Users will likely find them tedious to remember. Why not use easy-to-memorize labels, e.g. :heart: or :lol:?


You could change your regexen to use word boundaries or \s (whitespace), so s:1 becomes \bs:1\b or \ss:1\s. Beware that with the second method "s:1." will not be matched, and neither version will match "This is my funny texts:1".
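A few sample strings (made up for illustration) show where the whitespace version stops matching. The last case is an extra caveat: \s consumes the space, so two codes separated by a single space are counted only once:

```php
<?php
// Whitespace-delimited version of the s:1 pattern
$pattern = '/\ss:1\s/';

$a = preg_match_all($pattern, 'x s:1 y');      // 1
$b = preg_match_all($pattern, 'x s:1. y');     // 0: "." is not whitespace
$c = preg_match_all($pattern, 's:1 at start'); // 0: nothing before "s"
$d = preg_match_all($pattern, 'a s:1 s:1 b');  // 1: the shared space is consumed

echo "$a $b $c $d\n"; // 1 0 0 1
```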


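Change "s:1" to "s:1[^0-9]": that matches any "s:1" followed by a character that is not a digit. Note that it will not match an "s:1" at the very end of the comment, and the following character becomes part of the match.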
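The end-of-string difference between a character class and a lookahead, on a sample comment made up for illustration:

```php
<?php
$Content = 'ends with s:1';

// [^0-9] must consume a real character, so it fails at the end of the string
$charClass = preg_match_all('/s:1[^0-9]/', $Content);
// A negative lookahead only checks the next position, so end of string is fine
$lookahead = preg_match_all('/s:1(?![0-9])/', $Content);

echo $charClass, ' ', $lookahead, "\n"; // 0 1
```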

