Regex on ;) smilie

A minor inconvenience my users have found is that if they use a smilie such as >_> at the end of parentheses (kind of like this: >_>) then during processing it is run through htmlspecialchars(), making it &gt;_&gt;) - you can see the problem, I think. The ;) at the end is then replaced by the "Wink" smilie.

Can anyone give me a regex that will replace ;) with the smilie, but only if the ; is not the end of an HTML entity? (I'm sure it would involve a lookbehind but I can't seem to understand how to use them >_>)

Thank you!


Handling smileys like ;) is always a bit tricky - the way I would do it is to transform it to the "canonical" :wink: before encoding HTML entities, and then replace only canonical-form :{smileyname}: smileys afterwards.
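A minimal sketch of that ordering, assuming a single ;) smilie. The function name renderPost and the wink.png image path are made up for illustration and are not from the question:

```php
<?php
// Hypothetical helper: the name and the image path are illustrative assumptions.
function renderPost(string $raw): string
{
    // 1. Turn the raw smilie into its canonical :wink: token BEFORE escaping,
    //    while no entity-terminating ";" exists in the text yet.
    $canonical = str_replace(';)', ':wink:', $raw);

    // 2. Escape HTML. The token contains no special characters, so it survives.
    $escaped = htmlspecialchars($canonical, ENT_QUOTES, 'UTF-8');

    // 3. Replace only the canonical token with markup.
    return str_replace(':wink:', '<img src="wink.png" alt=";)">', $escaped);
}
```

With this order, "(like this: >_>)" is escaped without ever producing a wink, because the ;) check runs before any &gt; exists.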


Like this: (?<!&[a-zA-Z0-9]+);\)

The (?<!...) is a zero-width negative lookbehind assertion: it only allows the construct that follows to match where the text is not preceded by the pattern written in place of the ....
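One caveat, as far as I know: PCRE, which backs PHP's preg_* functions, has traditionally rejected lookbehinds of unbounded length, and the + inside the lookbehind above makes it unbounded, so that pattern may fail to compile in PHP. Here is a sketch of a different technique that gets a similar effect without a lookbehind: match a whole entity first and discard it with the PCRE backtracking verbs (*SKIP)(*FAIL), so that ;) is only matched elsewhere. The function name replaceWink and the [wink] placeholder are assumptions for illustration:

```php
<?php
// Hypothetical helper: replaces ";)" unless the ";" closes an HTML entity.
function replaceWink(string $escaped): string
{
    // Left alternative: consume an entity such as &gt; or &#41;, then
    // (*SKIP)(*FAIL) rejects it and resumes scanning after the entity.
    // Right alternative: a literal ";)" anywhere else.
    $pattern = '/&#?[a-z0-9]+;(*SKIP)(*FAIL)|;\)/i';

    return preg_replace($pattern, '[wink]', $escaped);
}
```

This is meant to run on text that has already been through htmlspecialchars(), where every literal & has become &amp;.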


You should probably handle it along these lines, which sidesteps the issue of replacing replacements entirely:

  • Break the string apart wherever a smilie occurs, convert the smilies into tokens
  • HTML escape all the text nodes
  • Convert all the smilie tokens into their HTML tag equivalents
  • Glue everything back together

That's a bit non-trivial though. :)
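The steps above could be sketched roughly like this, using preg_split with PREG_SPLIT_DELIM_CAPTURE to break the string at each smilie. The function name renderWithTokens, the smilie table, and the image file names are all assumptions for illustration:

```php
<?php
// Hypothetical helper illustrating split -> escape -> substitute -> glue.
function renderWithTokens(string $raw): string
{
    // Assumed smilie table: raw text => replacement markup.
    $smilies = [
        ';)' => '<img src="wink.png" alt="wink">',
        ':)' => '<img src="smile.png" alt="smile">',
    ];

    // Build an alternation of the regex-quoted smilies.
    $quoted = array_map(function ($s) {
        return preg_quote($s, '/');
    }, array_keys($smilies));

    // Split on smilies but keep them in the result as separate parts.
    $parts = preg_split('/(' . implode('|', $quoted) . ')/', $raw, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $part) {
        // Smilie parts become markup; every other part is HTML-escaped.
        $out .= $smilies[$part] ?? htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}
```

Because escaping only ever touches the text parts, and smilies are only detected in the raw input, a replacement can never be replaced again.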


Find: (&#?[a-z0-9]+;)\)
Replace: $1&#41;

We're looking for:

Match the regular expression below and capture its match into backreference number 1 «(&#?[a-z0-9]+;)»
   Match the character “&” literally «&»
   Match the character “#” literally «#?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match a single character present in the list below «[a-z0-9]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “a” and “z” «a-z»
      A character in the range between “0” and “9” «0-9»
   Match the character “;” literally «;»
Match the character “)” literally «\)»


Created with RegexBuddy
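
In PHP, that find/replace might look like the sketch below. It assumes the replacement keeps the captured entity (group 1) and swaps the closing parenthesis for its numeric entity &#41;, so no ;) remains for the smilie pass to find. The function name protectParen is hypothetical:

```php
<?php
// Hypothetical helper: run after htmlspecialchars() and before smilie replacement.
function protectParen(string $escaped): string
{
    // Group 1 is the entity; the ")" right after it becomes &#41;.
    return preg_replace('/(&#?[a-z0-9]+;)\)/', '$1&#41;', $escaped);
}
```

A browser renders &#41; as an ordinary closing parenthesis, so the visible text is unchanged.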


Well, if you're interested in a regex solution, maybe try this:

(?!t)([A-Za-z0-9]| );\)


If it's in PHP (preg_replace, you said?), you can use preg_replace_callback:

preg_replace_callback('#(&[a-z0-9]+)?;\)#i', 'myFunction', 'myText');

In the "myFunction" callback, you just have to check whether the capturing group caught an HTML entity.

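function myFunction($matches) {
    // $matches[1] is non-empty when ";)" directly follows "&name",
    // i.e. the ";" terminates an HTML entity: leave the text unchanged.
    if (!empty($matches[1])) {
        return $matches[0];
    }
    // Otherwise it is a real ";)" smilie.
    return '[Smilie]';
}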