
How can I write my own regex to "parse" HTML links?

The strings look like hyperlinks, such as http://somethings. This is what I need:

  1. Match a link only if it is not preceded by the character " (only that character). If nothing comes before the link, it must still match;
  2. The "somethings" part may contain any character (it is a link, after all) except whitespace, which marks the end of the link. I know this doesn't follow the RFC exactly, but it is the only way I know to detect where a link ends;
  3. These strings have already been filtered with htmlentities($str, ENT_QUOTES, "UTF-8"), which is why any character can appear. Is that secure, or do I risk XSS problems or broken HTML?
  4. There can be multiple occurrences to replace, not just one, and the match must be case-insensitive.
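One way to check point 3 is to look at what the filter actually produces. This sketch runs htmlentities() with the same flags on a made-up sample string. After filtering, the literal " from point 1 no longer exists; it has become &quot;, so a regex that runs on the filtered text and looks for a literal " before a link will not see it. Escaping " and < is what keeps a matched URL from closing the href="..." attribute early, but it does not vet the URL itself, so limiting the pattern to http, https and ftp still matters.

```php
<?php
// Sketch: what htmlentities() does to a made-up sample before any link regex runs.
$raw = 'See "http://example.com/?a=1&b=2" or <b>http://example.org</b>';

// Same call as in the question: ENT_QUOTES escapes both " and '.
$filtered = htmlentities($raw, ENT_QUOTES, 'UTF-8');

echo $filtered, "\n";
// See &quot;http://example.com/?a=1&amp;b=2&quot; or &lt;b&gt;http://example.org&lt;/b&gt;
```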

This is my current regex:

preg_replace('#\b[^"](((http|https|ftp)://).+)#', '<a class="lforum" href="$1">$1</a>', $str);

But it only matches strings that start with ", and I want the opposite. Any help would be appreciated, thanks!
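Running the regex from the question on two made-up inputs shows where it goes wrong (a sketch; only the inputs are new). [^"] has to consume a real character in front of the link, so the space before the link disappears in the replacement and a link at the very start of the string is never matched. Separately, .+ keeps going past the whitespace that should end the link.

```php
<?php
// Sketch: the pattern and replacement from the question, applied to two sample strings.
$re  = '#\b[^"](((http|https|ftp)://).+)#';
$tpl = '<a class="lforum" href="$1">$1</a>';

$a = preg_replace($re, $tpl, 'see http://a.com now');
$b = preg_replace($re, $tpl, 'http://a.com');

echo $a, "\n"; // see<a class="lforum" href="http://a.com now">http://a.com now</a>
echo $b, "\n"; // http://a.com  (no match, so the string is returned unchanged)
```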


For both of your cases you'll want lookbehind assertions.

  1. \b(?<!")(\w)\b - negative lookbehind to match only if not preceded by "
  2. (?<=ThisShouldBePresent://)(.*) - positive lookbehind to match only if preceded by your string.
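Combining the negative lookbehind from the first item with the other requirements gives something like the sketch below. The function name linkify and the exact pattern are assumptions, not code from the question or this answer. \S+ stops at the first whitespace character (point 2), the i flag makes the match case-insensitive, and preg_replace() already replaces every match (point 4). Because the input has already been through htmlentities($str, ENT_QUOTES, "UTF-8"), a quote in front of a link shows up as &quot;, so the lookbehind checks for that as well as a raw "; PCRE accepts lookbehind alternatives of different fixed lengths.

```php
<?php
// Hypothetical helper: wrap http/https/ftp URLs in <a> tags unless a quote precedes them.
function linkify(string $str): string
{
    // (?<!&quot;|")    negative lookbehind: skip a link right after a raw " or its
    //                  htmlentities() form &quot;
    // \b               start the scheme at a word boundary
    // (?:https?|ftp)   allowed schemes only
    // \S+              everything up to the next whitespace character
    // i                case-insensitive, so HTTP:// matches too
    $pattern = '~(?<!&quot;|")\b((?:https?|ftp)://\S+)~i';

    // preg_replace() replaces all matches by default (limit = -1).
    return preg_replace($pattern, '<a class="lforum" href="$1">$1</a>', $str);
}

echo linkify('Docs: HTTP://example.com/a?b=1 and "http://skip.me"'), "\n";
// Docs: <a class="lforum" href="HTTP://example.com/a?b=1">HTTP://example.com/a?b=1</a> and "http://skip.me"
```

A link at the very start of the string is matched too, because a negative lookbehind succeeds when there is nothing before the current position.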


  1. Something like this: preg_match('/\b[^"]/', $input_string);

    This looks for a word boundary (\b), followed by any character other than a double quote ([^"]).

  2. Something like this: preg_match('~(((ThisShouldBePresent)://).+)~', $input_string);

    I've assumed the brackets you specified in the question (and the plus sign) were intended as part of the regex rather than characters to search for.

    I've also taken @ThiefMaster's advice and changed the delimiter to ~ to avoid having to escape the //.
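Here is a sketch of the two preg_match() calls above run against one made-up string, with http standing in for the ThisShouldBePresent placeholder (an assumption). It shows what each pattern captures: \b[^"] tests the character after the word boundary, not the one before it, so the a right after the quote still matches. Testing the preceding character takes a lookbehind such as (?<!").

```php
<?php
// Sketch: what the two patterns from this answer capture on a sample string.
// 'http' replaces the ThisShouldBePresent placeholder (assumption).
$input = '"abc http://example.com';

// 1. Word boundary, then one character that is not a double quote.
preg_match('/\b[^"]/', $input, $m1);
echo $m1[0], "\n"; // a  (the quote before "abc" does not prevent the match)

// 2. Same groups as the answer, with ~ as the delimiter so // needs no escaping.
preg_match('~(((http)://).+)~', $input, $m2);
echo $m2[1], "\n"; // http://example.com
echo $m2[3], "\n"; // http
```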

