How can I create my own regex to "parse" HTML links?

2023-03-03 18:15

The strings look like hyperlinks, such as http://somethings. This is what I need:

I need to match them only if they don't start with the character "; I mean only that character: if there is no character before the link, it must still match;
The somethings part means that any kind of character can be used (of course, it's a link) except whitespace (the end-of-link marker); I know it's permitted by the RFC, but it's the only way I know to delimit the link;
these strings have already been filtered with htmlentities($str, ENT_QUOTES, "UTF-8"), which is why any kind of character can be used. Is it secure? Or do I risk problems with XSS or broken HTML?
the replacement can occur multiple times, not only once, and must be case-insensitive;

This is my current regex:

preg_replace('#\b[^"](((http|https|ftp)://).+)#', '<a class="lforum" href="$1">$1</a>', $str);

But it only checks strings that START with ", and I want the opposite. Any help answering this question would be appreciated. Thanks!

For both of your cases you'll want lookbehind assertions.

\b(?<!")(\w)\b - negative lookbehind to match only if not preceded by "
(?<=ThisShouldBePresent://)(.*) - positive lookbehind to match only if preceded by your string.
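
Applied to the regex from the question, the negative lookbehind could look roughly like the sketch below. It is only an illustration under some assumptions: the function name linkify and the sample input are made up, the link is taken to end at the first whitespace (\S+), and the quote is assumed to be a literal " in the input. If the text has already gone through htmlentities with ENT_QUOTES, the quote will appear as &quot; and the lookbehind would have to be (?<!&quot;) instead.

```php
<?php
// Hypothetical helper name, not from the original post.
function linkify(string $str): string
{
    // (?<!")             negative lookbehind: skip links directly preceded by "
    // \b                 word boundary before the scheme
    // (?:https?|ftp)://  the allowed schemes
    // \S+                everything up to the next whitespace
    // i                  case-insensitive matching
    $pattern = '~(?<!")\b((?:https?|ftp)://\S+)~i';

    // preg_replace replaces every occurrence, not only the first one.
    return preg_replace($pattern, '<a class="lforum" href="$1">$1</a>', $str);
}

// Made-up sample input: the second link is preceded by " and is left alone.
echo linkify('see http://example.com/a and "http://example.com/b and FTP://example.org/x');
```

The first and third links are wrapped in an anchor, while the one that follows the double quote stays as plain text.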

Something like this: preg_match('/\b[^"]/',$input_string);

This looks for a word boundary (\b), followed by any character other than a double quote ([^"]).

Something like this: preg_match('~(((ThisShouldBePresent)://).+)~', $input_string);

I've assumed the brackets you specified in the question (and the plus sign) were intended as part of the regex rather than characters to search for.

I've also taken @ThiefMaster's advice and changed the delimiter to ~ to avoid having to escape the //.
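
To illustrate the delimiter point, here is a small comparison. The sample input and variable names are invented for the example, and ThisShouldBePresent is just the placeholder used above:

```php
<?php
// Made-up sample input using the placeholder scheme from the answer.
$input_string = 'Visit ThisShouldBePresent://host/path today';

// With / as the delimiter, every / inside the pattern must be escaped.
$withSlash = '/ThisShouldBePresent:\/\/(\S+)/';

// With ~ as the delimiter, the slashes can be written as they are.
$withTilde = '~ThisShouldBePresent://(\S+)~';

preg_match($withSlash, $input_string, $a);
preg_match($withTilde, $input_string, $b);

// Both patterns capture the same text after the ://
var_dump($a[1] === $b[1]);
```

Both patterns behave the same; the ~ version is simply easier to read.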
