
Using regex in preg_replace to match an html href anchor tag

I'm trying to use preg_replace to replace

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

with

<a href="WWW.ANYURL.COM">DISPLAY_TEXT</a>

Here is my code:

$string = htmlentities(mysql_real_escape_string($string1)); 
$newString = preg_replace('#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#','<a href="$1">$2</a>',$string);

If I do limited tests such as:

$newString = preg_replace('#&lt;a\ href#','TEST',$string);

then

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAYTEXT&lt;/a&gt;

becomes

TEST=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAYTEXT&lt;/a&gt;

But if I try to get it to also match the "=", it acts as if it couldn't find a match, i.e.

$newString = preg_replace('#&lt;a\ href=#','TEST',$string);

returns the original unchanged:

&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;

I've been going at this for a couple of hours. Any help would be greatly appreciated.

EDIT: code in context

$title = clean_input($_POST['title']);
$story = clean_input($_POST['story']);

function clean_input($string)
{
    if (get_magic_quotes_gpc()) {
        $string = stripslashes($string);
    }
    $string = htmlentities(mysql_real_escape_string($string));
    $findValues = array("&lt;b&gt;", "&lt;/b&gt;");
    $newValues = array("<b>", "</b>");
    $newString = str_replace($findValues, $newValues, $string);
    $newString2 = preg_replace('#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#', '<a href="$1">$2</a>', $newString);
    return $newString2;
}

Sample $story = Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href="www.google.com">Google</a> Vivamus quis sem felis. Morbi vitae neque ac neque blandit malesuada lobortis sit amet justo. Donec convallis, nibh ut lacinia tempor, neque felis scelerisque nibh, at feugiat lectus erat in nulla. In et euismod nunc. <pernicious code></code>Pellentesque vitae ante orci, vitae ultrices neque. <a href="www.yahoo.com">Yahoo</a> In non nulla sapien, vestibulum faucibus metus. Fusce egestas viverra arcu, <b>ac</b> sagittis leo facilisis in. Nulla facilisi.

I want only a few tags, such as links (a href) and bold, to be allowed through as markup.


You don't need to manually replace anything. If this is your whole input string, then use html_entity_decode() to turn the escapes back into < and >.


Again, your regex works as intended with the sample text.

Your problem is the premature mysql_real_escape_string() call. It puts a backslash in front of each " double quote in your HTML, so after htmlentities() the text contains \&quot; rather than &quot;. That is why the back-conversion fails: your regex is not prepared to find \&quot;.
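To see the effect in isolation, here is a minimal sketch. It assumes addslashes() as a stand-in for mysql_real_escape_string() (which needs an open database connection and no longer exists in current PHP); both put a backslash in front of each double quote, which is the only behaviour that matters here. The variable names are made up for the illustration.

```php
<?php
// Pattern and replacement copied from the question.
$pattern = '#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#';
$replacement = '<a href="$1">$2</a>';
$input = '<a href="www.google.com">Google</a>';

// Assumption: addslashes() stands in for mysql_real_escape_string().
// The quotes get a backslash first, then the markup is entity-encoded.
$escapedFirst = htmlentities(addslashes($input));
// $escapedFirst: &lt;a href=\&quot;www.google.com\&quot;&gt;Google&lt;/a&gt;

// Entity-encoding only, with no database escaping beforehand.
$encodedOnly = htmlentities($input);

$broken = preg_replace($pattern, $replacement, $escapedFirst);
$working = preg_replace($pattern, $replacement, $encodedOnly);

var_dump($broken === $escapedFirst); // bool(true): no match, string unchanged
var_dump($working === $input);       // bool(true): the anchor is restored
```

With the backslash in the way, the pattern never sees href=&quot; and preg_replace() hands the string back untouched.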

Avoid that. Get rid of the ugly clean_input() hack, and of magic_quotes as advised by the manual. You must do the database escaping right before inserting into the database, not earlier. (Or better yet, use PDO with prepared statements, which is easier.)
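A rough sketch of the prepared-statement route, assuming the pdo_sqlite extension is available. The in-memory SQLite database and the stories table are hypothetical and only there to keep the example runnable; with MySQL only the DSN and credentials would differ.

```php
<?php
// Hypothetical in-memory database and table, used only for this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE stories (title TEXT, story TEXT)');

// Raw user input: no manual escaping, no htmlentities() before storage.
$title = 'It\'s a "test"';
$story = '<a href="www.google.com">Google</a>';

// The placeholders keep the values separate from the SQL text.
$stmt = $pdo->prepare('INSERT INTO stories (title, story) VALUES (:title, :story)');
$stmt->execute([':title' => $title, ':story' => $story]);

// The stored text comes back exactly as entered, without stray backslashes.
$row = $pdo->query('SELECT title, story FROM stories')->fetch(PDO::FETCH_ASSOC);
var_dump($row['story'] === $story);
```

The HTML filtering then happens separately, when the story is printed, and works on text that contains no backslashes.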

Also avoid the $newString123 variable duplicates, just overwrite the one you already have when rewriting strings.
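Put together, the display-side filter could look roughly like the sketch below. The function name allow_basic_tags() is made up for the illustration; the pattern and the bold replacements are the ones from the question, and a single variable is overwritten at each step.

```php
<?php
// Hypothetical helper: escape everything, then re-enable <b> and simple links.
function allow_basic_tags(string $text): string
{
    // Entity-encode all markup first; no database escaping happens here.
    $text = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // Turn the encoded bold tags back into real tags.
    $text = str_replace(['&lt;b&gt;', '&lt;/b&gt;'], ['<b>', '</b>'], $text);

    // Turn encoded plain anchors back into real links (pattern from the question).
    $text = preg_replace(
        '#&lt;a\ href=&quot;([^&]*)&quot;&gt;([^&]*)&lt;/a&gt;#',
        '<a href="$1">$2</a>',
        $text
    );

    return $text;
}

$sample = 'x <a href="www.google.com">Google</a> <pernicious code></code> <b>ac</b>';
echo allow_basic_tags($sample), "\n";
// x <a href="www.google.com">Google</a> &lt;pernicious code&gt;&lt;/code&gt; <b>ac</b>
```

Note that the pattern does not check what the href contains, so something like a javascript: URL would still pass through; a stricter pattern for the first capture group would be needed to rule that out.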


You could also do it like this:

$str = "&lt;a href=&quot;WWW.ANYURL.COM&quot;&gt;DISPLAY_TEXT&lt;/a&gt;";
echo "Your html code is thus: " . htmlspecialchars_decode($str);