Removing special characters if they are not part of the tag names
Can anyone help me? I am trying to edit HTML code using regular expressions.
The HTML code is something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title></title>
<link href="css/style.css" rel="stylesheet"
type="text/css" media="screen" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
</head>
<body>
<div id="wrapper">
<div id="content">
<div class="textArea">
<div class="textLeft">
<h2>ökföäa äaf aäpig</h2>
<p> fkjafkhafkha</p>
<p>aklfjöl ölafj aljföla</p>
</div>
<div class="textCenter">
<h2>rueueueu</h2>
<p>
eegeg eg<br />
eg "egsge"<br />
sgesgeg<br />
<a href="http://">gsgs sgsey</a>
</p>
</div>
</div>
</div>
</div>
</body>
</html>
I would like to replace all the special characters with entities, but not if they are part of the tags.
For example, the quotes inside the tags would not be replaced, but the ones in "egsge" would be.
How can I do this?
You could use the htmlentities function to encode your "special" characters as HTML entities.
However, remember that your PHP code should be the one generating the HTML, so you should have full control over when to encode your strings.
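As a minimal sketch of that suggestion, the snippet below encodes a plain text string before it is placed into markup. The variable names and the sample text (borrowed from the question) are only illustrative, and it assumes the input is UTF-8:

```php
<?php
// Hypothetical sample text, taken from the question's HTML.
$text = 'eg "egsge" ökföäa';

// ENT_QUOTES encodes both double and single quotes;
// 'UTF-8' tells htmlentities how the input is encoded (an assumption here).
$encoded = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Wrap the encoded text in markup that the PHP code itself controls.
echo '<p>' . $encoded . '</p>';
```

Encoding the text before it is wrapped in tags avoids having to pick the text back out of finished HTML later.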
If you have all this HTML code as a single string, say $string, try this:
$string = preg_replace_callback('/>(.*)</Us',function($match){return '>'.htmlentities($match[1],ENT_QUOTES,'UTF-8').'<';},$string);
Please check the parameters for htmlentities, and note that anonymous functions are only available since PHP 5.3.0. If you are using an earlier version, you can write a named function as a workaround.
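The named-function workaround might look roughly like this. The function name encode_text_node and the sample string are made up for illustration; the pattern is the same one as above:

```php
<?php
// Named callback for PHP versions without anonymous functions.
// The name encode_text_node is hypothetical.
function encode_text_node($match)
{
    // $match[1] holds the text captured between '>' and '<'.
    return '>' . htmlentities($match[1], ENT_QUOTES, 'UTF-8') . '<';
}

// Hypothetical sample input, modelled on the question's HTML.
$string = '<a href="http://">eg "egsge"</a>';

// Pass the function name as a string instead of a closure.
$string = preg_replace_callback('/>(.*)</Us', 'encode_text_node', $string);

echo $string;
```

Because the pattern only captures what lies between a `>` and the next `<`, the quotes around the href value stay as they are, while the quotes in the link text are encoded. Text that already contains entities would be encoded a second time, so this is best applied to HTML that has not been encoded yet.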