PHP Remove JavaScript
I am trying to remove JavaScript from the HTML.
I can't get the regular expression to work with PHP; it's giving me a null array. Why?
<?php
$var = '
<script type="text/javascript">
function selectCode(a)
{
var e = a.parentNode.parentNode.getElementsByTagName(PRE)[0];
if (window.getSelection)
{
var s = window.getSelection();
if (s.setBaseAndExtent)
{
s.setBaseAndExtent(e, 0, e, e.innerText.length - 1);
}
else
{
var r = document.createRange();
r.selectNodeContents(e);
s.removeAllRanges();
s.addRange(r);
}
}
else if (document.getSelection)
{
var s = document.getSelection();
var r = document.createRange();
r.selectNodeContents(e);
s.removeAllRanges();
s.addRange(r);
}
else if (document.selection)
{
var r = document.body.createTextRange();
r.moveToElementText(e);
r.select();
}
}
</script>
';
function remove_javascript($java){
echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/i', "", $java);
}
?>
This should do it:
echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var);
The s modifier is there so that the dot (.) matches newlines too. Without it, a script block that spans several lines never matches.
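To see what the s modifier changes, here is a small sketch that runs the same pattern with and without it on a multi-line script block. The sample string and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample input: a script block that spans several lines.
$html = "<p>before</p>\n<script type=\"text/javascript\">\nalert('hi');\n</script>\n<p>after</p>";

// Without /s the dot stops at newlines, so the multi-line block never matches
// and the input comes back unchanged.
$withoutS = preg_replace('/<script\b[^>]*>(.*?)<\/script>/i', '', $html);

// With /s the dot also matches newlines, so the whole block is removed.
$withS = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $html);

var_dump($withoutS === $html);               // bool(true)
var_dump(strpos($withS, 'alert') === false); // bool(true)
```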
Just a warning: you should not use this type of regexp to sanitize user input for a website. There are just too many ways to get around it. For sanitizing, use something like the http://htmlpurifier.org/ library.
This might do more than you want, but depending on your situation you might want to look at strip_tags.
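One caveat worth checking before relying on strip_tags alone: it removes the tags themselves but keeps the text between them, so the body of a script block survives as plain text. A minimal sketch, with a made-up input string:

```php
<?php
// Hypothetical sample input.
$html = '<p>Hello</p><script>alert("x");</script>';

// strip_tags() drops <p>, </p>, <script> and </script>,
// but the JavaScript source between the script tags stays in the output.
$stripped = strip_tags($html);

echo $stripped; // Helloalert("x");
```

So if the goal is to drop the script source as well, the script blocks have to be removed first and strip_tags applied afterwards.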
Here's an idea
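while (true) {
    // strpos() returns 0 for a match at the very start, so compare against false explicitly
    $beginning = strpos($var, "<script");
    $end = ($beginning === false) ? false : strpos($var, "</script>", $beginning);
    if ($beginning !== false && $end !== false) {
        $stringLength = ($end + strlen("</script>")) - $beginning;
        // substr_replace() returns the new string; it does not modify $var in place
        $var = substr_replace($var, "", $beginning, $stringLength);
    } else {
        break;
    }
}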
In your case you could treat the string as a list of newline-delimited strings and remove the lines containing the script tags (the first and the second to last), so you wouldn't even need regular expressions.
Though if what you are trying to do is prevent XSS, removing only the script tags might not be sufficient.
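To illustrate why script-tag removal alone falls short, here is a small sketch that applies the pattern from the regex answer above to markup that carries JavaScript without any script element. The input string is invented for the example.

```php
<?php
// Hypothetical sample input: no <script> element, but still executable JavaScript.
$html = '<a href="javascript:alert(1)" onclick="alert(2)">click</a>';

// The script-tag pattern finds nothing to remove here.
$result = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $html);

// bool(true): the inline handler and the javascript: URL are untouched.
var_dump($result === $html);
```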
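function clean_jscode($script_str) {
    $script_str = htmlspecialchars_decode($script_str);
    // Normalize tag case (<SCRIPT becomes <script) so that explode() below can find the tags
    $search_arr = array('<script', '</script>');
    $script_str = str_ireplace($search_arr, $search_arr, $script_str);
    $split_arr = explode('<script', $script_str);
    $remove_jscode_arr = array();
    foreach($split_arr as $key => $val) {
        // Chunk 0 is the text before the first <script; every later chunk starts inside a script
        $newarr = explode('</script>', $split_arr[$key]);
        // Keep only what follows </script>; an unclosed script has no such part
        $remove_jscode_arr[] = ($key == 0) ? $newarr[0] : ($newarr[1] ?? '');
    }
    return implode('', $remove_jscode_arr);
}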
You can remove any JavaScript code from an HTML string with the help of the following PHP function.
You can read more about it here: https://mradeveloper.com/blog/remove-javascript-from-html-with-php
function sanitizeInput($inputP)
{
$spaceDelimiter = "#BLANKSPACE#";
$newLineDelimiter = "#NEWLNE#";
$inputArray = [];
$minifiedSanitized = '';
$unMinifiedSanitized = '';
$sanitizedInput = [];
$returnData = [];
$returnType = "string";
if($inputP === null) return null;
if($inputP === false) return false;
if(is_array($inputP) && sizeof($inputP) <= 0) return [];
if(is_array($inputP))
{
$inputArray = $inputP;
$returnType = "array";
}
else
{
$inputArray[] = $inputP;
$returnType = "string";
}
foreach($inputArray as $input)
{
$minified = str_replace(" ",$spaceDelimiter,$input);
$minified = str_replace("\n",$newLineDelimiter,$minified);
//removing <script> tags (lazy match, so content between two separate script blocks is kept)
$minifiedSanitized = preg_replace("/<script\b[^>]*>.*?<\/script[^>]*>/i","",$minified);
$unMinifiedSanitized = str_replace($spaceDelimiter," ",$minifiedSanitized);
$unMinifiedSanitized = str_replace($newLineDelimiter,"\n",$unMinifiedSanitized);
//removing inline js events
$unMinifiedSanitized = preg_replace("/([ ]on[a-zA-Z0-9_-]{1,}=\".*\")|([ ]on[a-zA-Z0-9_-]{1,}='.*')|([ ]on[a-zA-Z0-9_-]{1,}=.*[.].*)/","",$unMinifiedSanitized);
//removing inline js
$unMinifiedSanitized = preg_replace("/([ ]href.*=\".*javascript:.*\")|([ ]href.*='.*javascript:.*')|([ ]href.*=.*javascript:.*)/i","",$unMinifiedSanitized);
$sanitizedInput[] = $unMinifiedSanitized;
}
if($returnType == "string" && sizeof($sanitizedInput) > 0)
{
$returnData = $sanitizedInput[0];
}
else
{
$returnData = $sanitizedInput;
}
return $returnData;
}
This was very useful for me. Try this code:
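while (($pos = stripos($content, "<script")) !== false) {
    // Look for the closing tag only after the opening one
    $end_pos = stripos($content, "</script>", $pos);
    if ($end_pos === false) {
        break; // unclosed <script>: stop instead of looping forever
    }
    $start = substr($content, 0, $pos);
    $end = substr($content, $end_pos + strlen("</script>"));
    $content = $start . $end;
}
// strip_tags() then removes all remaining HTML tags
$text = strip_tags($content);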
I use this:
function clear_text($s) {
    $do = true;
    while ($do) {
        $start = stripos($s, '<script');
        // Search for the closing tag only after the opening one
        $stop = ($start === false) ? false : stripos($s, '</script>', $start);
        if ($start !== false && $stop !== false) {
            $s = substr($s, 0, $start) . substr($s, $stop + strlen('</script>'));
        } else {
            $do = false;
        }
    }
    return trim($s);
}