Sed script to delete everything in <head> tag except the <title> and insert script

I want to remove everything inside the <head> tag except the <title> in an HTML file, and also insert a script into the <head> tag once that is done. I don't want to delete the <head> tag itself.

Is this possible using Sed?


Using regex to parse HTML is not a good choice. See this famous article for a full discussion.
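
As a small illustration of how easily a pattern breaks, here is a sketch comparing a naive regex with a DOM lookup on a title that spans several lines. The sample markup and variable names are made up for the example; the pattern is only one of many a sed script might use.

```php
<?php
// Hypothetical sample: the title text sits on its own line.
$html = "<head>\n<title>\nPage Title Goes here\n</title>\n</head>";

// A naive pattern: '.' does not match newlines by default, so the match
// fails as soon as the title spans more than one line.
$regexFound = preg_match('/<title>(.*?)<\/title>/', $html, $m);

// A DOM parser does not depend on how the markup is laid out.
$dom = new DOMDocument();
$dom->loadHTML($html);
$title = $dom->getElementsByTagName('title')->item(0);
$domTitle = $title !== null ? trim($title->textContent) : null;

var_dump($regexFound); // int(0)
var_dump($domTitle);   // string(20) "Page Title Goes here"
```

The regex could be patched with the `s` modifier, but the next variation (attributes on the tag, different case, comments in the head) would need yet another patch.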


I suggest using a DOM parser for this type of work, since any regex you try, with sed or any of its variants, will break at some point. Since you asked for an alternative in your comments, consider the following PHP code:

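// Sample input: a <head> containing a stylesheet link, a title and a script.
$content = '
<HTML>
<HEAD>
   <link href="/style.css" rel="stylesheet" type="text/css">
   <title>
   Page Title Goes here
   </title>
   <script>
       var str = "ZZZZZ1233@qq.edu";
   </script>    
</HEAD>
';
$dom = new DOMDocument();
$dom->loadHTML($content);
// Start the replacement <head> with the script to insert.
$head = '
<head>
<script>
   // your javascript goes here
   var x="foo";
</script>
';
$headTag = $dom->getElementsByTagName("head")->item(0);
if ($headTag !== null) {
   // Carry over only the original <title>; everything else in <head> is dropped.
   $title = $headTag->getElementsByTagName("title")->item(0);
   if ($title !== null) {
      // trim() strips the newlines and indentation around the title text.
      $head .= '<title>' . trim($title->textContent) . '</title>
';
   }
}
$head .= '</head>';
var_dump($head);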

OUTPUT

string(118) "
<head>
<script>
   // your javascript goes here
   var x="foo";
</script>
<title>Page Title Goes here</title>
</head>"
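
The code above only builds the new <head> as a string. If the goal is to get the whole document back with the head rewritten in place, one possible approach is to edit the DOM tree directly and serialize it again. The sketch below assumes a helper named `rewriteHead` (a made-up name, not part of PHP) that takes the page source and the script body as strings.

```php
<?php
// Hypothetical helper: keeps only <title> inside <head> and appends a <script>.
function rewriteHead(string $html, string $scriptBody): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $head = $dom->getElementsByTagName('head')->item(0);
    if ($head === null) {
        return $html; // no <head>: leave the input untouched
    }

    // Copy the child list first; removing nodes while iterating the live
    // DOMNodeList would skip entries.
    foreach (iterator_to_array($head->childNodes) as $child) {
        $isTitle = $child instanceof DOMElement
            && strtolower($child->tagName) === 'title';
        if (!$isTitle) {
            $head->removeChild($child);
        }
    }

    // Append the new script as the last child of <head>.
    $script = $dom->createElement('script');
    $script->appendChild($dom->createTextNode($scriptBody));
    $head->appendChild($script);

    return $dom->saveHTML();
}
```

It could be called as `echo rewriteHead($content, 'var x="foo";');` with the `$content` string from the answer. Note that `loadHTML` may add a default doctype and wrapper elements when the input lacks them, so the serialized result is not guaranteed to be byte-for-byte identical to the input outside the head.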