Sed script to delete everything in <head> tag except the <title> and insert script
I want to remove everything inside the <head> tag except the <title> in an HTML file, and also insert a script into the <head> tag after this is done. I don't want to delete the <head> tag itself.
Is this possible using sed?
Using regex to parse HTML is not a good choice. See this famous article for a full discussion.
I suggest using a DOM parser for this type of work, since any regex you try will break at some point, whether in sed or any of its variants. Since you asked for an alternative in your comments, consider the following PHP code:
$content = '
<HTML>
<HEAD>
<link href="/style.css" rel="stylesheet" type="text/css">
<title>
Page Title Goes here
</title>
<script>
var str = "ZZZZZ1233@qq.edu";
</script>
</HEAD>
';
$dom = new DOMDocument();
$dom->loadHTML($content);
$head='
<head>
<script>
// your javascript goes here
var x="foo";
</script>
';
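// Carry over only the original <title>; everything else in <head> is dropped.
$headTag = $dom->getElementsByTagName("head")->item(0);
if ($headTag !== null) {
    $title = $headTag->getElementsByTagName("title")->item(0);
    if ($title !== null) {
        // Escape the text so characters such as & or < stay valid HTML.
        $head .= '<title>' . htmlspecialchars($title->textContent) . '</title>
';
    }
}
$head .= '</head>';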
var_dump($head);
OUTPUT
string(118) "
<head>
<script>
// your javascript goes here
var x="foo";
</script>
<title>Page Title Goes here</title>
</head>"
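The code above only builds the new <head> as a string. If you want the whole document back with the <head> rewritten in place, the same DOM approach can be extended. The following is a minimal sketch, not a definitive implementation: the function name rewriteHead and its parameters are hypothetical names chosen for illustration, and it assumes the input is a complete HTML document that DOMDocument can parse.

```php
<?php
// Hypothetical helper (not part of the original answer): keeps only <title>
// inside <head>, then appends a <script> element holding $scriptSource.
function rewriteHead(string $html, string $scriptSource): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $head = $dom->getElementsByTagName('head')->item(0);
    if ($head === null) {
        return $html; // no <head> found, nothing to rewrite
    }

    // Copy the child list first: removing nodes while iterating the live
    // DOMNodeList would skip entries.
    foreach (iterator_to_array($head->childNodes) as $child) {
        $isTitle = $child instanceof DOMElement
            && strtolower($child->nodeName) === 'title';
        if (!$isTitle) {
            $head->removeChild($child);
        }
    }

    // Insert the new script as the last child of <head>.
    $script = $dom->createElement('script');
    $script->appendChild($dom->createTextNode($scriptSource));
    $head->appendChild($script);

    return $dom->saveHTML();
}
```

Usage would look like `echo rewriteHead(file_get_contents('page.html'), 'var x = 1;');`, where page.html is a placeholder file name. Be aware that saveHTML() re-serializes the whole document, so it may normalize the markup (for example, by adding a DOCTYPE) rather than reproduce the input byte for byte.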