Nokogiri -- preserve doctype and meta tags

I'm using Nokogiri to open an existing HTML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
   <title>Foo</title> 
</head> 
<body>
<!-- stuff -->
</body>
</html>

Then I change the contents of the body tag like this:

html_file = Nokogiri::HTML("path/to/html/file")
html_file.css('body').first.inner_html = "new body content"

Then I write this new document to a file like this:

File.open("path/to/new/html/file", 'w') {|f| f.write html_file}

And this is my resulting HTML file:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html><body>
new body content
</body></html>

My question is whether it's possible to tell Nokogiri to preserve the original HTML file's doctype and meta tags, since they appear to be lost or changed when I open the document with Nokogiri and write it back to a file.

Any help would be much appreciated. Thanks!


Finally figured it out:

I just changed the line:

html_file = Nokogiri::HTML("path/to/html/file")

to

html_file = Nokogiri::HTML(File.open("path/to/html/file").read)

and now it works like I'm expecting it to. Seems kind of inconsistent, but I'm sure there's a good reason for it.

Thanks for all of the suggestions @ezkl!
