Split an HTML file, keeping only the part above a delimiter, in Ruby
I have an HTML file that contains a string acting as a delimiter, and I would like to:
- split this file and keep only the part that is above the delimiter
- close all the open HTML tags in the new file
What would be the best way to do this with Ruby (or Unix) while keeping it efficient?
Thanks in advance, Nicolas
If I understand your question correctly, you want to store the part of the HTML file that lies before the delimiter in a string. For example, given:
<html>
<head>
<title>Blah</title>
</head>
<body>
<p>Some stuff</p>
<!-- Delimiter --!>
</body>
</html>
And you want everything before the <!-- Delimiter --!> line.
In that case, you could do this:
# Read the whole HTML file into a string
str = File.read("the_file.html")
# Keep only the text before the first occurrence of the delimiter
part_to_keep = str.split("<!-- Delimiter --!>", 2).first
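The split above does not close the tags left open by the cut, which the question also asks for. Here is a minimal sketch of one way to do that using only the standard library. The method name head_with_closed_tags and the VOID_TAGS constant are illustrative names, not part of any library. The regex scan is naive: it does not understand comments, script contents, or malformed markup, so a real HTML parser would be more robust.

```ruby
# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Hypothetical helper: returns everything before the first occurrence of
# `delimiter`, followed by closing tags for every element still open there.
# If the delimiter is absent, the whole input is treated as the head.
def head_with_closed_tags(html, delimiter)
  head, _sep, _rest = html.partition(delimiter)
  open_tags = []
  # Naive scan: matches <tag ...>, </tag> and <tag ... />.
  head.scan(%r{<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_TAGS.include?(name) || self_closing == "/"
    if closing == "/"
      idx = open_tags.rindex(name)
      # Drop the matching tag and anything left open inside it.
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end
  # Close the remaining tags, innermost first.
  head + open_tags.reverse.map { |t| "</#{t}>" }.join
end

puts head_with_closed_tags("<div><p>Hi <!-- cut --> there</p></div>", "<!-- cut -->")
# prints: <div><p>Hi </p></div>
```

To use it on a file, pass File.read("the_file.html") as the first argument and write the result to the new file.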
Let me know if this is what you needed.
For the Unix version, you can use a Perl one-liner as follows:
perl -n -e '$delim = 1 if /<!-- Delimiter --!>/;
print unless $delim;' html_file > output
This works by using the sentinel variable $delim to record whether the delimiter has been seen. Only the lines before the line containing the delimiter are printed; that line and everything after it are skipped.
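The same line-by-line idea can be sketched in Ruby, here keeping the lines that come before the delimiter, as the question asks. It avoids loading the whole file into memory and stops reading as soon as the delimiter is found. The method name copy_until_delimiter is an illustrative name, and the sketch assumes the delimiter does not span more than one line.

```ruby
require "stringio"

# Hypothetical helper: copies lines from `input` to `output` until a line
# contains `delimiter`. Text on that line before the delimiter is kept too.
# Returns true if the delimiter was found, false otherwise.
def copy_until_delimiter(input, output, delimiter)
  input.each_line do |line|
    before, sep, _after = line.partition(delimiter)
    if sep.empty?
      output << line      # no delimiter on this line: copy it unchanged
    else
      output << before    # keep only the text preceding the delimiter
      return true         # stop reading the rest of the input
    end
  end
  false
end

# In-memory demonstration; File objects work the same way, since both
# respond to each_line and <<.
src = StringIO.new("<p>Some stuff</p>\n<!-- Delimiter --!>\n</body>\n")
out = StringIO.new
copy_until_delimiter(src, out, "<!-- Delimiter --!>")
print out.string
# prints: <p>Some stuff</p>
```

With real files, the input and output can be opened with File.open("html_file") and File.open("output", "w") and passed in place of the StringIO objects.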