
Split an HTML file, keeping only the part above a delimiter, in Ruby

I have an HTML file that contains a string acting as a delimiter, and I would like to:

- split this file and keep only the part above the delimiter
- close all the opened HTML tags in the new file

What would be the best way to do it with Ruby (or Unix) and keep it efficient?

Thanks in advance,
Nicolas


If I understand your question correctly, you want to store the part of the HTML file that lies before the delimiter in a string. For example, given:

<html>
  <head>
    <title>Blah</title>
  </head>
  <body>
     <p>Some stuff</p>
        <!-- Delimiter --!>
  </body>
</html>

you want everything before the <!-- Delimiter --!> marker.

In that case, you could probably do this:

# Read the whole file into a string (skip this if the HTML is already in memory)
str = File.read("the_file.html")
# Keep only the text before the first occurrence of the delimiter
part_to_keep = str.split("<!-- Delimiter --!>", 2).first

Let me know if this is what you needed.
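The question also asks for the still-open tags to be closed in the truncated output. Here is a minimal sketch of one way to do that with only the standard library. The method name `truncate_and_close` and the `VOID_ELEMENTS` list are made up for this example, and the regex-based scan assumes reasonably clean markup (no `>` inside attribute values, no tag-like text inside scripts), so treat it as an illustration rather than a robust solution.

```ruby
# Elements that never take a closing tag (illustrative list, assumed sufficient here).
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Returns the part of `html` before `delimiter` (the whole string if the
# delimiter is absent), followed by closing tags for every element still open.
def truncate_and_close(html, delimiter)
  head = html.split(delimiter, 2).first.to_s
  open_tags = []
  # Ignore comments, then walk over every opening or closing tag in order.
  head.gsub(/<!--.*?-->/m, "").scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if VOID_ELEMENTS.include?(name) || self_closing == "/"
    if closing == "/"
      # Pop back to the matching opening tag, if there is one.
      idx = open_tags.rindex(name)
      open_tags.slice!(idx..-1) if idx
    else
      open_tags.push(name)
    end
  end
  # Close the remaining elements, innermost first.
  head + open_tags.reverse.map { |tag| "</#{tag}>" }.join
end

# Usage, with the file name and delimiter from the example above:
# puts truncate_and_close(File.read("the_file.html"), "<!-- Delimiter --!>")
```

With the sample file above, this would append `</body></html>` to the kept part. For messy real-world markup, a proper HTML parser such as Nokogiri is probably a safer choice than a regex scan.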


For the Unix version, you can use a Perl one-liner as follows:

perl -n -e '$delim = 1 if /<!-- Delimiter --!>/;
            print unless $delim;' html_file > output

This works by using the sentinel variable $delim to record whether the delimiter has been seen. Lines are printed only until the delimiter line is reached, so the output contains the part of the file above it; the delimiter line itself and everything after it are dropped.
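Since the question asks for efficiency, the same line-by-line idea can also be sketched in Ruby so that nothing after the delimiter is ever read. The method name `read_until_delimiter` is made up for this example, and it assumes the delimiter does not span a line break.

```ruby
# Collects text from `io` until a line contains `delimiter`. Text on that line
# before the delimiter is kept; the rest of the input is never read.
def read_until_delimiter(io, delimiter)
  kept = +""
  io.each_line do |line|
    # partition returns [line, "", ""] when the delimiter is not on this line.
    before, found, _after = line.partition(delimiter)
    kept << before
    break unless found.empty?
  end
  kept
end

# Usage, with the file name and delimiter from the first answer:
# part_to_keep = File.open("the_file.html") do |f|
#   read_until_delimiter(f, "<!-- Delimiter --!>")
# end
```

Compared with reading the whole file and splitting it, this only matters for large files where the delimiter appears early.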
