Replace content in a file between two markers
Using Ruby (not Rails), I'm trying to figure out how to replace (not append) a certain block in a static file with a string. For example, in static_file.html I want to replace everything between the HTML comments "start" and "end":
<p>lorem ipsum blah blah ipsum</p>
<!--start-->
REPLACE MULTI-LINE
CONTENT HERE...
<!--end-->
<p>other stuff still here...</p>
Some of the answers here are helpful for inserting text at a certain spot, but they do not handle replacing what lies between two markers.
Here's a function to handle it for you. Just pass it a file path and the contents to replace in between those HTML comment blocks:
As long as your comment blocks are always formatted the same way, <!--start--> and <!--end-->, this will work.
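def replace(file_path, contents)
  html = File.read(file_path)
  # Non-greedy match so only one start/end pair is consumed; the block form
  # keeps both markers and inserts contents literally (no backreference parsing).
  html = html.gsub(/(<!--start-->).*?(<!--end-->)/im) { "#{$1}\n#{contents}\n#{$2}" }
  # Write the result back so the file itself is updated, then return it.
  File.write(file_path, html)
  html
end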
The simple answer would be:
str = "FOO\nBAR\nblah \nblah BAZ\nBLOOP"
str.gsub(/BAR.*BAZ/m,"SEE")
I'm not sure if that's robust enough for what you are trying to do. The key here is the 'm' at the end of the regexp, which in Ruby lets . match newlines so the pattern can span multiple lines. Note that .* is greedy, so with several BAR...BAZ pairs it matches from the first BAR to the last BAZ. If this is to template some values, you may want to look at something like ERB templates instead of this gsub. Also, be careful about what you need to escape in your regular expressions.
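To make the escaping point concrete, here is a minimal sketch. The helper name replace_between and its parameters are made up for illustration and are not part of the answer above. It assumes you want to keep the markers and swap only what sits between them.

```ruby
# Hypothetical helper for illustration; not from the answers above.
def replace_between(text, start_marker, end_marker, replacement)
  # Regexp.escape neutralises characters such as "." or "[" inside the markers.
  # ".*?" is non-greedy, so each start/end pair is handled separately.
  pattern = /(#{Regexp.escape(start_marker)}).*?(#{Regexp.escape(end_marker)})/m
  # The block form inserts the replacement literally, so a sequence like "\1"
  # in the replacement is not interpreted as a backreference.
  text.gsub(pattern) { "#{$1}\n#{replacement}\n#{$2}" }
end

html = "<p>a</p>\n<!--start-->\nold\n<!--end-->\n<p>b</p>"
puts replace_between(html, "<!--start-->", "<!--end-->", 'cost: \1 dollars')
```

Passing the replacement as a plain string argument to gsub would instead treat \1 as "the first capture group", which is easy to trip over when the new content comes from user input.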
This is a simplified example of how to do it using a parser:
require 'nokogiri'
html = '<p>lorem ipsum blah blah ipsum</p>
<!--start-->
REPLACE MULTI-LINE
CONTENT HERE...
<!--end-->
<p>other stuff still here...</p>'
doc = Nokogiri.HTML(html)
puts doc.to_html
After parsing we get:
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>lorem ipsum blah blah ipsum</p>
# >>
# >> <!--start-->
# >> REPLACE MULTI-LINE
# >> CONTENT HERE...
# >> <!--end-->
# >>
# >> <p>other stuff still here...</p>
# >> </body></html>
doc.at('//comment()/following-sibling::text()').content = "\nhello world!\n"
puts doc.to_html
After finding the comment, stepping to the next text() node, and replacing it, we get:
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>lorem ipsum blah blah ipsum</p>
# >>
# >> <!--start-->
# >> hello world!
# >> <!--end-->
# >>
# >> <p>other stuff still here...</p>
# >> </body></html>
If your HTML is always going to be simple, with no possibility of having strings that break your search patterns, then you can go with search/replace.
If you check around, you see that for any non-trivial HTML manipulation you should go with a parser. That's because they deal with the actual structure of the document, so if the document changes, there's a better chance of the parser not being confused.
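For the simple case mentioned above, where each marker sits on its own line, a line-by-line scan is one possible alternative to a regex. This is only a sketch under that assumption: swap_block and its keyword defaults are hypothetical names, and it is not a substitute for a real parser on non-trivial HTML.

```ruby
# Hypothetical helper for illustration; assumes each marker is alone on its line.
def swap_block(text, replacement, start_marker: "<!--start-->", end_marker: "<!--end-->")
  inside = false
  out = []
  text.each_line do |line|
    if !inside && line.strip == start_marker
      out << line                        # keep the start marker
      out << replacement.chomp + "\n"    # emit the new content once
      inside = true
    elsif inside && line.strip == end_marker
      out << line                        # keep the end marker
      inside = false
    elsif !inside
      out << line                        # pass through everything outside the block
    end
    # Lines between the markers are dropped.
  end
  # Fail loudly rather than silently truncating the rest of the document.
  raise ArgumentError, "unterminated #{start_marker} block" if inside
  out.join
end

html = "<p>lorem ipsum blah blah ipsum</p>\n<!--start-->\nREPLACE MULTI-LINE\nCONTENT HERE...\n<!--end-->\n<p>other stuff still here...</p>\n"
puts swap_block(html, "hello world!")
```

Raising on a missing end marker is a design choice here: a regex would simply not match and leave the text unchanged, while this scan would otherwise drop everything after the start marker.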