Change XML data in place using Ruby and Regular Expressions
Here's an example of the data:
<animals><name>George</name><description>A big brown fox.</description></animals>
It really doesn't get more complicated than that. I want to modify all text in the elements (in this case, encrypt it).
What I've come up with so far is:
xml_data.gsub(/(<.*>)(.+)(<\/.*>)(?=<)/, "#{$1}#{$2.encrypt_string}#{$3}")
But, it only replaces the last element's text. So I'm obviously missing something.
I invite any suggestions (including using REXML). I must use libraries standard with Ruby 1.8.7.
There is no chance of the XML being malformed because I wrote the process that produces it.
Thank you in advance!
Don't use regular expressions for this; use a real parser such as Nokogiri:
require 'nokogiri'

s = '<animals><name>George</name><description>A big brown fox.</description></animals>'
d = Nokogiri::XML(s)
# Replace the content of every text node in the document.
d.search('//text()').each { |n| n.content = n.content.encrypt_string }
# Serialize without the <?xml ...?> declaration.
s2 = d.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::NO_DECLARATION).strip
Assuming, of course, that you have monkey patched encrypt_string into String somewhere.
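To try the snippets in this thread without the real method, a placeholder can be patched into String. The sketch below is purely an assumption for illustration: the asker's actual encrypt_string is not shown anywhere, and ROT13 is not encryption.

```ruby
# Hypothetical stand-in for the asker's String#encrypt_string.
# ROT13 is NOT real encryption; it only makes the examples runnable.
class String
  def encrypt_string
    tr('A-Za-z', 'N-ZA-Mn-za-m')
  end
end

sample = 'George'.encrypt_string
puts sample
```

Because ROT13 is its own inverse, calling the stand-in twice returns the original text, which makes it easy to check that every text node was touched exactly once.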
As far as your regex goes, (.+) is greedy and will happily consume </close_tag>; you have similar problems with .*.
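A quick way to see the greediness is to extract the first match of each pattern from the sample data. This is only a sketch; the variable names are made up for the illustration.

```ruby
s = '<animals><name>George</name><description>A big brown fox.</description></animals>'

# Greedy: .* runs to the LAST '>' in the string, so the "tag" is the whole input.
greedy = s[/<.*>/]

# Bounded: [^>]+ cannot cross a '>', so the match stops at the first tag.
bounded = s[/<[^>]+>/]

# Every tag, one at a time.
tags = s.scan(/<[^>]+>/)

puts greedy == s
puts bounded
puts tags.length
```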
If you must use a regex (and it seems that you have a choice), then you should tighten up your regex and switch to the block form of gsub to get sensible $1 and $2:
xml_data.gsub(/<([^>]+)>([^<]+)<\/\1>/) { "<#{$1}>#{$2.encrypt_string}</#{$1}>" }
Using [^>]+ and [^<]+ keeps you within the tags you want, and the \1 back-reference is an easy way to match the opening and closing tags. For example, using upcase in place of encrypt_string does this:
>> s = '<animals><name>George</name><description>A big brown fox.</description></animals>'
>> s.gsub(/<([^>]+)>([^<]+)<\/\1>/) { "<#{$1}>#{$2.upcase}</#{$1}>" }
=> "<animals><name>GEORGE</name><description>A BIG BROWN FOX.</description></animals>"
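The block form matters because a double-quoted replacement string is interpolated once, before gsub is even called, so $1 and $2 in it refer to whatever an earlier match left behind. A small sketch of the difference (the "stale" string and variable names are made up for the demonstration):

```ruby
pattern = /<([^>]+)>([^<]+)<\/\1>/
s = '<name>George</name><description>A big brown fox.</description>'

# An unrelated earlier match leaves $2 == "ale".
'stale' =~ /(st)(ale)/

# String form: "#{$2}" is evaluated BEFORE gsub runs, so every
# replacement uses the stale capture.
eager = s.gsub(pattern, "[#{$2}]")

# Single-quoted backreferences are filled in per match, but you
# cannot call a method on the captured text this way.
copied = s.gsub(pattern, '<\1>\2</\1>')

# Block form: the block runs once per match with fresh $1 and $2.
upcased = s.gsub(pattern) { "<#{$1}>#{$2.upcase}</#{$1}>" }

puts eager
puts copied
puts upcased
```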
.* matches as many characters as possible; here it matches "animals><name>George</name><description". Better to use <[^>]+>.
Edit: Had to change what .* matches (wrong format when pasting XML tags).
Solution with REXML, given that xml_path is a valid path to an XML file:
require 'rexml/document'
include REXML

# Parse the source file; the block form closes it automatically.
xml_data = File.open(xml_path, 'r') { |f| Document.new(f) }

# Encrypt the text of every element that has any.
XPath.each(xml_data, "//*") do |element|
  if element.text
    element.text = element.text.encrypt_string
  end
end

# Write the result; the block form flushes and closes the output file.
File.open("path/to/new/file", 'w') { |f| f << xml_data }
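The same REXML approach can be tried on an in-memory string, which is handy for a quick check before touching files. This is a sketch under assumptions: map_element_text is a hypothetical helper name, and upcase stands in for encrypt_string.

```ruby
require 'rexml/document'

# Hypothetical helper: applies the given block to the text of every
# element that has any, and returns the modified document as a string.
def map_element_text(xml_string)
  doc = REXML::Document.new(xml_string)
  REXML::XPath.each(doc, '//*') do |element|
    element.text = yield(element.text) if element.text
  end
  doc.to_s
end

xml = '<animals><name>George</name><description>A big brown fox.</description></animals>'
result = map_element_text(xml) { |text| text.upcase }
puts result
```

Note that element.text only reads the first text node of an element, so this is enough for data shaped like the example but would skip later text nodes in mixed content.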