
Change XML data in place using Ruby and Regular Expressions

Here's an example of the data:

<animals><name>George</name><description>A big brown fox.</description></animals>

It really doesn't get more complicated than that. I want to modify all text in the elements. (In this case, encrypt it).

What I've come up with so far is:

xml_data.gsub(/(<.*>)(.+)(<\/.*>)(?=<)/, "#{$1}#{$2.encrypt_string}#{$3}")

But, it only replaces the last element's text. So I'm obviously missing something.

I invite any suggestions (including using REXML). I must use libraries standard with Ruby 1.8.7.

There is no chance of the XML being malformed because I wrote the process that produces it.

Thank you in advance!


Don't use regular expressions for this, use a real parser such as Nokogiri:

s = '<animals><name>George</name><description>A big brown fox.</description></animals>'
d = Nokogiri::XML(s)
d.search('//text()').each { |n| n.content = n.content.encrypt_string }
s2 = d.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::NO_DECLARATION).strip

Assuming of course that you have monkey patched encrypt_string into String somewhere.
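If you just want to try the snippets in this thread without a real cipher, a stand-in along these lines might do. This is only a sketch: the ROT13 body is my assumption, chosen because the result is easy to eyeball, and it is not encryption and not the asker's actual encrypt_string.

```ruby
# Hypothetical stand-in for the asker's String#encrypt_string.
# ROT13 is NOT encryption; it only makes the transformation visible.
class String
  def encrypt_string
    # Rotate each ASCII letter 13 places; everything else is left alone.
    tr('A-Za-z', 'N-ZA-Mn-za-m')
  end
end

puts 'George'.encrypt_string
```

Swap the method body for your real routine once the plumbing works.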

As far as your regex goes, (.+) is greedy and will happily consume </close_tag>; you have similar problems with .*.


If you must use a regex (and it seems that you do have a choice), tighten it up and switch to the block form of gsub so that $1 and $2 refer to the current match:

xml_data.gsub(/<([^>]+)>([^<]+)<\/\1>/) { "<#{$1}>#{$2.encrypt_string}</#{$1}>" }

Using [^>]+ and [^<]+ keeps you within the tags you want, and the \1 back-reference is an easy way to match the opening and closing tags. For example, using upcase in place of encrypt_string does this:

>> s = '<animals><name>George</name><description>A big brown fox.</description></animals>'
>> s.gsub(/<([^>]+)>([^<]+)<\/\1>/) { "<#{$1}>#{$2.upcase}</#{$1}>" }
=> "<animals><name>GEORGE</name><description>A BIG BROWN FOX.</description></animals>"
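To see why the block form matters, here is a small sketch comparing the two call styles. The input string and the "stale" priming match are made up for illustration. With a string replacement, "#{$1}" is interpolated once, before gsub starts matching, so it carries whatever an earlier match left behind.

```ruby
s = '<a>x</a><b>y</b>'
pattern = /<([^>]+)>([^<]+)<\/\1>/

# Prime $1 and $2 with an unrelated match so the stale values are visible.
'<old>stale</old>' =~ pattern

# String form: the replacement is built once, from the stale $1 and $2.
string_form = s.gsub(pattern, "<#{$1}>#{$2.upcase}</#{$1}>")

# Block form: the block runs once per match, so $1 and $2 are current.
block_form = s.gsub(pattern) { "<#{$1}>#{$2.upcase}</#{$1}>" }

puts string_form
puts block_form
```

The first line printed repeats the stale element for every match; the second is the intended result.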


.* matches as many characters as possible: "animals><name>George</name><description"

Better to use <[^>]+>.

Edit: had to change what .* matches (wrong format when pasting XML tags).
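The greedy behaviour is easy to check in irb. A minimal sketch using the sample data from the question (the variable names are mine):

```ruby
s = '<animals><name>George</name><description>A big brown fox.</description></animals>'

greedy  = s[/<.*>/]        # .* runs on to the last '>' in the string
bounded = s[/<[^>]+>/]     # [^>]+ cannot cross a '>', so it stops at the first tag
tags    = s.scan(/<[^>]+>/) # every tag, one per match

puts greedy
puts bounded
puts tags.length
```

Here greedy is the whole string, while bounded is just the opening animals tag.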


Solution with REXML, given that xml_path is a valid path to an XML file:

require 'rexml/document'
include REXML

# Parse the source file; the block form closes the handle for us.
xml_data = File.open(xml_path, 'r') { |f| Document.new(f) }

# Replace the first text node of every element that has one.
XPath.each(xml_data, "//*") do |element|
  element.text = element.text.encrypt_string if element.text
end

# Write the result; the block form flushes and closes the output file.
File.open("path/to/new/file", 'w') { |f| f << xml_data }
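If the XML is already in a string, the same idea works without touching the filesystem. This is only a sketch: it uses upcase as a visible stand-in for encrypt_string, and the variable names are assumptions. Note that element.text only reads and writes the first text node of an element, which is enough for data shaped like the sample.

```ruby
require 'rexml/document'

xml = '<animals><name>George</name><description>A big brown fox.</description></animals>'
doc = REXML::Document.new(xml)

# Visit every element; elements with no text child (like animals) are skipped.
REXML::XPath.each(doc, '//*') do |element|
  # upcase stands in for encrypt_string here.
  element.text = element.text.upcase if element.text
end

result = doc.to_s
puts result
```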