How to match string with complex delimiters (regex in ruby)
I would like to match attribute pairs from string similiar to the one below
<tag_name attra="#{t("a.b.c")}" attrb="aa a">
... sould match on
attra="#{t("a.b.c")}" and开发者_StackOverflow attrb="aa a"
thanks in advance Marius
You could use lookaheads to detect if the quotes that are ending are part of the value or not, by looking if they are followed by a space or '>'
ruby-1.8.7-p248 > s='<tag_name attra="#{t("a.b.c")}" attrb="aa a">'
=> "<tag_name attra=\"\#{t(\"a.b.c\")}\" attrb=\"aa a\">"
ruby-1.8.7-p248 > s.scan /\w+=".*?"(?=\s|>)/
=> ["attra=\"\#{t(\"a.b.c\")}\"", "attrb=\"aa a\""]
Of course that won't work if you have a quote followed by a space or a '>' in your attribute value, so no matter how you look at it its a losing battle unless you skip those quotes inside the attribute values or preprocess them somehow. That's the reason why every language's string and regex have delimiters be skipped or preprocessed if they're found inside of the delimited value.
If there were no quotation marks in the attribute values (like attrb="aa a"
) or if the quotation marks were escaped as entities (like attrib=""Hello," he said"
) then it would be really easy to do with a regex along the lines of
/\w+="[^"]*"/
However, since you're really trying to match attra="#{t("a.b.c")}"
which is part of some Ruby code that generates XML (and which is not itself valid XML), even an XML parser (such as REXML or Nokogiri) won't solve this problem for you. You'll need your own context free parser, or you'll need to user the ripper library that's part of the Ruby 1.9.1 standard library to parse the parts of the attribute that are interpolated Ruby code, and then use some clever hack (like replacing the interpolated ruby code with a special character string) to match around the attribute value.
精彩评论