A regular expression to clean up XML

I have to deal with XML data that sometimes contains unescaped ampersands, and I can't get the producer to either escape them as &amp;amp; or put them into a CDATA section.

Now I'm looking for a regular expression to replace & with &amp;amp; if it's not part of an entity. Something like this: &(?!(amp|apos|quot|lt|gt);)

Unfortunately, my programming environment only supports "extended POSIX 1003.2 regular expressions" (see http://www.kernel.org/doc/man-pages/online/pages/man7/regex.7.html), which seem to lack the not operator "!" (negative lookahead) needed here.

Any ideas how to craft the necessary regular expression?


Lateral thinking: replace all & with &amp;amp;, then replace all &amp;amp;apos; (etc.) with &amp;apos; (for example)? You can use a group to capture the part to be put back: &amp;amp;(apos);
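
The two-pass idea above could be sketched roughly as follows. This is only an illustration in Python, not the asker's environment: the function name `escape_bare_ampersands` is made up, and the patterns stick to plain characters, groups, and alternation so they should carry over to a POSIX extended regex engine, assuming its replace function supports a backreference such as `\1` in the replacement.

```python
import re

# The five predefined XML entity names listed in the question.
ENTITY_NAMES = "amp|apos|quot|lt|gt"

def escape_bare_ampersands(text):
    """Hypothetical helper: escape '&' unless it already starts an entity."""
    # Pass 1: escape every ampersand, including the ones that start an entity.
    escaped = re.sub(r"&", "&amp;", text)
    # Pass 2: undo the double escaping, e.g. "&amp;lt;" becomes "&lt;" again.
    # The group captures the entity name so it can be put back.
    return re.sub(r"&amp;(" + ENTITY_NAMES + r");", r"&\1;", escaped)

print(escape_bare_ampersands("a & b &lt; c &amp; d"))
```

Note that only the five names from the question are restored in pass 2. Numeric character references such as `&#38;` are not in that list, so this sketch would double-escape them unless the second pattern is extended.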


Instead of searching for something matching a negative regex you could search for something NOT matching a positive regex, something like:

! ... &(?(amp|apos|quot|lt|gt);)

I did not read the whole page you linked, but I am pretty sure it should be possible.
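
One possible reading of this suggestion, sketched in Python purely as an illustration: match the positive pattern (an ampersand optionally followed by a known entity name and a semicolon) and let the calling code decide what to do depending on whether the entity part matched. The name `escape_single_pass` is made up, and the approach assumes the environment can inspect the captured group for each match, which the question does not confirm. The pattern itself uses only grouping, alternation, and `?`, which extended POSIX regular expressions provide.

```python
import re

# An ampersand, optionally followed by a known entity name and ';'.
# No lookahead is used; group 1 is simply empty for a bare ampersand.
AMP_OR_ENTITY = re.compile(r"&((amp|apos|quot|lt|gt);)?")

def escape_single_pass(text):
    """Hypothetical helper: keep known entities, escape every other '&'."""
    def fix(match):
        if match.group(1):
            # The optional part matched: this is already an entity, keep it.
            return match.group(0)
        # Only the bare '&' matched: escape it.
        return "&amp;"
    return AMP_OR_ENTITY.sub(fix, text)

print(escape_single_pass("a & b &lt; c &amp; d"))
```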
