Parsing problematic XML in Querypath (dots in elements)

I am trying to parse a NewsML (http://www.iptc.org/std/NewsML-G2/2.7/examples/LISTING2_NewsML-G2_Complete.xml) document with QueryPath. But I have trouble with the dots in some element names, like <body.head>.

In some Firefox QueryPath plugins I am able to escape the dot with a backslash, but in the PHP PEAR library this does not work.

Any ideas?

(I am looking for a solution within QueryPath, not for workarounds.)


In the past, I've used the Tidy PHP extension (http://us3.php.net/manual/en/book.tidy.php) to clean up HTML/XML before passing it into QueryPath.

The XML you referenced above is pretty clean, and also pretty small.

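If the only issue is dots in element names, preprocessing with a regular expression would probably work, too. And it would probably be the fastest solution. I'm guessing you could do a preg_replace('/<(\/?)body\./', '<${1}body-', $xml) and have it fixed. (That would replace body.content with body-content and so on, in both opening and closing tags. Note that PHP's preg_replace replaces every match by default, so no g modifier is needed, and PCRE does not accept one.)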
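
The preprocessing idea above can be sketched roughly as follows. This is only a minimal illustration, not a QueryPath feature: the function name replace_body_dots and the sample XML string are made up for the example, and it assumes the dotted names all start with "body." and never appear inside CDATA sections or attribute values.

```php
<?php
// Hypothetical helper (not part of QueryPath): rewrites element names
// such as <body.head> to <body-head> before the XML is handed to a parser.
function replace_body_dots(string $xml): string
{
    // (\/?) captures the optional slash so closing tags are rewritten too.
    // preg_replace replaces all matches by default; PCRE has no "g" modifier.
    return preg_replace('/<(\/?)body\./', '<${1}body-', $xml);
}

// Small made-up sample, loosely modelled on the element names in the question.
$xml = '<body><body.head><hedline>Title</hedline></body.head>'
     . '<body.content><p>Text</p></body.content></body>';

$clean = replace_body_dots($xml);
echo $clean, "\n";
```

After this step the selectors would target body-head and body-content instead of the dotted names. The plain <body> element is left alone because the pattern requires the dot.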
