
Tilde not recognised in XML public identifier

I found an interesting bug and wanted to know what you think. Brief background: I've written a custom DTD and an example XML file (both UTF-8). I have now implemented a SAX parser in Java which I want to test. I got a SAXException complaining "An invalid XML character (Unicode: 0x7e) was found in the public identifier". Now, the URL of my DTD does contain a tilde character (Unicode 0x7e). If I move the DTD file to another URL which does not contain a tilde, then my example XML file parses without causing a SAXException.

So I have a work-around for this problem, but I am interested to know: why does this happen? Is this a bug? If so, is it with UTF-8, Java (1.6.0_18 x86), Windows (Server 2008 R2 x86_64) or what? Or is this one of those little obscure nuances of the XML 1.0 specification?


You wouldn't normally put a URI (containing ~ or not) in the public identifier. The system identifier is the one that's commonly a URI.

I suspect you're saying:

<!DOCTYPE PUBLIC "http://www.example.com/~foo/x.dtd">

when you mean:

<!DOCTYPE SYSTEM "http://www.example.com/~foo/x.dtd">
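To see the difference for yourself, here is a minimal sketch using the JDK's built-in SAX parser. A complete declaration also names the root element, and the PUBLIC form takes a system identifier after the public one, so the root name `x` and the second literal `"x.dtd"` are assumptions added for illustration, as are the class and method names. The entity resolver hands back an empty DTD so nothing is fetched from the network.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public final class DoctypeDemo {
    // Hypothetical documents: the URL is used as a public identifier in the
    // first and as a system identifier in the second.
    static final String PUBLIC_DOC =
            "<!DOCTYPE x PUBLIC \"http://www.example.com/~foo/x.dtd\" \"x.dtd\"><x/>";
    static final String SYSTEM_DOC =
            "<!DOCTYPE x SYSTEM \"http://www.example.com/~foo/x.dtd\"><x/>";

    /** Parses the document; returns null on success, else the parser's message. */
    static String parseError(String xml) {
        try {
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Supply an empty DTD instead of fetching the real one.
                    return new InputSource(new StringReader(""));
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return null;
        } catch (Exception e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println("PUBLIC: " + parseError(PUBLIC_DOC));
        System.out.println("SYSTEM: " + parseError(SYSTEM_DOC));
    }
}
```

The PUBLIC document should be rejected with a message about the public identifier, while the SYSTEM document should parse cleanly; the exact wording of the message depends on the parser implementation.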


It's an obscure nuance of the XML 1.0 specification. I like the phrase!

I believe "production 13" in Extensible Markup Language (XML) 1.0 (Fifth Edition)

[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

defines the character set allowed here.
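As a rough illustration of that rule, the sketch below checks a string against the PubidChar production character by character. The class and method names are made up for this example; only the character set comes from the production quoted above.

```java
public final class PubidCharCheck {
    // Punctuation allowed by production [13], copied from the rule above.
    private static final String PUNCTUATION = "-'()+,./:=?;!*#@$_%";

    /** Returns true if c matches the PubidChar production. */
    static boolean isPubidChar(char c) {
        return c == 0x20 || c == 0xD || c == 0xA
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || PUNCTUATION.indexOf(c) >= 0;
    }

    /** Returns the index of the first character that is not a PubidChar, or -1. */
    static int firstInvalidIndex(String publicId) {
        for (int i = 0; i < publicId.length(); i++) {
            if (!isPubidChar(publicId.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String id = "http://www.example.com/~foo/x.dtd";
        int i = firstInvalidIndex(id);
        System.out.println(i < 0
                ? "valid public identifier"
                : "invalid character 0x" + Integer.toHexString(id.charAt(i))
                        + " at index " + i);
    }
}
```

The tilde (0x7e) is not in the list, which matches the error message in the question. A formal public identifier such as `-//W3C//DTD XHTML 1.0 Strict//EN` passes the same check.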

Now that I've seen T.J. Crowder's comment, I'm unsure if this answer is correct. The section he cited does not seem to reference this rule.

This spec is indeed difficult to untangle.
