开发者

how can I tell xalan NOT to validate XML retrieved using the "document" function?

Yesterday Oracle decided to take down java.sun.com for a while. This screwed things up for me because xalan tried to validate some XML but couldn't retrieve the properties.dtd.

I'm using xalan 2.7.1 to run some XSL transforms, and I don't want it to validate anything. so tried loading up the XSL like this:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setName开发者_如何学GospaceAware(true);
spf.setValidating(false);
XMLReader rdr = spf.newSAXParser().getXMLReader();      
Source xsl = new SAXSource(rdr, new InputSource(xslFilePath));  
Templates cachedXSLT  = factory.newTemplates(xsl);
Transformer transformer = cachedXSLT.newTransformer();         
transformer.transform(xmlSource, result);  

in the XSL itself, I do something like this:

  <xsl:variable name="entry" select="document(concat($prefix, $locale_part, $suffix))/properties/entry[@key=$key]"/>

The XML this code retrieves has the following definition at the top:

<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="...

Despite the java code above instructing the parser to NOT VALIDATE, it still sends a request to java.sun.com. While java.sun.com is unavailable, this makes the transform fail with the message:

 Can not load requested doc: http://java.sun.com/dtd/properties.dtd

How do I get xalan to stop trying to validate the XML loaded from the "document" function?


The documentation mentions that the parser may read the DTDs even if not validating, as it may become necessary to use the DTD to resolve (expand) entities.

Since I don't have control over the XML documents, nont's option of modifying the XML was not available to me.

I managed to shut down attempts to pull in DTD documents by sabotaging the resolver, as follows.

My code uses a DocumentBuilder to return a Document (= DOM) but the XMLReader as per the OP's example also has a method setEntityResolver so the same technique should work with that.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false); // turns off validation
factory.setSchema(null);      // turns off use of schema
                              // but that's *still* not enough!
builder = factory.newDocumentBuilder();
builder.setEntityResolver(new NullEntityResolver()); // swap in a dummy resolver
return builder().parse(xmlFile); 

Here, now, is my fake resolver: It returns an empty InputStream no matter what's asked of it.

/** my resolver that doesn't */
private static class NullEntityResolver implements EntityResolver {

    public InputSource resolveEntity(String publicId, String systemId) 
    throws SAXException, IOException {
        // Message only for debugging / if you care
        System.out.println("I'm asked to resolve: " + publicId + " / " + systemId);
        return new InputSource(new ByteArrayInputStream(new byte[0]));
    }

}

Alternatively, your fake resolver could return streams of actual documents read as local resources or whatever.


Be aware that disabling DTD loading will cause parsing to fail if the DTD defines any entities that your XML file depends on. That said, to disable DTD loading try this, which assumes you're using the default Xerces that ships with Java.

    /*
     * Instantiate the SAXParser and set the features to prevent loading of an external DTD
     */
   SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
   XMLReader xrdr = sp.getXMLReader();
   xrdr.setFeature("http://xml.org/sax/features/validation", false);
   xrdr.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

If you really need the DTD, then the other alternative is to implement a local XML catalog

    /*
     * Instantiate the SAXParser and add catalog support
     */
   SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
   XMLReader xrdr = sp.getXMLReader();

   CatalogResolver cr = new CatalogResolver();
   xrdr.setEntityResolver(cr);

To which you will have to provide the appropriate DTDs and an XML catalog definition. This Wikipedia Article and this article were helpful.

CatalogResolver looks at the system property xml.catalog.files to determine what catalogs to load.


Try using setFeature on SAXParserFactory.

Try this:

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setValidating(false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

I think that should be enough, otherwise try setting a few other features:

spf.setFeature("http://xml.org/sax/features/validation", false);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);


I just ended up stripping the doctype declaration out of the XML, because nothing else worked. When I get around to it, I'll try this: http://www.sagehill.net/docbookxsl/UseCatalog.html#UsingCatsXalan


Sorry for necroposting, but I have found a solution which actually works and decided I should share it.

1. For some reason, setValidating(false) doesn't work. In some cases, it still downloads external DTD files. To prevent this, you should attach a custom EntityResolver as advised here:

XMLReader rdr = spf.newSAXParser().getXMLReader();      
rdr.setEntityResolver(new MyCustomEntityResolver());

The EntityResolver will be called for every external entity request. Returning null will not work because the framework will still download the file from the Internet after that. Instead, you can return an empty stream which is a valid DTD, as advised here:

private class MyCustomEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
    }
}

2. You are telling setValidating(false) to the SAX parser which reads your XSLT code. That is, it will not validate your XSLT. When it encounters a document() function, it loads the linked XML file using another parser which still validates it, and also downloads external entities. To handle this, you should attach a custom URIResolver to the transformer:

Transformer transformer = cachedXSLT.newTransformer();
transformer.setURIResolver(new MyCustomURIResolver());

The transformer will call your URIResolver implementation when it encounters the document() function. Your implementation will have to return a Source for the passed URI. The simplest thing is to return a StreamSource as advised here. But in your case you should parse the document yourself, preventing validation and external requests using the customized SAXParser you already have (or create a new one each time).

private class MyCustomURIResolver implements URIResolver {
    public Source resolve(String href, String base) {
        return new SAXSource(rdr,new InputSource(href));
    }
}

So you will have to implement two custom interfaces in your code.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜