XML validation using XSD in Java

I have the following class:

package com.somedir.someotherdir;

import java.util.logging.Level;
import java.util.logging.Logger;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class SchemaValidator
{
 private static Logger _logger = Logger.getLogger(SchemaValidator.class.getName());

 /**
  * @param file - the relative path to and the name of the XML file to be validated
  * @return true if validation succeeded, false otherwise
  */
 public final static boolean validateXML(String file)
 {
  try
  {
   SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
   Schema schema = factory.newSchema();
   Validator validator = schema.newValidator();
   validator.validate(new StreamSource(file));
   return true;
  }
  catch (Exception e)
  {
   _logger.log(Level.WARNING, "SchemaValidator: failed validating " + file + ". Reason: " + e.getMessage(), e);
   return false;
  }
 }
}

I would like to know whether I should load an explicit schema after all, e.g. factory.newSchema(new File("dir/to/schema.xsd")), or whether the current version is alright. I read that there is some DoS vulnerability; could someone provide more information on that? Also, does the path have to be absolute, or can it be relative?

Most of the XML files to be validated have their own XSD, so I'd like to use the schema that is referenced in the XML itself (xsi:noNamespaceSchemaLocation="schemaname.xsd").

The validation is done only during startup or manual reload (server software).


Do you mean the XML DTD DoS attack? If so, there are some good articles on the net:

XML Denial of Service Attacks and Defenses http://msdn.microsoft.com/en-us/magazine/ee335713.aspx

From IBM developerWorks. "Tip: Configure SAX parsers for secure processing":

Entity resolution opens a number of potential security holes in XML.[...]
- The site where the external DTD is hosted can log the communication. [...]
- The site that hosts the DTD can slow the parsing [...] It can also stop the parse completely by serving a malformed DTD.
- If the remote site changes the DTD, it can use default attribute values to inject new content into the document[...] It can change the content of the document by redefining entity references.

Though I am not sure that this applies directly to your program, it can give some clues for further investigation.
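
As a rough sketch of the kind of hardening those articles point towards (not taken from either article), the JAXP secure-processing feature and the external-access properties can be set on both the SchemaFactory and the Validator. The class and method names here (HardenedValidation, newHardenedFactory, isValid) are made up for illustration, and the example assumes the JDK's built-in JAXP implementation (Java 7u40 or later); other implementations may not recognise the ACCESS_EXTERNAL_* properties.

```java
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original question.
final class HardenedValidation {

    /** Creates a factory that refuses external DTDs and non-local schema references. */
    static SchemaFactory newHardenedFactory() throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Turns on the implementation's processing limits (e.g. entity expansion).
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Empty string: no protocol is allowed for external DTDs.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        // Only file: URLs are allowed for schemaLocation, import and include.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "file");
        return factory;
    }

    /** Returns true if xml validates against xsd, false on any validation or I/O error. */
    static boolean isValid(Source xsd, Source xml) {
        try {
            Schema schema = newHardenedFactory().newSchema(xsd);
            Validator validator = schema.newValidator();
            // Repeat the restrictions on the validator, which parses the instance document.
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            validator.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "file");
            validator.validate(xml);
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

Both arguments can be built with new StreamSource(new File(...)). Allowing only "file" is an assumption that fits schemas stored next to the XML files; an empty string would forbid schema references altogether.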


As I understand it, the javax.xml.validation.Schema object returned by SchemaFactory.newSchema() will try to fetch the schemas referenced in the XML/XSD files being validated, as indicated by the corresponding xsi:schemaLocation attributes. This implies that:

  1. If your schemas refer to schemas hosted on the internet, the Schema object will try to fetch them at runtime. As far as I'm aware, the default Schema implementation does not cache those schemas. The W3C has already reported on bad coding practices resulting in a de facto DDoS of their website (up to 130M DTD requests per day!).
  2. If you are going to validate external, uncontrolled XML files, then you are also exposed to the Schema trying to fetch other schemas from possibly malicious sources.
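
The hint-driven lookup described above can be tried out with a small sketch. The class name HintBasedValidation is hypothetical, and the example assumes the XSD sits next to the XML file and is referenced by a relative xsi:noNamespaceSchemaLocation hint. Building the StreamSource from a File gives the source a system ID, which is what a relative hint is resolved against.

```java
import java.io.File;
import java.io.IOException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

// Hypothetical helper illustrating validation driven by location hints.
final class HintBasedValidation {

    /** Validates xmlFile against whatever schema its own location hints point to. */
    static boolean isValid(File xmlFile) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // No schema is passed in: the Schema follows the hints found in the document.
            Schema schema = factory.newSchema();
            Validator validator = schema.newValidator();
            // The File-based source carries a system ID, so a relative hint such as
            // "schemaname.xsd" is looked up relative to the XML file's location.
            validator.validate(new StreamSource(xmlFile));
            return true;
        } catch (SAXException | IOException e) {
            // With no ErrorHandler set, validation errors arrive here as exceptions.
            return false;
        }
    }
}
```

Because the document itself decides which schema is loaded, this is only appropriate for XML files you control, which is the concern raised in point 2.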

For more evil attack vectors, take a look at sign's previous answer.

To avoid this pitfall, you can store all external resources locally and use the SchemaFactory.setResourceResolver method to instruct the Schema how to fetch them.
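
A minimal sketch of such a resolver follows, under some assumptions of my own: all referenced schemas have been copied into a single local directory, and they can be told apart by file name alone. LocalSchemaResolver and loadSchema are hypothetical names. The LSInput objects are obtained through the standard DOM Load/Save factory rather than a hand-written implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical resolver that serves referenced schemas from a local directory.
final class LocalSchemaResolver implements LSResourceResolver {
    private final Path schemaDir;
    private final DOMImplementationLS loadSave;

    LocalSchemaResolver(Path schemaDir) {
        this.schemaDir = schemaDir;
        try {
            this.loadSave = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Load/Save implementation found", e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        if (systemId == null) {
            return null;
        }
        // Assumption: the last path segment of the reference names the local copy.
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        Path local = schemaDir.resolve(fileName);
        if (!Files.isRegularFile(local)) {
            // Returning null hands the lookup back to the default (possibly remote) resolution.
            return null;
        }
        try {
            LSInput input = loadSave.createLSInput();
            input.setByteStream(new ByteArrayInputStream(Files.readAllBytes(local)));
            // Report the local location so nested relative references stay local too.
            input.setSystemId(local.toUri().toString());
            input.setPublicId(publicId);
            return input;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads mainXsd, resolving its imports and includes from schemaDir where possible. */
    static Schema loadSchema(Path mainXsd, Path schemaDir) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(new LocalSchemaResolver(schemaDir));
        return factory.newSchema(mainXsd.toFile());
    }
}
```

The resolver set on the factory is consulted while the schema is being loaded. When schemas are instead located through hints in the instance document, Validator.setResourceResolver is the corresponding hook. Since an unknown reference falls through to the default lookup here, restricting external access on the factory as well is probably a sensible complement.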
