
Unexpected exception while validating XML code

In .NET/C#, I want to validate some XHTML code. For instance, I have the following document:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title></title></head>
  <body>
   CDATA section number 1?
  </body>
</html>

I have the following C# code:

string htmlCode = ... // for instance the html above
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
{
    throw new XmlException(e.Message);
};
using (var srdr = new StringReader(htmlCode))
using (var xrdr = new XmlTextReader(srdr))
using (var vrdr = XmlReader.Create(xrdr, settings))
{
    try
    {
        while (vrdr.Read()) { }
    }
    catch (XmlException ex)
    {
        // do some stuff
    }
}

When I run this code, I get this exception:

System.Net.WebException : The remote server returned an error: (403) Forbidden.

at System.Net.HttpWebRequest.GetResponse()

What's wrong with what I've done? Thanks in advance for your help.


It's not your code.

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

You need to supply the DTD yourself, for instance by using a custom XmlResolver which returns the DTD from a local resource.
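A minimal sketch of such a resolver, assuming the DTD has been downloaded once and saved next to the application as xhtml1-transitional.dtd (the file name and the class name are illustrative, not part of any API):

```csharp
using System;
using System.IO;
using System.Xml;

// Serves the well-known XHTML DTD from a local file instead of
// fetching it from www.w3.org on every validation run.
class LocalDtdResolver : XmlUrlResolver
{
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Local path is an assumption; adjust to wherever you store the DTD.
        if (absoluteUri.AbsoluteUri.EndsWith("xhtml1-transitional.dtd"))
            return File.OpenRead("xhtml1-transitional.dtd");

        // Fall back to the default (network) behavior for anything else.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

You would then wire it in before creating the reader with `settings.XmlResolver = new LocalDtdResolver();` (on newer framework versions you may also need `settings.DtdProcessing = DtdProcessing.Parse;`). Be aware that the XHTML DTD itself references external entity files (xhtml-lat1.ent and friends), so for fully offline validation those need to be resolved locally too.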


It looks like your code is trying to download the DTD from http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, which returns a 403 (try opening it in your browser).

Note: Lucero's link has the explanation of why it returns 403.


The response you're getting is an HTTP status code stating that you are forbidden from accessing the resource you're trying to retrieve. This could be for a number of reasons:

  1. Server settings - The server may disallow ALL attempts to access the resource. To check for this, try accessing it from a browser. If you get the same error in the browser, then it's likely that your issue is the server configuration.

  2. Blocked user agent - Sometimes only certain user agents are allowed to access a resource. This is done to prevent automated crawlers from scraping it. Note that robots.txt is only advisory; the actual blocking happens server-side, often by inspecting the User-Agent header, so if the site rejects unfamiliar user agents there's a chance your program is being turned away on that basis.

  3. Authentication needed - If the server you're accessing requires authentication (such as basic or digest auth), then you need to provide credentials along with your request. Again, this can be checked with the browser: if the resource requires authentication, you should get a popup in the browser requesting user/pass info.

There are probably other reasons you could be getting this code, but these are the first three I could think of off the top of my head.
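If authentication turned out to be the cause, credentials could be attached to the resolver the reader uses; a sketch (the user/password values are placeholders):

```csharp
using System.Net;
using System.Xml;

var resolver = new XmlUrlResolver
{
    // Placeholder credentials; replace with whatever the server expects.
    Credentials = new NetworkCredential("user", "password")
};

var settings = new XmlReaderSettings
{
    ValidationType = ValidationType.DTD,
    XmlResolver = resolver
};
```

In this particular case, though, credentials won't help: the W3C blocks the excessive DTD traffic regardless, which is why supplying the DTD locally is the right fix.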
