Unexpected exception while validating XML code
In .NET/C#, I want to validate some HTML code. For instance, I have the following HTML:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title></title></head>
<body>
CDATA section number 1?
</body>
</html>
I have the following C# code:
string htmlCode = ... // for instance the HTML above
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
{
    throw new XmlException(e.Message);
};
using (var srdr = new StringReader(htmlCode))
using (var xrdr = new XmlTextReader(srdr))
using (var vrdr = XmlReader.Create(xrdr, settings))
{
    try
    {
        while (vrdr.Read()) { }
    }
    catch (XmlException ex)
    {
        // do some stuff
    }
}
When I run this code, I get this exception:
System.Net.WebException : The remote server returned an error: (403) Forbidden.
at System.Net.HttpWebRequest.GetResponse()
What's wrong with what I've done? Thanks in advance for your help.
It's not your code.
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic
You need to supply the DTD yourself, for instance by using a custom XmlResolver that returns the DTD from a local resource.
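Here is a minimal sketch of such a resolver. The local file layout is an assumption: a "dtd" folder next to the application containing xhtml1-transitional.dtd and its .ent entity files, downloaded once from w3.org. Adjust the mapping to wherever you keep your copies.

using System;
using System.IO;
using System.Xml;

class LocalDtdResolver : XmlUrlResolver
{
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Serve the w3.org DTD and entity files from local copies instead
        // of hitting the network. Relative references inside the DTD
        // (xhtml-lat1.ent, etc.) also resolve against the w3.org base URI,
        // so this branch catches them as well.
        if (absoluteUri.Host == "www.w3.org")
        {
            string fileName = Path.GetFileName(absoluteUri.AbsolutePath);
            return File.OpenRead(Path.Combine("dtd", fileName)); // assumed local folder
        }
        // Anything else keeps the default resolution behavior.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

Then wire it into the settings from the question:

var settings = new XmlReaderSettings
{
    ValidationType = ValidationType.DTD,
    XmlResolver = new LocalDtdResolver()
    // On .NET 4 and later you may also need:
    // DtdProcessing = DtdProcessing.Parse
};

With this in place, GetEntity (where the network fetch happened) never leaves the machine for the w3.org files, so the 403 cannot occur.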
It looks like your code is trying to download http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, which returns a 403 (try opening it in your browser).
Note: Lucero's link explains why it returns 403.
The response code you're getting is an HTTP status code saying you are forbidden from accessing the resource you're trying to retrieve. This could be for a number of reasons:
Server settings - The server may disallow ALL attempts to access the resource. To check for this, try accessing it from a browser. If you get the same error in the browser, then the server configuration is likely your issue.
Blocked user agent - Sometimes only certain user agents are allowed to access a resource. This is done to prevent automated crawlers from scraping the resource. If the site you're accessing has a robots.txt file, there's a chance your program's User-Agent is being blocked (see the sketch after this list).
Authentication needed - If the server you're accessing requires authentication (such as basic or digest auth), then you need to provide credentials with your request. Again, this can be checked with the browser: if the resource requires authentication, you should get a popup in the browser requesting user/pass info.
There are probably other reasons you could be getting this code, but these are the first three I could think of off the top of my head.
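For the last two cases, here is a minimal sketch (the URL and credentials are hypothetical) of sending a browser-like User-Agent and attaching credentials with HttpWebRequest:

using System;
using System.Net;

// Hypothetical resource URL; replace with the one returning 403.
var request = (HttpWebRequest)WebRequest.Create("http://example.com/resource");
// Case 2: present a browser-like User-Agent in case bots are blocked.
request.UserAgent = "Mozilla/5.0 (compatible; MyApp/1.0)";
// Case 3: attach credentials if the server requires basic/digest auth.
request.Credentials = new NetworkCredential("user", "pass");
using (var response = (HttpWebResponse)request.GetResponse())
{
    Console.WriteLine(response.StatusCode);
}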