
Unexpected exception while validating XML code

In .NET/C#, I want to validate some XHTML code. For instance, I have the following document:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title></title></head>
  <body>
   CDATA section number 1?
  </body>
</html>

I have the following C# code:

string htmlCode = ... // for instance the html above
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += delegate(object s, ValidationEventArgs e)
{
    throw new XmlException(e.Message);
};
using (var srdr = new StringReader(htmlCode))
using (var xrdr = new XmlTextReader(srdr))
using (var vrdr = XmlReader.Create(xrdr, settings))
{
    try
    {
        while (vrdr.Read()) { }
    }
    catch (XmlException ex)
    {
        // do some stuff
    }
}

When I run this code, I get this exception:

System.Net.WebException : The remote server returned an error: (403) Forbidden.

at System.Net.HttpWebRequest.GetResponse()

What's wrong with what I've done? Thanks in advance for your help.


It's not your code.

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

You need to supply the DTD yourself, for instance by using a custom XmlResolver which returns the DTD from a local resource.
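A minimal sketch of such a resolver, assuming the DTD has been downloaded once and saved next to the application as xhtml1-transitional.dtd (the file name and the class name are illustrative, not part of any API):

```csharp
using System;
using System.IO;
using System.Xml;

// Serves the well-known XHTML DTD from a local file instead of
// fetching it from www.w3.org on every validation run.
class LocalDtdResolver : XmlUrlResolver
{
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Local path is an assumption; adjust to wherever you store the DTD.
        if (absoluteUri.AbsoluteUri.EndsWith("xhtml1-transitional.dtd"))
            return File.OpenRead("xhtml1-transitional.dtd");

        // Fall back to the default (network) behavior for anything else.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

You would then wire it in before creating the reader with `settings.XmlResolver = new LocalDtdResolver();` (on newer framework versions you may also need `settings.DtdProcessing = DtdProcessing.Parse;`). Be aware that the XHTML DTD itself references external entity files (xhtml-lat1.ent and friends), so for fully offline validation those need to be resolved locally too.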


It looks like your code is trying to download the DTD from http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, which returns a 403 (try opening it in your browser).

Note: Lucero's link has the explanation of why it returns 403.


The response you're getting is an HTTP status code stating that you are forbidden from accessing the resource you're trying to retrieve. This could be for a number of reasons:

  1. Server settings - The server may disallow ALL attempts to access the resource. To check for this, try accessing it from a browser. If you get the same error in the browser, then it's likely that your issue is the server configuration.

  2. Blocked user agent - Sometimes only certain user agents are allowed to access a resource. This is done to prevent automated crawlers from scraping it. Note that robots.txt is only advisory; the actual blocking happens server-side, often by inspecting the User-Agent header, so if the site rejects unfamiliar user agents there's a chance your program is being turned away on that basis.

  3. Authentication needed - If the server you're accessing requires authentication (such as basic or digest auth), then you need to provide credentials along with your request. Again, this can be checked with the browser: if the resource requires authentication, you should get a popup in the browser requesting user/pass info.

There are probably other reasons you could be getting this code, but these are the first three I could think of off the top of my head.
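If authentication turned out to be the cause, credentials could be attached to the resolver the reader uses; a sketch (the user/password values are placeholders):

```csharp
using System.Net;
using System.Xml;

var resolver = new XmlUrlResolver
{
    // Placeholder credentials; replace with whatever the server expects.
    Credentials = new NetworkCredential("user", "password")
};

var settings = new XmlReaderSettings
{
    ValidationType = ValidationType.DTD,
    XmlResolver = resolver
};
```

In this particular case, though, credentials won't help: the W3C blocks the excessive DTD traffic regardless, which is why supplying the DTD locally is the right fix.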
