How do I make the second test pass in these C# XML unit tests? What am I missing in the initialization of XmlReaderSettings?
Issue description:
- I need to fix an issue with resolving standard HTML entities.
- I've implemented HtmlEntityReader, an XmlReader implementation that contains code to resolve entities.
- The public API of our system exposes methods that take an XmlReader, so a user can pass in an XmlReader created with one of the XmlReader.Create methods.
The current code of my XML unit tests is below:
using System.Xml;
using NUnit.Framework;
namespace Tests
{
    [TestFixture]
    public class XmlTests
    {
        // this test works
        [Test]
        public void TestEntitiesResolving1()
        {
            var path = QA.ResolvePath(@"html\bugs\317.html");
            using (var reader = new XmlTextReader(path, new NameTable()))
            {
                reader.XmlResolver = null; //to prevent DTD downloading
                var wrapper = new HtmlEntityReader(reader, XmlUtils.HtmlEntities);
                while (wrapper.Read()) { }
            }
        }
        // this test does not work - why?
        // what's the difference in initialization of internal XmlTextReaderImpl?
        [Test]
        public void TestEntitiesResolving2()
        {
            var path = QA.ResolvePath(@"html\bugs\317.html");
            var settings = new XmlReaderSettings
            {
                XmlResolver = null, //to prevent DTD downloading
                NameTable = new NameTable(),
                ProhibitDtd = false,
                CheckCharacters = false,
            };
            using (var reader = XmlReader.Create(path, settings))
            {
                var wrapper = new HtmlEntityReader(reader, XmlUtils.HtmlEntities);
                while (wrapper.Read()) { }
            }
        }
    }
}
Partial code of HtmlEntityReader is below:
using System;
using System.Collections;
using System.Xml;
internal sealed class HtmlEntityReader : XmlReader
{
    readonly XmlReader _impl;
    readonly Hashtable _entitySet;
    // non-null only while the current node is an entity reference resolved by this wrapper
    string _entityValue;
    public HtmlEntityReader(XmlReader reader, Hashtable entitySet)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        if (entitySet == null) throw new ArgumentNullException("entitySet");
        _impl = reader;
        _entitySet = entitySet;
    }
    public override XmlNodeType NodeType
    {
        get { return _entityValue != null ? XmlNodeType.Text : _impl.NodeType; }
    }
    public override string LocalName
    {
        get { return _entityValue != null ? string.Empty : _impl.LocalName; }
    }
    public override string Prefix
    {
        get { return _entityValue != null ? string.Empty : _impl.Prefix; }
    }
    public override string Name
    {
        get { return _entityValue != null ? string.Empty : _impl.Name; }
    }
    public override bool HasValue
    {
        get { return _entityValue != null || _impl.HasValue; }
    }
    public override string Value
    {
        get { return _entityValue ?? _impl.Value; }
    }
    public override bool CanResolveEntity
    {
        get { return true; }
    }
    public override void ResolveEntity()
    {
        // this never seems to be called - why?
    }
    public override bool Read()
    {
        _entityValue = null;
        if (!_impl.Read()) return false;
        if (NodeType == XmlNodeType.EntityReference)
        {
            // resolve the entity reference from the lookup table
            _entityValue = (string)_entitySet[Name];
        }
        return true;
    }
    // ... delegation of XmlReader abstract methods to _impl
}
I've got the exception:
System.Xml.XmlException: Reference to undeclared entity 'nbsp'. Line 4, position 5.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg, Int32 lineNo, Int32 linePos)
   at System.Xml.XmlTextReaderImpl.HandleGeneralEntityReference(String name, Boolean isInAttributeValue, Boolean pushFakeEntityIfNullResolver, Int32 entityStartLinePos)
   at System.Xml.XmlTextReaderImpl.HandleEntityReference(Boolean isInAttributeValue, EntityExpandType expandType, ref Int32 charRefEndPos)
   at System.Xml.XmlTextReaderImpl.ParseText(ref Int32 startPos, ref Int32 endPos, ref Int32 outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   ... private stuff
Could you offer some quick advice or a link to a solution while I keep investigating this issue myself?
I've done some research on your question, and as best I can tell the only way to ensure that character entities are resolved is to declare them in a DTD. You can supply the DTD contents yourself (e.g. for caching) by deriving an implementation from the System.Xml.XmlResolver base class and responding to GetEntity calls with a stream containing the DTD data.
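A minimal sketch of that resolver idea, under some assumptions: the class names InMemoryDtdResolver and ResolverSketch, the example.invalid system identifier, and the one-entity DTD are all made up for illustration, and DtdProcessing.Parse is the .NET 4+ replacement for ProhibitDtd = false. The input document must carry a DOCTYPE so that the reader has something to ask the resolver for.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical resolver: answers every external DTD request with a fixed in-memory DTD.
sealed class InMemoryDtdResolver : XmlResolver
{
    // Only nbsp is declared here; a real entity set would list every HTML entity.
    const string Dtd = "<!ENTITY nbsp \"&#160;\">";

    // Nothing is fetched over the network, so credentials are ignored.
    public override ICredentials Credentials { set { } }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Respond with a stream containing the DTD data, whatever URI was requested.
        return new MemoryStream(Encoding.UTF8.GetBytes(Dtd));
    }
}

static class ResolverSketch
{
    // Reads the document and returns the concatenated text node values.
    public static string ReadText(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new InMemoryDtdResolver()
        };
        var text = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text) text.Append(reader.Value);
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        // The system identifier is a placeholder; the resolver never contacts it.
        string text = ReadText(
            "<!DOCTYPE html SYSTEM \"http://example.invalid/html-entities.dtd\"><p>a&nbsp;b</p>");
        Console.WriteLine(text.Length);
    }
}
```

Note that this only helps when XmlResolver is not null; with a null resolver the reader never requests the external DTD at all.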
I wrote an article some time back that explains how to push a default DTD onto the XmlParserContext if there is no DTD declared in your input document. This article is a little dated, but the same concept continues to work with XmlReaderSettings & XmlReader.Create by using an XmlReader.Create overload that accepts an XmlParserContext object as an argument.
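The parser-context approach might look roughly like the sketch below. It is not the code from the article: the class name ParserContextSketch and the one-entity internal subset are assumptions for illustration, and the same NameTable is shared between the settings and the context to avoid a name table mismatch. It assumes the input has no DOCTYPE of its own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class ParserContextSketch
{
    // Reads a DOCTYPE-less document, supplying entity declarations through the parser context.
    public static string ReadText(string xml)
    {
        var nameTable = new NameTable();
        var settings = new XmlReaderSettings
        {
            NameTable = nameTable,
            DtdProcessing = DtdProcessing.Parse, // the context DTD is rejected if DTDs are prohibited
            XmlResolver = null                   // nothing external needs to be fetched
        };

        // Internal subset pushed as a default DTD; only nbsp is declared in this sketch.
        const string internalSubset = "<!ENTITY nbsp \"&#160;\">";
        var context = new XmlParserContext(
            nameTable,
            new XmlNamespaceManager(nameTable),
            "html",          // docTypeName
            null,            // pubId
            null,            // sysId
            internalSubset,
            string.Empty,    // baseURI
            string.Empty,    // xmlLang
            XmlSpace.None);

        var text = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(xml), settings, context))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text) text.Append(reader.Value);
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ReadText("<p>a&nbsp;b</p>").Length);
    }
}
```

In the question's scenario, the internal subset would presumably be generated from the existing XmlUtils.HtmlEntities table rather than written by hand.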
Finally, it also looks like .NET 4 will help us out a little with a new XmlResolver derivative named XmlPreloadedResolver which seems to have the XHTML1 and RSS DTDs built in.
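For documents that do declare an XHTML 1.0 DOCTYPE, a sketch using XmlPreloadedResolver could look like this. The class name PreloadedResolverSketch and the sample document are assumptions; XmlPreloadedResolver and XmlKnownDtds live in the System.Xml.Resolvers namespace.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Resolvers;

static class PreloadedResolverSketch
{
    // Reads an XHTML 1.0 document, resolving its DTD from the copies built into the resolver.
    public static string ReadText(string xhtml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            // Serves the XHTML 1.0 DTDs and entity sets from memory, so nothing is downloaded.
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };
        var text = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text) text.Append(reader.Value);
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        const string xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " +
            "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">" +
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>t</title></head><body><p>a&nbsp;b</p></body></html>";
        Console.WriteLine(ReadText(xhtml).Length);
    }
}
```

This does nothing for input without a DOCTYPE, since the reader then has no DTD to request from the resolver.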
The funny thing is that, as sergeyt noted, XmlTextReader doesn't care about undefined entities when processing an XML fragment, while a reader created through XmlReader.Create does!
So a solution in many cases would be to try an XmlTextReader instead.