Using C#'s XmlReader on slightly malformed XML

I'm trying to use C#'s XmlReader on a large series of XML files. They are all properly formatted except for a select few (unfortunately I'm not in a position to have them changed, because that would break a lot of other code).

The errors only come from one specific part of these offending XML files, and it's OK to just skip that part, but I don't want to stop reading the rest of the XML file.

The bad parts look like this:

 <InterestingStuff>
  ...
    <ErrorsHere OptionA|Something = "false" OptionB|SomethingElse = "false"/>
    <OtherInterestingStuff>
    ...
    </OtherInterestingStuff>
</InterestingStuff>

So really, if I could just ignore invalid tags, or ignore the pipe symbol, I would be OK.

Calling XmlReader.Skip() when I see the name "ErrorsHere" doesn't work; apparently the reader has already read a bit ahead and throws the exception.

TL;DR: How do I skip the bad part so I can read in the XML file above using the XmlReader?

Edit:

Some people suggested just replacing the '|' symbol, but the point of XmlReader is to avoid loading the entire file and to traverse only the parts you want. Since I'm reading directly from files, I can't afford to read in entire files, replace every instance of '|', and then read parts of them again :).


I've experimented a bit with this in the past.

In general the input simply has to be well-formed. An XmlReader will go into an unrecoverable error-state when the basic XML rules are broken. It is easy to avoid schema-validation but that's not relevant here.
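To make the error state concrete, here is a minimal sketch that feeds a cut-down version of the bad element to an XmlReader and reports the state it ends up in. The class name `MalformedDemo` and the method `ReadAll` are made up for this illustration; only `XmlReader.Create`, `Read`, `ReadState` and `XmlException` come from System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of any library.
static class MalformedDemo
{
    // Reads every node of the given XML text and returns the reader's final state.
    public static ReadState ReadAll(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            try
            {
                while (reader.Read()) { }
            }
            catch (XmlException ex)
            {
                // The '|' inside the attribute name breaks well-formedness.
                Console.WriteLine("Parse failed: " + ex.Message);
            }
            return reader.ReadState;
        }
    }
}
```

Calling `MalformedDemo.ReadAll("<a><ErrorsHere OptionA|Something = \"false\"/></a>")` should return `ReadState.Error`, while a well-formed document ends in `ReadState.EndOfFile`.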

Your only option is to clean the input. That can be done in a streaming manner (with a custom Stream or TextReader), but it will require a light form of parsing. If '|' never appears in a valid position in your files (in text content or attribute values), it's easy: every pipe can simply be replaced.
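A minimal sketch of the streaming clean-up for the easy case, assuming '|' never occurs anywhere it is legitimately needed, so every pipe can be swapped for '_' as the characters pass through. `PipeReplacingReader` and `CleanedXml.ElementNames` are hypothetical names invented for this example; nothing is buffered beyond what the XmlReader itself requests.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: wraps another TextReader and replaces '|' with '_' on the fly.
// Assumption: the input has no pipes in places where they must be preserved.
sealed class PipeReplacingReader : TextReader
{
    private readonly TextReader _inner;

    public PipeReplacingReader(TextReader inner) { _inner = inner; }

    private static int Fix(int c) { return c == '|' ? '_' : c; }

    public override int Peek() { return Fix(_inner.Peek()); }

    public override int Read() { return Fix(_inner.Read()); }

    public override int Read(char[] buffer, int index, int count)
    {
        // Let the inner reader fill the buffer, then patch only the chars it wrote.
        int n = _inner.Read(buffer, index, count);
        for (int i = index; i < index + n; i++)
        {
            if (buffer[i] == '|') buffer[i] = '_';
        }
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

// Hypothetical usage helper: lists element names seen while streaming through the input.
static class CleanedXml
{
    public static List<string> ElementNames(TextReader source)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new PipeReplacingReader(source)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element) names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

For a file on disk, pass `new StreamReader(path)` as the source. The attributes then arrive renamed (for example `OptionA_Something`), which is harmless if that element is going to be skipped anyway. If pipes can also occur in text or attribute values that matter, the wrapper would need to track whether it is inside a tag, which is the "light form of parsing" mentioned above.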


XmlReader is strict: on any non-conformance it will throw an error.

So no, you can't do that unless you write your own XML implementation. Fixing up the malformed data is probably easier.


I once had a similar situation (with HTML files, not XML files). I ended up running a regular expression over each HTML file to delete the malformed parts before feeding it into my processing pipeline. It came in handy and was easier than struggling with the API. :)
