Using C#'s XmlReader on slightly malformed XML

2023-03-19 21:23

I'm trying to use C#'s XmlReader on a large series of XML files. They are all properly formatted except for a select few (unfortunately I'm not in a position to have them changed, because it would break a lot of other code).

The errors only come from one specific part of these offending XML files. It's OK to just skip those parts, but I don't want to stop reading the rest of the XML file.

The bad parts look like this:

<InterestingStuff>
    ...
    <ErrorsHere OptionA|Something = "false" OptionB|SomethingElse = "false"/>
    <OtherInterestingStuff>
    ...
    </OtherInterestingStuff>
</InterestingStuff>

So really, if I could just ignore invalid tags, or ignore the pipe symbol, I would be OK.

Calling XmlReader.Skip() when I see the name "ErrorsHere" doesn't work. Apparently the reader has already read a bit ahead by then, and it throws the exception.

TL;DR: How do I skip the bad parts so I can read the XML file above using XmlReader?

Edit:

Some people suggested just replacing the '|' symbol, but the idea of XmlReader is to not load the entire file and only traverse the parts you want. Since I'm reading directly from files, I can't afford to read in entire files, replace all instances of '|', and then read parts of them again :).

I've experimented a bit with this in the past.

In general the input simply has to be well-formed. An XmlReader will go into an unrecoverable error state when the basic XML rules are broken. It is easy to avoid schema validation, but that's not relevant here.
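To see that error state for yourself, here is a minimal sketch. The class and method names (`ErrorStateDemo`, `ParseAndReportState`) are made up for illustration; only `XmlReader`, `XmlException` and `ReadState` come from .NET. It reads the input as far as the reader will go and returns the reader's final `ReadState`.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class, not part of .NET.
public static class ErrorStateDemo
{
    // Reads every node until the end of input or the first parse error,
    // then reports which state the reader ended up in.
    public static ReadState ParseAndReportState(string xmlText)
    {
        using (var reader = XmlReader.Create(new StringReader(xmlText)))
        {
            try
            {
                while (reader.Read())
                {
                    // Nothing to do here; we only care how far the reader gets.
                }
            }
            catch (XmlException ex)
            {
                // The exception carries the position of the offending character.
                Console.WriteLine(
                    $"Parse failed at line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}");
            }

            return reader.ReadState;
        }
    }
}
```

In my experience the `'|'` in the attribute name makes this come back as `ReadState.Error`, and calling `Read()` or `Skip()` again does not get the reader past the bad element.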

Your only option is to clean the input. That can be done in a streaming manner (a custom Stream or TextReader), but it will require a light form of parsing. If you don't have pipe symbols in valid positions, it's easy.
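A rough sketch of the streaming cleanup, under the "no pipe symbols in valid positions" assumption: a TextReader wrapper that swaps every `'|'` for `'_'` as the characters pass through, so nothing is loaded up front. `PipeReplacingReader` and `MalformedXmlDemo` are hypothetical names, and the choice of `'_'` as the replacement is also just an assumption (any character that is legal inside an XML name would do).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of .NET: replaces every '|' with another
// character while the text streams through. It also rewrites pipes inside
// text and attribute values, so it only fits when no pipe there is meaningful.
public sealed class PipeReplacingReader : TextReader
{
    private readonly TextReader _inner;
    private readonly char _replacement;

    public PipeReplacingReader(TextReader inner, char replacement = '_')
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _replacement = replacement;
    }

    public override int Peek()
    {
        int c = _inner.Peek();
        return c == '|' ? _replacement : c;
    }

    public override int Read()
    {
        int c = _inner.Read();
        return c == '|' ? _replacement : c;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        // Let the inner reader fill the buffer, then patch it in place.
        int read = _inner.Read(buffer, index, count);
        for (int i = index; i < index + read; i++)
        {
            if (buffer[i] == '|') buffer[i] = _replacement;
        }
        return read;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

// Hypothetical usage: list the element names XmlReader sees after cleanup.
public static class MalformedXmlDemo
{
    public static List<string> ReadElementNames(TextReader source)
    {
        var names = new List<string>();
        using (var cleaned = new PipeReplacingReader(source))
        using (var xml = XmlReader.Create(cleaned))
        {
            while (xml.Read())
            {
                if (xml.NodeType == XmlNodeType.Element) names.Add(xml.Name);
            }
        }
        return names;
    }
}
```

For a file on disk you would pass something like `new StreamReader(path)` as the source. The attributes then show up renamed (`OptionA_Something` instead of `OptionA|Something`), which is fine if you only want to get past that element.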

XmlReader is strict. It will throw on any non-conformance.

So no, you can't do that unless you write your own XML implementation. Fixing up the malformed data is probably easier.

I once had a similar situation (with HTML files, not XML files). I ended up running a regular expression over each HTML file before feeding it into my processing pipeline, to delete the malformed parts. It came in handy and was easier than struggling with the API. :)
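Applied to the XML in the question, that kind of pre-pass might look roughly like the sketch below. It is not the code I used back then. `RegexCleanup` and `StripBadElements` are made-up names, and the pattern assumes each `<ErrorsHere ... />` element is self-closing, sits on a single line, and has no `>` inside its attribute values. It works line by line, so the whole file never has to be in memory.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET.
public static class RegexCleanup
{
    // Matches one self-closing <ErrorsHere ... /> element within a line.
    private static readonly Regex BadElement =
        new Regex(@"<ErrorsHere\b[^>]*/>", RegexOptions.Compiled);

    // Copies input to output one line at a time, deleting the bad elements.
    public static void StripBadElements(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            output.WriteLine(BadElement.Replace(line, string.Empty));
        }
    }
}
```

The cleaned output can go to a temporary file (or any other TextWriter) that the regular XmlReader code then reads.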
