Can't find unclosed element in XML

2023-02-21 16:48

I have a large XML file (~18MB). Apparently there is a tag somewhere in it that isn't closed. I know this because when I run it through the W3C markup validation tool (validator.w3.org), I get the following error:

You may have neglected to close an element, or perhaps you meant to "self-close" an element, that is, ending it with "/>" instead of ">".

My question is how I might go about finding this unclosed element among the 500,000 lines in the file. Is there a tool I could use that would suggest places where there might be a problem -- such as an element that has not been closed after a certain number of lines?

Any ideas would be much appreciated.

I use Notepad++, which has an excellent XML Tools plugin that lets you check XML syntax and takes you to the problematic line. It also has other useful utilities.

I just opened an XML file in VS 2010 (with ReSharper), broke the XML and what do you know? The error was highlighted immediately. If you have access to the same, it's that simple.

xmllint is a standard tool for this. From the Validation & DTDs page:

The simplest way is to use the xmllint program included with libxml. The --valid option turns-on validation of the files given as input. For example the following validates a copy of the first revision of the XML 1.0 specification:

xmllint --valid --noout test/valid/REC-xml-19980210.xml

The --noout option is used to disable output of the resulting tree.

The --dtdvalid dtd allows validation of the document(s) against a given DTD.

Libxml2 exports an API to handle DTDs and validation, check the associated description.

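Note that an unclosed tag is a well-formedness error, so xmllint reports it with a line number even without --valid. If your document isn't "pretty-printed", it can still be hard to spot the offending node on the reported line, so you might want to have xmllint rewrite the file with indentation (the --format option; add --recover if the parser refuses the broken file).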
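If you would rather run the same kind of well-formedness check from Java than from the command line, the JDK's built-in SAX parser reports the line and column where it gives up. This is only a minimal sketch: the class name WellFormedCheck, the method firstError and the inline sample document are made up for illustration, and the reported position is where the parser noticed the problem, which is usually after the tag that was left open.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns null if the XML is well-formed, otherwise "line:column: message". */
    static String firstError(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            // DefaultHandler ignores all content; its fatalError() rethrows the parser's error.
            factory.newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // With no argument, fall back to a tiny broken sample (<b> is never closed).
        InputSource source = args.length > 0
                ? new InputSource(new File(args[0]).toURI().toString())
                : new InputSource(new StringReader("<a>\n<b>\n</a>"));
        String error = firstError(source);
        System.out.println(error == null ? "well-formed" : error);
    }
}
```

Run it as "java WellFormedCheck.java yourfile.xml" (Java 11 or later). Because SAX streams the input, an 18MB file should not need to fit in memory as a tree.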

Since you do not have an XML Schema, there is no foolproof way of finding the offending element automatically: XML allows recursive structures, for example, so a parser cannot know where an element was supposed to end. You CAN write your own XML Schema, although that is potentially a lot to learn. Alternatively, I would write a simple, stupid validator that checks the nesting level and the element name, like so:

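import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Goes inside a class of your own. The two parse...WithALoopLikeTheCurrent
// methods are placeholders you write yourself, one per known child element.
private void parseAndCheckStructure(XMLStreamReader reader) throws XMLStreamException {

    // Skip the prolog (XML declaration, comments, DOCTYPE) up to the root start tag;
    // this is probably not where the offending element is.
    int event = -1;
    while (reader.hasNext()) {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            break;
        } else if (event == XMLStreamConstants.END_DOCUMENT) {
            throw new XMLStreamException("Document contains no root element");
        }
    }

    // Read the rest of the document. level is the current nesting depth; the root is level 1.
    int level = 1;
    do {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            level++;
            String localName = reader.getLocalName();
            if (localName.equals("FirstElement")) {
                // The helper consumes everything up to and including the matching end tag,
                // so we are back at the parent's level afterwards.
                parseFirstElementWithALoopLikeTheCurrent(reader);

                level--;
            } else if (localName.equals("SecondElement")) {
                parseSecondElementWithALoopLikeTheCurrent(reader);

                level--;

            } else {
                // An element that is not allowed here: report where the parser currently is.
                throw new RuntimeException("Unknown element " + localName + " at level " + level + " and location " + reader.getLocation());
            }

        } else if (event == XMLStreamConstants.END_ELEMENT) {
            // keep track of level
            level--;
        }
    } while (level > 0);

}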

Alternatively, parse the whole document within the above do-while loop, and do checks like

if(level == 4 && localName.equals("MyElement")) {
    // ok
} else {
    // throw exception with the location
}

It sucks, but it works.
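Here is a self-contained sketch of that second variant, for anyone who wants something to paste and run. It is not the code from the answer above: the class DepthCheck, the method check, the expectedDepth map and the sample rule for "item" are all hypothetical names invented for this example. It uses the JDK's StAX parser, flags elements that show up at a depth you did not expect, and, when the parser stops on an error, lists the elements that were still open together with the line the parser reported when each start tag was read. The innermost open element is normally the one missing its end tag.

```java
import java.io.FileReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DepthCheck {

    /**
     * Streams through the document and collects two kinds of findings:
     * elements whose depth differs from the one listed in expectedDepth (root = 1),
     * and, if the parser fails, the elements that were still open at that point.
     */
    static List<String> check(Reader in, Map<String, Integer> expectedDepth) {
        List<String> findings = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // innermost open element first
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    int line = reader.getLocation().getLineNumber();
                    open.push(name + " (line " + line + ")");
                    Integer expected = expectedDepth.get(name);
                    if (expected != null && expected != open.size()) {
                        findings.add(name + " at depth " + open.size()
                                + ", expected " + expected + " (line " + line + ")");
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pop();
                }
            }
        } catch (XMLStreamException e) {
            // The parser stops at the first mismatched end tag; show what was open then.
            findings.add("parse error: " + e.getMessage());
            for (String element : open) {
                findings.add("still open: " + element);
            }
        }
        return findings;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical rule: <item> elements belong directly under the root (depth 2).
        Map<String, Integer> rules = Map.of("item", 2);
        // FileReader uses the platform default charset; that is an assumption about your file.
        try (Reader in = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("<root>\n<item>\n<item/>\n</root>")) {
            check(in, rules).forEach(System.out::println);
        }
    }
}
```

The depth rules catch the case the question describes, where a missing end tag makes later siblings look like children: the first "item" reported at depth 3 instead of 2 is usually sitting right after the element that was never closed.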

Try opening the .xml file in the Chrome browser. It'll pinpoint the exact location of the fault.
