My custom XML reader is a two-legged turtle. Suggestions?

2022-12-20 08:12 Asked by:

I wrote a custom XML reader because I needed something that would not read ahead from the source stream. I wanted the ability to have an object read its data from the stream without negatively affecting the stream for the parent object. That way, the stream can be passed down the object tree.

It's a minimal implementation, meant only to serve the purpose of the project that uses it (right now). It works well enough, except for one method -- ReadString. That method is used to read the current element's content as a string, stopping when the end element is reached. It determines this by counting nesting levels. Meanwhile, it's reading from the stream, character by character, adding to a StringBuilder for the resulting string.

For a collection element, this can take a long time. I'm sure there is much that can be done to better implement this, so this is where my continuing education begins once again. I could really use some help/guidance. Some notes about methods it calls:

Read - returns the next byte in the stream or -1.

ReadUntilChar - calls Read until the specified character or -1 is reached, appending to a string with StringBuilder.

Without further ado, here is my two-legged turtle. Constants have been replaced with the actual values.

public string ReadString() {
    int level = 0;
    long originalPosition = m_stream.Position;
    StringBuilder sb = new StringBuilder();
    // int rather than sbyte, so byte values above 127 stay distinct from the -1 end-of-stream value.
    int read;
    try {
        // We are already within the element that contains the string.
        // Read until we reach an end element when the level == 0.
        // We want to leave the reader positioned at the end element.
        do {
            sb.Append(ReadUntilChar('<'));
            if((read = Read()) == '/') {
                // End element
                if(level == 0) {
                    // End element for the element in context, the string is complete.
                    // Replace the two bytes of the end element read.
                    m_stream.Seek(-2, System.IO.SeekOrigin.Current);
                    break;
                } else {
                    // End element for a child element.
                    // Add the two bytes read to the resulting string and continue.
                    sb.Append('<');
                    sb.Append('/');
                    level--;
                }
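            } else if(read == -1) {
                // End of stream before the closing element: stop without appending anything.
                break;
            } else {
                // Start element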
                level++;
                sb.Append('<');
                sb.Append((char)read);
            }
        } while(read != -1);

        return sb.ToString().Trim();
    } catch {
        // Return to the original position that we started at.
        m_stream.Seek(originalPosition - m_stream.Position, System.IO.SeekOrigin.Current);
        throw;
    }
}

Right off the bat, you should be using a profiler for performance optimizations if you aren't already (I'd recommend SlimTune if you're on a budget). Without one you're just taking slightly-educated stabs in the dark.

Once you've profiled the parser you should have a good idea of where the ReadString() method is spending all its time, which will make your optimizing much easier.

One suggestion I'd make at the algorithm level is to scan the stream first, and then build the contents out: instead of consuming each character as you see it, record the positions where you find the <, >, and </ markers. Once you have those positions you can pull the data out of the stream in blocks rather than throwing characters into a StringBuilder one at a time. This eliminates a large number of StringBuilder.Append calls, which may improve your performance (this is where profiling would help).
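To make the scan-then-copy idea concrete, here is a rough sketch, not a drop-in replacement for ReadString. BlockScanSketch and ReadContent are names I made up; it assumes a seekable stream and UTF-8 data (the bytes for < and / never occur inside a multi-byte UTF-8 sequence, so scanning raw bytes is safe). Like the original, it ignores comments, CDATA, and self-closing tags.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original reader.
public static class BlockScanSketch
{
    // Starts just inside an element and returns its content, leaving the
    // stream positioned on the '<' of the matching end tag.
    public static string ReadContent(Stream stream, int bufferSize = 4096)
    {
        long start = stream.Position;
        byte[] buffer = new byte[bufferSize];
        int level = 0;
        bool afterLt = false;   // previous byte was '<' (may span two blocks)
        long scanned = 0;       // bytes examined in earlier blocks
        long endOffset = -1;    // offset of the closing '<', relative to start
        int n;

        // Pass 1: scan block by block, tracking only the nesting level.
        while (endOffset < 0 && (n = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < n; i++)
            {
                byte b = buffer[i];
                if (afterLt)
                {
                    afterLt = false;
                    if (b != (byte)'/') { level++; }          // start element
                    else if (level > 0) { level--; }          // child end element
                    else { endOffset = scanned + i - 1; break; } // our end element
                }
                else if (b == (byte)'<')
                {
                    afterLt = true;
                }
            }
            scanned += n;
        }

        // Return to the starting position, as the original does on failure.
        stream.Seek(start, SeekOrigin.Begin);
        if (endOffset < 0)
        {
            throw new InvalidDataException("End element not found.");
        }

        // Pass 2: pull the content out in bulk and decode it once.
        byte[] content = new byte[endOffset];
        int filled = 0;
        while (filled < content.Length)
        {
            int got = stream.Read(content, filled, content.Length - filled);
            if (got <= 0) { throw new EndOfStreamException(); }
            filled += got;
        }
        return Encoding.UTF8.GetString(content).Trim();
    }
}
```

The stream is read twice, but both passes work on whole blocks, so there are no per-character Append calls. Whether that actually beats the original depends on the stream type, which is again something the profiler should tell you.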

You may find this analysis useful for optimizing string operations, if they prove to be the source of the slowness.

But really, profile.

Your implementation assumes the Stream is seekable. If it is known to be seekable, why do anything? Just create an XmlReader at your position; consume the data; ditch the reader; and seek the Stream back to where you started?
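Roughly what I mean, as a sketch under assumptions: SeekBackSketch and ReadTextThenRewind are made-up names, the stream is seekable, and it is positioned at the start of an element. The loop stops on the element's own end tag so the reader never tries to parse whatever follows it in the parent.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper illustrating "create a reader, consume, seek back".
public static class SeekBackSketch
{
    // Reads the text content of the element at the current stream position,
    // then restores the stream position no matter how far the reader buffered.
    public static string ReadTextThenRewind(Stream stream)
    {
        long start = stream.Position;
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // we start mid-document
            CloseInput = false                            // keep the stream open
        };
        try
        {
            using (XmlReader reader = XmlReader.Create(stream, settings))
            {
                reader.MoveToContent();          // now on the start element
                int depth = reader.Depth;
                var sb = new StringBuilder();
                // Stop on the matching end element instead of reading past it.
                while (reader.Read() &&
                       !(reader.NodeType == XmlNodeType.EndElement && reader.Depth == depth))
                {
                    if (reader.NodeType == XmlNodeType.Text)
                    {
                        sb.Append(reader.Value);
                    }
                }
                return sb.ToString();
            }
        }
        finally
        {
            // The reader has read ahead; undo that for the caller.
            stream.Seek(start, SeekOrigin.Begin);
        }
    }
}
```

The XmlReader will read ahead internally, but since the position is restored afterwards the parent never notices.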

How large is the XML? You may find that throwing the data into a DOM (XmlDocument / XDocument / etc.) is a viable way of getting a reader that does what you need without requiring lots of rework. In the case of XmlDocument, XmlNodeReader would suffice, for example (XmlDocument would also give you XPath support if you want to use non-trivial queries).
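A minimal sketch of the DOM route, assuming the document fits comfortably in memory. DomReaderSketch and ReaderFor are names invented for the example; XmlDocument, SelectSingleNode, and XmlNodeReader are the real framework types.

```csharp
using System;
using System.Xml;

// Hypothetical helper: hands out a reader scoped to one node of a loaded DOM.
public static class DomReaderSketch
{
    public static XmlReader ReaderFor(XmlDocument doc, string xpath)
    {
        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null)
        {
            throw new InvalidOperationException("No node matches: " + xpath);
        }
        // The reader walks only this node and its descendants.
        return new XmlNodeReader(node);
    }
}
```

Usage would look something like this: load the document once, then give each child object its own reader.

```csharp
var doc = new XmlDocument();
doc.LoadXml("<root><item id='1'>a</item><item id='2'>b</item></root>");
using (XmlReader r = DomReaderSketch.ReaderFor(doc, "/root/item[@id='2']"))
{
    r.MoveToContent();
    string value = r.ReadElementContentAsString();
}
```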

I wrote a custom XML reader because I needed something that would not read ahead from the source stream. I wanted the ability to have an object read its data from the stream without negatively affecting the stream for the parent object. That way, the stream can be passed down the object tree.

That sounds more like a job for XmlReader.ReadSubtree(), which lets you create a new XmlReader to pass to another object so it can initialise itself from the reader without being able to read beyond the bounds of the current element.

The ReadSubtree method is not intended to create a copy of the XML data that you can work with independently. Rather, it can be used to create a boundary around an XML element. This is useful if you need to pass data to another component for processing and you wish to limit how much of your data the component can access. When you pass an XmlReader returned by the ReadSubtree method to another application, the application can access only that XML element, rather than the entire XML document.

It does say that after reading the subtree the parent reader is re-positioned to the "EndElement" of the current element rather than remaining at the beginning, but is that likely to be a problem?
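Here is a small sketch of how that could look. SubtreeSketch, ReadItems, ReadChild, and the "item" element name are all made up for illustration; ReadChild stands in for whatever child object initialises itself from the reader. Note that the child reads until its reader runs out and still only sees its own element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class SubtreeSketch
{
    // Stand-in for a child object: reads everything its reader offers.
    private static string ReadChild(XmlReader reader)
    {
        var sb = new StringBuilder();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Text)
            {
                sb.Append(reader.Value);
            }
        }
        return sb.ToString();
    }

    // Parent: walks the document and hands each <item> to a child
    // through a reader bounded to that element.
    public static List<string> ReadItems(string xml)
    {
        var result = new List<string>();
        using (XmlReader parent = XmlReader.Create(new StringReader(xml)))
        {
            while (parent.Read())
            {
                if (parent.NodeType == XmlNodeType.Element && parent.Name == "item")
                {
                    using (XmlReader child = parent.ReadSubtree())
                    {
                        result.Add(ReadChild(child));
                    }
                    // The parent is now on the </item> end element and
                    // simply continues with its own Read() loop.
                }
            }
        }
        return result;
    }
}
```

Because the parent ends up on the end element, its next Read() moves on to the following sibling, so the re-positioning is harmless in a loop like this.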

Why not use an existing one, like this one?
