Good approaches for processing xml in C++
I work on a multithreaded message processing application written in C++. The application receives xml messages, performs some action, and may publish out an xml message to another service if required.
Currently, the app works by extracting data while parsing the message and performing some action on that message in the middle of parsing. This seems like poor practice to me. I have the opportunity to create an alternative, and I'm considering approaches I can use.
One method I've thought of is to serialize the xml data into a data object, and once that is finished, extract and process data as needed. The disadvantage would be that I have to build a new class for each different xml message I process (probably around 30), but that approach seems cleaner than what I have now.
Is there a better way than this? Should also mention the caveat that any code libraries developed outside the U.S. are unlikely to be approved.
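For illustration, the "parse into a data object first, then process" idea from the question might look roughly like the sketch below. Everything in it is an assumption made up for the example: the `OrderMessage` struct, its fields, and the `elementText` helper, which is a toy string search standing in for a real XML parser. The point is only the separation of the two steps.

```cpp
#include <exception>
#include <string>

// Hypothetical message type: one plain struct per kind of XML message.
struct OrderMessage {
    std::string orderId;
    int quantity = 0;
};

// Toy helper, NOT a real XML parser: copies the text between the first
// <tag> and the following </tag> into 'out'. Returns false if either is missing.
bool elementText(const std::string& xml, const std::string& tag, std::string& out) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return false;
    const std::string::size_type begin = start + open.size();
    const std::string::size_type end = xml.find(close, begin);
    if (end == std::string::npos) return false;
    out = xml.substr(begin, end - begin);
    return true;
}

// Step 1: parse. Turns raw XML into a data object and does nothing else.
bool parseOrder(const std::string& xml, OrderMessage& out) {
    std::string id;
    std::string qty;
    if (!elementText(xml, "id", id) || !elementText(xml, "quantity", qty)) {
        return false;
    }
    try {
        out.quantity = std::stoi(qty);
    } catch (const std::exception&) {
        return false;  // quantity was not a usable number
    }
    out.orderId = id;
    return true;
}

// Step 2: process. Works only on the data object, never on the XML text.
bool shouldPublish(const OrderMessage& order) {
    return order.quantity > 0;
}
```

With around 30 message types this means around 30 small structs and parse functions, but the processing code can then be tested without any XML at all.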
Currently, the app works
Then what exactly are you fixing?
Don't fix what isn't broken.
There are typically two approaches to XML parsing: DOM and SAX. DOM builds up a document object model (like what you are proposing), whereas SAX invokes callbacks as parts of the document are visited during parsing. The free, well-known libxml2 library supports both parsing methods.
Typically, the SAX approach (i.e., using callbacks that get executed as the document is visited) uses less memory and can result in lower end-user latency, because you can start processing immediately instead of waiting for the entire document to be parsed and built up.
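To make the callback idea concrete, here is a minimal sketch of the SAX style. It does not use libxml2; `scan` is a toy scanner written for this example that only understands `<name>`, `</name>` and plain text (no attributes, comments, CDATA or entities). The `Handlers` struct and `traceEvents` helper are likewise hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Callbacks supplied by the caller. Each one fires as soon as the scanner
// reaches that piece of the input, before the rest has been read.
struct Handlers {
    std::function<void(const std::string&)> onStartElement;
    std::function<void(const std::string&)> onEndElement;
    std::function<void(const std::string&)> onText;
};

// Toy event scanner (assumption: input contains only simple tags and text).
// It never builds a tree; it just reports events in document order.
void scan(const std::string& xml, const Handlers& h) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return;  // malformed input: stop
            const bool isEnd = (pos + 1 < xml.size() && xml[pos + 1] == '/');
            const std::size_t nameStart = pos + (isEnd ? 2 : 1);
            const std::string name = xml.substr(nameStart, close - nameStart);
            if (isEnd) {
                if (h.onEndElement) h.onEndElement(name);
            } else {
                if (h.onStartElement) h.onStartElement(name);
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            if (h.onText) h.onText(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Records the events in the order the callbacks receive them.
std::vector<std::string> traceEvents(const std::string& xml) {
    std::vector<std::string> events;
    Handlers h;
    h.onStartElement = [&events](const std::string& n) { events.push_back("start:" + n); };
    h.onEndElement = [&events](const std::string& n) { events.push_back("end:" + n); };
    h.onText = [&events](const std::string& t) { events.push_back("text:" + t); };
    scan(xml, h);
    return events;
}
```

A DOM parser would instead return the whole tree after the last byte has been read, which is where the extra memory and latency come from.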
The fact that your program is multithreaded is a red herring. As long as you always pass a context object to each of your callbacks, and that object is not shared between threads, you can safely run several parses at once, each in its own thread with its own object. Using a standard library such as libxml2 to do your parsing is also sensible from a reuse perspective.
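A rough sketch of that pattern follows, assuming a C-style callback that receives an opaque user-data pointer. The scanner here is again a toy stand-in rather than libxml2, and `ParseContext`, `scanStartTags` and `countElementsInParallel` are names invented for the example.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-message state. One instance per thread, never shared.
struct ParseContext {
    int elementCount = 0;
};

// C-style callback signature with an opaque user-data pointer.
using StartElementFn = void (*)(void* userData, const std::string& name);

// Toy stand-in for a parser: reports every opening tag to the callback.
void scanStartTags(const std::string& xml, StartElementFn onStart, void* userData) {
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t close = xml.find('>', pos);
        if (close == std::string::npos) break;  // malformed input: stop
        if (xml[pos + 1] != '/') {
            onStart(userData, xml.substr(pos + 1, close - pos - 1));
        }
        pos = close + 1;
    }
}

// The callback touches only the context it was handed.
void countElement(void* userData, const std::string&) {
    static_cast<ParseContext*>(userData)->elementCount++;
}

// Parses each message on its own thread, each with its own context,
// and returns the element count for every message.
std::vector<int> countElementsInParallel(const std::vector<std::string>& messages) {
    std::vector<ParseContext> contexts(messages.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < messages.size(); ++i) {
        workers.emplace_back([&messages, &contexts, i] {
            scanStartTags(messages[i], countElement, &contexts[i]);
        });
    }
    for (auto& w : workers) w.join();
    std::vector<int> counts;
    for (const auto& c : contexts) counts.push_back(c.elementCount);
    return counts;
}
```

No locking is needed because no two threads ever write to the same `ParseContext`. A real application would more likely use a fixed pool of worker threads than one thread per message.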
There were probably design decisions that led to the current approach. For example, processing with a SAX-like model can be faster than with a DOM-like model: with DOM you need to parse the entire message first, whereas with SAX you can make decisions as you are called back with data.
I'd try to understand those decisions before making any changes. Secondly, aside from keeping you busy, is there a real business need for the rewrite? If not, move on and do something else...