What's the best way to parse XML in the middle of other text
How can I parse XML that sits in the middle of other text?
Example: given the text file below, how can I parse the XML parts in C#?
-> Begin of file
2010-01-01 tehgvdhjjsad
2010-01-02 dsjhnxcucncu
14:55 iahsdahksdjh
<Answer>
<headline>
<a1>1</a1>
<a2>2</a2>
</headline>
</Answer>
2010-01-05 tehgvddsda
2010-01-05 ddsada
22:55 iahsdahksdjh2
<Answer>
<headline>
<a1>11</a1>
<a2>22</a2>
</headline>
</Answer>
-> End of file
Several ways:
1. Call string.IndexOf("<Answer>") and use Substring to chop off the header information. Then wrap the substring in a root element like this:
xmlString = "<Answers>" + substringXml + "</Answers>". You can then parse the result as valid XML.
2. Use an XmlTextReader (or an XmlReader) created with the fragment conformance level (ConformanceLevel.Fragment) and read through the input. Stop only on the Answer elements and process them.
3. Add a root element around the document, load it into an XmlDocument, and use an XPath expression to read out the Answer elements.
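Approach 2 could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the class and method names (FragmentReaderSketch, ReadAnswers) are made up for the example, and it assumes the surrounding non-XML text contains no stray '<' or '&' characters, since those would make the reader throw.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentReaderSketch
{
    // Hypothetical helper: returns each <Answer> element the reader encounters.
    public static List<XElement> ReadAnswers(string text)
    {
        // Fragment conformance allows several top-level elements
        // with plain text between them.
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var answers = new List<XElement>();
        using (var reader = XmlReader.Create(new StringReader(text), settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Answer")
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it, so no Read() here.
                    answers.Add((XElement)XNode.ReadFrom(reader));
                }
                else
                {
                    // Skip text and anything else that is not an Answer element.
                    reader.Read();
                }
            }
        }
        return answers;
    }
}
```

With the sample file loaded into a string (for example via File.ReadAllText), ReadAnswers should yield one XElement per Answer block, which can then be queried with LINQ to XML.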
Well, there aren't many things that can help you with something like that. AFAIK there are two possibilities:
Option 1. If all the XML fragments have the same root node, i.e. "<Answer>", then you can simply loop through the occurrences of <Answer>, find the next occurrence of the closing </Answer> for each one, extract the text between the two, and feed it to a normal XML parser.
Option 2. If it's an "anything XML goes" kind of thing, then you could use this Regex based Html Parser I wrote some time ago. It should handle that input without issue; however, you will have to handle the open/close element events yourself and decide what to do with them.
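Option 1 above might look something like the sketch below. The names AnswerExtractor and ExtractAnswers are invented for illustration, and it assumes that Answer elements are never nested inside each other and that the opening tag carries no attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class AnswerExtractor
{
    // Hypothetical helper: cuts out every <Answer>...</Answer> span
    // and parses each one on its own.
    public static List<XElement> ExtractAnswers(string text)
    {
        const string open = "<Answer>";
        const string close = "</Answer>";

        var result = new List<XElement>();
        int pos = 0;
        while (true)
        {
            int start = text.IndexOf(open, pos, StringComparison.Ordinal);
            if (start < 0) break; // no more fragments

            int end = text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
            if (end < 0) break; // unterminated fragment: stop scanning

            end += close.Length;
            // Each extracted span is a complete element, so a normal parser works.
            result.Add(XElement.Parse(text.Substring(start, end - start)));
            pos = end;
        }
        return result;
    }
}
```

Because only the text between the tags is handed to the parser, the surrounding log lines can contain any characters without causing parse errors.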