Java - extract XML documents from String
Given an arbitrary String, how can I extract the XML document(s) from it?
Consider that the String might hold none (incomplete), one (complete), or multiple documents.
Is there a template / tool to solve this problem?
Later edit: consider the case where the XML data is received over TCP/IP.
Multiple documents are a challenge. I'd wrap the String in an additional root element; this at least turns the content into a well-formed XML document:
String xml = "<my-own-root-element>" + getString() + "</my-own-root-element>";
This is just a start. Of course, forget about XML schemas and DOCTYPE declarations. Different character encodings may be a challenge, and you may have to filter out the <?xml ... ?> declarations, which are only allowed at the very beginning of a document.
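A minimal Java sketch of this wrapping idea, assuming every document in the String is complete and none of them carries a DOCTYPE. The class name WrappedRootParser and the method topLevelElements are made up for illustration; only the standard JAXP DOM parser is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class WrappedRootParser {

    /** Returns the root element of each document found in the raw string. */
    static List<Element> topLevelElements(String raw) {
        // Strip XML declarations such as <?xml version="1.0"?>;
        // they are not allowed in the middle of a document.
        String body = raw.replaceAll("<\\?xml\\s[^>]*\\?>", "");
        String wrapped = "<my-own-root-element>" + body + "</my-own-root-element>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

            // Each element child of the synthetic root was the root of one document.
            List<Element> roots = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    roots.add((Element) child);
                }
            }
            return roots;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // A truncated or otherwise broken document makes the whole parse fail.
            throw new IllegalArgumentException("Input is not well-formed", e);
        }
    }
}
```

Calling WrappedRootParser.topLevelElements(getString()) returns one Element per document. Note that a single incomplete document anywhere in the String makes the whole call fail, so this only covers the "one or multiple complete documents" case.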
I know of no existing solution that can handle broken XML documents automatically. XML is a very strict standard with little leeway when it comes to parse errors: a conforming parser must reject anything that is not well-formed. You are on your own.
What you can try is looking at the code of XML editors; they must be able to handle corrupt documents, but I doubt that any of them can handle things like missing start elements.
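The strictness can at least be used to tell a complete document from an incomplete one: hand the candidate text to a parser and see whether it is accepted. A rough sketch, where WellFormedCheck and isCompleteDocument are names invented for this example:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class WellFormedCheck {

    /** True if the candidate parses as exactly one well-formed XML document. */
    static boolean isCompleteDocument(String candidate) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(candidate)));
            return true;
        } catch (SAXException e) {
            // Truncated input, mismatched tags, or a second root element all end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This does not repair anything; it only answers "complete or not". The default JAXP error handler may also print the parse error to stderr.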
This is my C# version of it; I hope it gives some direction. I'm using it for TCP/IP communication, and T stands for a generic type whose name matches the root element of the documents.
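using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Extracts and deserializes every complete document of type T from the input.
public List<T> ParseMultipleDocumentsByType<T>(string documents)
{
    var cleanParsedDocuments = new List<T>();
    var endingString = "</" + typeof(T).Name + ">";
    var stringContainsDocuments = true;
    while (stringContainsDocuments)
    {
        // A document is assumed to start with an XML declaration and to end
        // with the closing tag of a root element named after T.
        var startingPoint = documents.IndexOf("<?xml", StringComparison.Ordinal);
        var endingTag = startingPoint < 0
            ? -1
            : documents.IndexOf(endingString, startingPoint, StringComparison.Ordinal);
        if (endingTag >= 0)
        {
            var endingPoint = endingTag + endingString.Length;
            var document = documents.Substring(startingPoint, endingPoint - startingPoint);
            var singleDoc = (T)XmlDeserializeFromString(document, typeof(T));
            cleanParsedDocuments.Add(singleDoc);
            documents = documents.Remove(startingPoint, endingPoint - startingPoint);
        }
        else
        {
            // No complete document is left in the string.
            stringContainsDocuments = false;
        }
    }
    return cleanParsedDocuments;
}

// Deserializes a single XML document into an object of the given type.
public static object XmlDeserializeFromString(string objectData, Type type)
{
    var serializer = new XmlSerializer(type);
    object result;
    using (TextReader reader = new StringReader(objectData))
    {
        result = serializer.Deserialize(reader);
    }
    return result;
}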
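Since the question is about Java, here is a rough Java sketch of the same splitting idea, without the deserialization step. It assumes, like the C# code, that every document starts with an XML declaration and ends with the closing tag of a known root element. XmlFrameSplitter, drainDocuments and the rootName parameter are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class XmlFrameSplitter {

    /**
     * Removes every complete document from the buffer and returns them in order.
     * Whatever stays in the buffer is an incomplete tail; the next chunk read
     * from the socket can simply be appended to it.
     */
    static List<String> drainDocuments(StringBuilder buffer, String rootName) {
        String startMarker = "<?xml";
        String endMarker = "</" + rootName + ">";
        List<String> documents = new ArrayList<>();
        while (true) {
            int start = buffer.indexOf(startMarker);
            if (start < 0) {
                break; // no document starts in the buffer
            }
            int endTag = buffer.indexOf(endMarker, start);
            if (endTag < 0) {
                break; // the document has not been fully received yet
            }
            int end = endTag + endMarker.length();
            documents.add(buffer.substring(start, end));
            // Also discards anything that preceded the start marker.
            buffer.delete(0, end);
        }
        return documents;
    }
}
```

With TCP/IP you would append each received chunk to the same StringBuilder and call drainDocuments after every read. The sketch is only a text search: it breaks if the root element is self-closing, if an element with the root's name is nested inside the document, or if the markers appear inside comments or CDATA sections.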