How can I improve the speed of XML parsing in VBA
I have a large XML file that needs parsed in VBA (excel 2003 & 2007). There could be upwards of 11,000 'rows' of data in the xml file with each 'row' having between 10 and 20 'columns'. This ends up being a huge task just to parse through and grab the data (5 - 7 minutes). I tried reading the xml and placing each 'row' into a dictionary (key = row number, value = Row Attributes), but this takes just as long.
It is taking forever to traverse the DOM. Is there a more efficient way?
Dim XMLDict
Sub ParseXML(ByRef RootNode As IXMLDOMNode)
Dim Counter As Long
Dim RowList As IXMLDOMNodeList
开发者_C百科 Dim ColumnList As IXMLDOMNodeList
Dim RowNode As IXMLDOMNode
Dim ColumnNode As IXMLDOMNode
Counter = 1
Set RowList = RootNode.SelectNodes("Row")
For Each RowNode In RowList
Set ColumnList = RowNode.SelectNodes("Col")
Dim NodeValues As String
For Each ColumnNode In ColumnList
NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
Next ColumnNode
XMLDICT.Add Counter, NodeValues
Counter = Counter + 1
Next RowNode
End Sub
You could try using SAX instead of DOM. SAX should be faster when all you are doing is parsing the document and the document is non-trivial in size. The reference for the SAX2 implementation in MSXML is here
I typically reach straight for the DOM for most XML parsing in Excel but SAX seems to have advantages in some situations. The short comparison here might help to explain the differences between them.
Here's a hacked-together example (partially based on this) just using Debug.Print
for output:
Add a reference to "Microsoft XML, v6.0" via Tools > References
Add this code in a normal module
Option Explicit
Sub main()
Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"
Set saxReader = Nothing
End Sub
Add a class module, call it ContentHandlerImpl
and add the following code
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
Use the left-hand drop-down at the top of the module to choose "IVBSAXContentHandler" and then use the right-hand drop-down to add stubs for each event in turn (from characters
to startPrefixMapping
)
Add code to some of the stubs as follows
Explicitly set up the counter and the flag to show if we want to read text data at this time
Private Sub IVBSAXContentHandler_startDocument()
lCounter = 0
bGetChars = False
End Sub
Every time a new element starts, check the name of the element and take appropriate action
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select
End Sub
Check to see if we are interested in the text data and, if we are, chop off any extraneous white space and remove all line feeds (this may or may not be desirable depending on the document you are trying to parse)
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If
End Sub
If we have reached the end of a Col
then stop reading the text values; if we have reached the end of a Row
then print out the string of node values
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select
End Sub
To make things clearer, here is the full version of ContentHandlerImpl
with al of the stub methods in place:
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bGetChars) Then
sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If
End Sub
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property
Private Sub IVBSAXContentHandler_endDocument()
End Sub
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "Col"
bGetChars = False
Case "Row"
lCounter = lCounter + 1
Debug.Print lCounter & " " & sNodeValues
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub
Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub
Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub
Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub
Private Sub IVBSAXContentHandler_startDocument()
lCounter = 0
bGetChars = False
End Sub
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "Row"
sNodeValues = ""
Case "Col"
sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
bGetChars = True
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
Use the SelectSingleNode
function. This will let you search for a node based on pattern matching.
For instance, I created the following function:
Private Function getXMLNodeValue(ByRef xmlDoc As MSXML2.DOMDocument, ByVal xmlPath As String)
Dim node As IXMLDOMNode
Set node = xmlDoc.SelectSingleNode(xmlPath)
If node Is Nothing Then getXMLNodeValue = vbNullString Else getXMLNodeValue = node.Text
End Function
Now, if I have the following XML file:
I can simply call:
myValue = getXMLNodeValue(xmlResult, "//ErrorStatus/Source")
and it will jump through to the first key called 'Error Status' at any depth, and pull out the text in the 'Source' node - returning "INTEGRATION"
精彩评论