
Checking XML Nodes with Script using REGEX

I have a bit of a problem. I have a web address that I send a GET request to, for example 111.244.25.633/Data.XML (don't worry about the IP, it's made up for a device), and it returns an XML file.

In Data.XML we have some nodes and sub-nodes (like a tree, I suppose). For example, the entire data set is encapsulated like this (indentation means a node is a subnode of the one above it):

 <Data>
   <DeviceData>
       <Info>
        <SerialNumber>154236</SerialNumber>
        <Ethernet>Y</Ethernet>
        <Wireless>N</Wireless>
        <Mac>00:25:F6:25:K9</Mac>
       </Info>
   </DeviceData>
 </Data>

Basically, I want to use regular expressions to check the subnodes. For example, I'd want to make sure SerialNumber is a 6-digit number and not something else.

The subnodes will always have the same names (SerialNumber, DeviceData, Data, etc.).

What extension or language would be easiest to use for this? I know basic Python and bash, and I know C/C++ very well, but this seems like more of a scripting task to me.

Any ideas?

Edit: I forgot to add that I may have more or fewer XML tags (some devices have more settings than others), so I'd be picking out specific ones in the script, not looking at every single tag.


Please see this response: Regular Expressions to parse template tags in XML

A demonstration for your project...

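from xml.etree import ElementTree
import re

def proper_SN(elem):
    # fullmatch requires the whole text to be exactly six digits;
    # re.search would also accept longer strings such as 'X1542367'
    return bool(elem.text and re.fullmatch(r'\d{6}', elem.text.strip()))

tree = ElementTree.parse('data.xml')
# iter() replaces getiterator(), which was removed in Python 3.9
for row in tree.iter('SerialNumber'):
    print("SerialNumber: %s Passed = %s" % (row.text, proper_SN(row)))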

Running this...

[mpenning@hotcoffee tmp]$ python parse.py 
SerialNumber: 154236 Passed = True
[mpenning@hotcoffee tmp]$
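
The question fetches the XML with a GET request rather than reading a local file. Here is a minimal sketch of that step, assuming Python 3 and only the standard library. The helper names `fetch_xml` and `serial_is_valid` are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
import re
import urllib.request
from xml.etree import ElementTree


def serial_is_valid(xml_bytes):
    """Return True if DeviceData/Info/SerialNumber is exactly six digits."""
    root = ElementTree.fromstring(xml_bytes)
    # findtext returns the default ('') when the element is missing
    serial = root.findtext('DeviceData/Info/SerialNumber', default='')
    return re.fullmatch(r'\d{6}', serial.strip()) is not None


def fetch_xml(url, timeout=5):
    """GET the URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

With a real device, the two would be combined as `serial_is_valid(fetch_xml(url))`, where `url` is the device's Data.XML address. Keeping the check separate from the fetch means the check can be tried on a saved copy of the XML without touching the network.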

EDIT

I'm not sure how the XML might change... assuming you change the DeviceData element...

 <Data>
   <DeviceData>
       <Info>
        <SerialNumber>154236</SerialNumber>
        <EngineVersion>12.0.4.13</EngineVersion>
        <MediaType>100BaseT</MediaType>
        <Ethernet>Y</Ethernet>
        <Wireless>N</Wireless>
        <Mac>00:25:F6:25:K9</Mac>
       </Info>
   </DeviceData>
 </Data>

Using a simplified script...

from xml.etree import ElementTree
import re

def proper_SN(elem):
    # fullmatch: the whole text must be exactly six digits
    return bool(elem.text and re.fullmatch(r'\d{6}', elem.text.strip()))

tree = ElementTree.parse('data.xml')
serial_elem = tree.find('DeviceData/Info/SerialNumber')
engine = tree.find('DeviceData/Info/EngineVersion').text
media = tree.find('DeviceData/Info/MediaType').text

# Call the function on the element; a bare `if proper_SN:` tests the
# function object itself, which is always true
serstr = "good" if proper_SN(serial_elem) else "bad"

print("Found a %s serial number (%s), with engine %s and media %s" % (serstr, serial_elem.text, engine, media))

I get

[mpenning@hotcoffee tmp]$ python parse.py 
Found a good serial number (154236), with engine 12.0.4.13 and media 100BaseT
[mpenning@hotcoffee tmp]$
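
Since some devices may omit tags, note that `find()` returns `None` for a missing element, so `find(...).text` raises `AttributeError` in that case. Here is a minimal sketch of a more tolerant lookup, assuming Python 3. The helper name `pick_tags` and the `WANTED` list are hypothetical, and the sample deliberately leaves out EngineVersion and MediaType.

```python
from xml.etree import ElementTree

# Tags to pick out; any of them may be absent on a given device.
WANTED = ['SerialNumber', 'EngineVersion', 'MediaType']


def pick_tags(xml_text, wanted=WANTED):
    """Return {tag: text or None} for the wanted tags under DeviceData/Info."""
    root = ElementTree.fromstring(xml_text)
    result = {}
    for tag in wanted:
        # findtext returns None (its default) instead of raising
        # when the element does not exist.
        result[tag] = root.findtext('DeviceData/Info/' + tag)
    return result


sample = """
<Data>
  <DeviceData>
    <Info>
      <SerialNumber>154236</SerialNumber>
      <Ethernet>Y</Ethernet>
    </Info>
  </DeviceData>
</Data>
"""

for tag, text in pick_tags(sample).items():
    print(tag, '->', text if text is not None else '(not present)')
```

Tags that are not in `WANTED`, such as Ethernet here, are simply ignored, which matches the goal of checking only specific tags.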


Use an XML parsing module, such as lxml or ElementTree (in the Python stdlib), to read the document instead of parsing it with regex. Once you have the element's text, you can use a regex to verify the serial number. Here's some code to do this using ElementTree:

import re
import xml.etree.ElementTree

tree = xml.etree.ElementTree.XML(r'''
 <Data>
   <DeviceData>
       <Info>
        <SerialNumber>154236</SerialNumber>
        <Ethernet>Y</Ethernet>
        <Wireless>N</Wireless>
        <Mac>00:25:F6:25:K9</Mac>
       </Info>
   </DeviceData>
 </Data>
''')

serial = tree.find('DeviceData/Info/SerialNumber')
print(serial.text)

# fullmatch anchors both ends; re.match alone would also accept '1542367'
if re.fullmatch(r'\d{6}', serial.text.strip()):
    print('OK')
else:
    print('ERROR')
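
How the pattern is anchored matters for a "exactly six digits" check. The short sketch below, assuming Python 3.4 or later for `re.fullmatch`, compares the three standard `re` entry points on a few made-up candidate strings. The helper name `classify` is hypothetical.

```python
import re

PATTERN = r'\d{6}'


def classify(candidate):
    """Return (search, match, fullmatch) results as booleans."""
    return (
        re.search(PATTERN, candidate) is not None,     # six digits anywhere
        re.match(PATTERN, candidate) is not None,      # six digits at the start
        re.fullmatch(PATTERN, candidate) is not None,  # six digits and nothing else
    )


for candidate in ['154236', '1542367', 'X154236', '15423']:
    print(candidate, classify(candidate))
```

Only `154236` passes all three. `1542367` passes `search` and `match` but not `fullmatch`, and `X154236` passes only `search`, so `fullmatch` (or an explicit `^...$` pattern) is the safer choice here.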


You could also do this with XSLT 2.0, if you prefer a more declarative way of writing your rules (versus the procedural approach with Python and lxml).

Something like:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">

  <xsl:output method="text" />

  <xsl:template match="SerialNumber[matches( normalize-space(.), '^\d{6}$')]" >
    <xsl:value-of select="." /> Passes.
  </xsl:template>

  <xsl:template match="SerialNumber[not( matches( normalize-space(.), '^\d{6}$'))]" >
    <xsl:value-of select="." /> Fails.
  </xsl:template>

  <xsl:template match="text()">
    <!-- override default template, output nothing -->
  </xsl:template>

</xsl:stylesheet>

For the sample document, this will output:

154236 Passes.

and for a document whose serial number is X154236:

X154236 Fails.

If you have a lot of rules to check, you might look at XML schema languages such as RELAX NG or Schematron. A schema is a way of writing the grammar for an XML document, and these languages are more expressive than DTDs. You write declarative rules in the schema language, and a processor validates the XML against them; with Schematron, the processor typically does this by generating XSLT code from the schema.
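
For a handful of rules, the same declarative idea can be approximated in plain Python with a table of paths and patterns. This is not RELAX NG or Schematron, only a standard-library sketch in the same spirit, assuming Python 3. The `RULES` table and the `validate` helper are hypothetical; only the SerialNumber rule comes from the question, and the Y/N and MAC address patterns are assumptions.

```python
import re
from xml.etree import ElementTree

# Path relative to the root element -> regex the element text must fully match.
# Only the SerialNumber rule is from the question; the rest are assumptions.
RULES = {
    'DeviceData/Info/SerialNumber': r'\d{6}',
    'DeviceData/Info/Ethernet': r'[YN]',
    'DeviceData/Info/Wireless': r'[YN]',
    'DeviceData/Info/Mac': r'[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}',
}


def validate(xml_text, rules=RULES):
    """Return {path: 'ok' | 'invalid' | 'missing'} for every rule."""
    root = ElementTree.fromstring(xml_text)
    report = {}
    for path, pattern in rules.items():
        text = root.findtext(path)  # None when the element is absent
        if text is None:
            report[path] = 'missing'
        elif re.fullmatch(pattern, text.strip()):
            report[path] = 'ok'
        else:
            report[path] = 'invalid'
    return report


sample = """
<Data>
  <DeviceData>
    <Info>
      <SerialNumber>154236</SerialNumber>
      <Ethernet>Y</Ethernet>
      <Mac>00:25:F6:25:K9</Mac>
    </Info>
  </DeviceData>
</Data>
"""

for path, status in validate(sample).items():
    print(path, status)
```

In this sample the Wireless tag is reported as missing rather than raising an error, and the made-up Mac value is reported as invalid under the assumed six-octet hexadecimal pattern. Whether a missing tag counts as a failure is left to the caller.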
