Regular expression works normally, but fails when placed in an XML schema
I have a simple doc.xml file which contains a single root element with a Timestamp attribute:
<?xml version="1.0" encoding="utf-8"?>
<root Timestamp="04-21-2010 16:00:19.000" />
I'd like to validate this document against my simple schema.xsd to make sure that the Timestamp is in the correct format:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute name="Timestamp" use="required" type="timeStampType"/>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="timeStampType">
    <xs:restriction base="xs:string">
      <xs:pattern value="(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
So I use the lxml Python module and try to perform a simple schema validation and report any errors:
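from lxml import etree

# Compile the schema, then parse the document to validate
schema = etree.XMLSchema(etree.parse("schema.xsd"))
doc = etree.parse("doc.xml")

# Report every validation error recorded by the schema
if not schema.validate(doc):
    for e in schema.error_log:
        print(e.message)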
My XML document fails validation with the following error messages:
Element 'root', attribute 'Timestamp': [facet 'pattern'] The value '04-21-2010 16:00:19.000' is not accepted by the pattern '(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}'.
Element 'root', attribute 'Timestamp': '04-21-2010 16:00:19.000' is not a valid value of the atomic type 'timeStampType'.
So it looks like my regular expression must be faulty. But when I try to validate the regular expression at the command line, it passes:
>>> import re
>>> pat = '(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}'
>>> assert re.match(pat, '04-21-2010 16:00:19.000')
>>>
I'm aware that XSD regular expressions don't support every feature, but the documentation I've found indicates that every feature I'm using should work.
So what am I misunderstanding, and why does my document fail?
Your | alternations match wider than you think.
(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3}
is parsed as:
(0[0-9]{1})
-or-
(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3}
You need to use more groupings if you want to avoid it; e.g.
((0[0-9]{1})|(1[0-2]{1}))-((3[0-1]{1}|[0-2]{1}[0-9]{1}))-[2-9]{1}[0-9]{3} (([0-1]{1}[0-9]{1}|2[0-3]{1})):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}
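The difference can be checked from Python with a minimal sketch like the one below. It assumes that re.fullmatch is a fair stand-in for the whole-value matching that the schema validator performs; the constant names ORIGINAL, GROUPED and TIMESTAMP are made up for this illustration.

```python
import re

# Pattern copied unchanged from the question.
ORIGINAL = (r"(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} "
            r"([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}")

# Regrouped pattern from this answer.
GROUPED = (r"((0[0-9]{1})|(1[0-2]{1}))-((3[0-1]{1}|[0-2]{1}[0-9]{1}))-[2-9]{1}[0-9]{3} "
           r"(([0-1]{1}[0-9]{1}|2[0-3]{1})):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}")

TIMESTAMP = "04-21-2010 16:00:19.000"

# re.match only needs a match at the start, so the first alternative is enough.
prefix_match = re.match(ORIGINAL, TIMESTAMP)
print(repr(prefix_match.group(0)))       # the text actually consumed by the match

# Requiring the whole string to match exposes the problem.
print(re.fullmatch(ORIGINAL, TIMESTAMP))             # no match object
print(re.fullmatch(GROUPED, TIMESTAMP) is not None)  # regrouped pattern matches
```

The first print shows that the original pattern consumed only the leading month digits, which is why the assert in the question passed.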
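The expression has several errors.
- You allow 00 as a valid month. A|BC matches A and BC, not AC and BC. Hence the first alternative of your expression, (0[0-9]{1}), matches 00 through 09 on its own, with nothing after it. What you want is (0[1-9]|1[0-2])- which only matches 01 through 12 followed by a dash.
- You allow 00 as a valid day.
- Python's re.match anchors the pattern only at the beginning of the text, not at the end, so the first alternative matching the leading 04 was enough. That is why your test using Python succeeded. XSD patterns are implicitly anchored at both ends, so the whole value must match; to reproduce that in Python, add ^ and $ around a group containing the whole expression, or use re.fullmatch. Do not add ^ and $ in the schema itself, since XSD regular expressions do not treat them as anchors.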
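Putting those points together, a tightened pattern might look like the sketch below. It is an illustration, not a drop-in fix: the names STRICT and is_valid_timestamp are hypothetical, the escaped dot before the milliseconds is an extra tightening not discussed above, and the pattern still accepts impossible dates such as 02-31. re.fullmatch is used here to approximate the implicit anchoring of an XSD pattern.

```python
import re

# Each fragment is grouped so that | only chooses between the intended options.
STRICT = re.compile(
    r"(0[1-9]|1[0-2])-"            # month 01-12
    r"(0[1-9]|[12][0-9]|3[01])-"   # day 01-31
    r"[2-9][0-9]{3} "              # year 2000-9999, as in the question
    r"([01][0-9]|2[0-3]):"         # hour 00-23
    r"[0-5][0-9]:[0-5][0-9]"       # minutes and seconds 00-59
    r"\.[0-9]{3}"                  # literal dot, then milliseconds
)


def is_valid_timestamp(text):
    """Return True only if the entire string matches the pattern."""
    return STRICT.fullmatch(text) is not None


print(is_valid_timestamp("04-21-2010 16:00:19.000"))
print(is_valid_timestamp("00-21-2010 16:00:19.000"))
```

The pattern uses no anchors or Python-only syntax, so the same text should also be acceptable as an xs:pattern value, though that is worth confirming against your validator.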
By the way, why don't you use xs:dateTime? It has a very similar format, yyyy-mm-ddThh:mm:ss.fff I think.
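If you control how the document is written, converting the existing timestamps is straightforward. The following is a minimal sketch that assumes the input always has the MM-DD-YYYY hh:mm:ss.fff layout from the question; to_xs_datetime is a hypothetical helper name, and timespec requires Python 3.6 or later.

```python
from datetime import datetime


def to_xs_datetime(timestamp):
    """Convert 'MM-DD-YYYY hh:mm:ss.fff' to the ISO 8601 layout used by xs:dateTime."""
    # strptime also rejects impossible dates such as month 00 or day 32.
    parsed = datetime.strptime(timestamp, "%m-%d-%Y %H:%M:%S.%f")
    # Keep exactly three fractional digits, as in the original value.
    return parsed.isoformat(timespec="milliseconds")


print(to_xs_datetime("04-21-2010 16:00:19.000"))
```

With the attribute declared as type="xs:dateTime", the validator checks the date itself and no hand-written pattern is needed.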