XQuery regexp error
Why does the following code return true (Saxon-EE 9.2 for .NET)?
matches('some text>', '^[\w ]{3,200}$')
There is no > symbol in the pattern. Thanks.
XQuery:
<regexp-test>
<!-- why true? -->
<test1>{matches('some text>', '^[\w ]{3,200}$')}</test1>
<test2>{matches('some text>', '^[\w ]+$')}</test2>
<test3>{matches('< < >', '^[\w ]+$')}</test3>
<!-- valid: -->
<test4>{matches('some text!', '^[\w ]+$')}</test4>
<test5>{matches('.,', '^[\w ]+$')}</test5>
</regexp-test>
Output:
<regexp-test>
<!-- why true? -->
<test1>true</test1>
<test2>true</test2>
<test3>true</test3>
<!-- valid: -->
<test4>false</test4>
<test5>false</test5>
</regexp-test>
After some digging, experimentation, and help from others in the eXist community, I found that the Unicode character classes used to define regexps in XPath and XML Schema differ from the POSIX classes. In particular, the characters
$+<=>^|~
are in the Symbol class \p{S}, not the Punctuation class \p{P}. Since the definition of \w (from http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes-with-errata.html) is
"[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)"
these characters are included in \w.
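One way to check this classification is to test single characters against the category escapes directly. The following is only a sketch in the style of the question's test query; the element names are made up for illustration, and the results assume a processor that follows the XML Schema regex definitions quoted above.

```xquery
<category-test>
  <!-- '>' is a Symbol (\p{S}), not Punctuation (\p{P}), so \w accepts it -->
  <gt-is-symbol>{matches('>', '^\p{S}$')}</gt-is-symbol>
  <gt-is-punct>{matches('>', '^\p{P}$')}</gt-is-punct>
  <gt-is-word>{matches('>', '^\w$')}</gt-is-word>
  <!-- '!' is Punctuation, so \w rejects it -->
  <bang-is-punct>{matches('!', '^\p{P}$')}</bang-is-punct>
  <bang-is-word>{matches('!', '^\w$')}</bang-is-word>
</category-test>
```

Under those assumptions the five elements should contain true, false, true, true and false, which is consistent with test2 and test4 in the question's output.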
This leads to a workaround: use [^\W\p{S}] in place of \w.
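Applied to the patterns from the question, the workaround might look like the sketch below. It assumes that spaces should still be allowed. Because [^\W\p{S}] is a negated group, putting a space inside it would exclude spaces, so the space is added through an alternation instead.

```xquery
<regexp-test>
  <!-- [^\W\p{S}] means: word characters minus symbols -->
  <!-- the trailing '>' is a symbol, so this should now be false -->
  <test1>{matches('some text>', '^([^\W\p{S}]| ){3,200}$')}</test1>
  <!-- letters and a space only, so this should still be true -->
  <test2>{matches('some text', '^([^\W\p{S}]| ){3,200}$')}</test2>
  <!-- only symbols and spaces, so this should now be false -->
  <test3>{matches('< < >', '^([^\W\p{S}]| )+$')}</test3>
</regexp-test>
```

If only ASCII input is expected, an explicit class such as [A-Za-z0-9_ ] is a simpler alternative, at the cost of rejecting non-ASCII letters.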
I'll have a go...
I will guess that you meant to write
matches( 'some text' , '^[\w ]{3,200}$' )
The regex says to start at the beginning of the string (^), match at least 3 and at most 200 ({3,200}) characters or spaces ([\w ]), and then expect the end of the string ($).
So, some text will return true since it consists of the right characters [a-zA-Z0-9_ ] and there are 9 of them.
If the match were this instead, for example
matches( 'some text' , '^[\w ]{3,5}$' )
the result should be false, since the pattern only matches strings of length 3 to 5.
Perhaps you think \w means whitespace or something else?
> is not a valid character in a string in this situation and needs to be replaced by its representation &gt;. I guess it is being silently dropped and therefore the regex matches.
See also w3schools.com: "XQuery is case-sensitive and XQuery elements, attributes, and variables must be valid XML names." Also, > is not allowed inside XML attributes.