
XQuery regexp error

Why does the following code return true (Saxon-EE 9.2 for .NET)?

matches('some text>', '^[\w ]{3,200}$')

There is no > symbol in the pattern. Thanks.

XQuery:

<regexp-test>
    <!-- why true? -->
    <test1>{matches('some text>', '^[\w ]{3,200}$')}</test1>
    <test2>{matches('some text>', '^[\w ]+$')}</test2>
    <test3>{matches('&lt; < >', '^[\w ]+$')}</test3>
    <!-- valid: --> 
    <test4>{matches('some text!', '^[\w ]+$')}</test4>  
    <test5>{matches('.,', '^[\w ]+$')}</test5> 
</regexp-test>

Output:

<regexp-test>
  <!-- why true? -->
  <test1>true</test1>
  <test2>true</test2>
  <test3>true</test3>
  <!-- valid: -->
  <test4>false</test4>
  <test5>false</test5>
</regexp-test>


After some digging, experimentation, and help from others in the eXist community, I found that the Unicode character classes used to define regular expressions in XPath and XML Schema differ from the POSIX classes. In particular, the characters

$+<=>^|~

are in the Symbol class \p{S}, not the Punctuation class \p{P}. Since the definition of \w (from http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes-with-errata.html) is

"[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters) "

these characters are included in \w, which is why a string ending in > still matches [\w ].
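
The category claim can be checked against the Unicode database. What follows is a minimal Python sketch, not XQuery: Python is used only as a convenient way to look up general categories, and the helper name xsd_word_char is made up for illustration. It approximates the XML Schema definition of \w quoted above by rejecting the P, Z and C categories. Python's own \w is defined differently, so it is deliberately not used.

```python
import unicodedata


def xsd_word_char(ch: str) -> bool:
    # Hypothetical helper: approximates the XML Schema definition of \w,
    # i.e. any character whose general category is not P*, Z* or C*.
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


# Print each character, its Unicode general category, and whether the
# XML Schema definition of \w would accept it.
for ch in "$+<=>^|~!.,":
    print(repr(ch), unicodedata.category(ch), xsd_word_char(ch))
```

The first eight characters report a category starting with S and are accepted, while ! . and , report a punctuation category and are rejected, which agrees with the test output in the question.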

This leads to a workaround: use [^\W\p{S}] in place of \w, which excludes the symbol characters as well.
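
As a rough illustration of what the workaround accepts, here is a Python sketch that emulates the character class by Unicode category rather than running a real XPath regex. The function names are hypothetical, and the pattern in the comment, ^([^\W\p{S}]| ){3,200}$, is only my guess at how the workaround would be combined with the original pattern; it has not been tested against Saxon.

```python
import unicodedata


def is_strict_word_char(ch: str) -> bool:
    # Emulates [^\W\p{S}]: accepted by the XML Schema \w (not P, Z or C)
    # and additionally not in the Symbol category S.
    return unicodedata.category(ch)[0] not in ("P", "Z", "C", "S")


def matches_strict(text: str, lo: int = 3, hi: int = 200) -> bool:
    # Emulates the assumed pattern ^([^\W\p{S}]| ){lo,hi}$ one character
    # at a time: every character is a space or a strict word character.
    if not lo <= len(text) <= hi:
        return False
    return all(ch == " " or is_strict_word_char(ch) for ch in text)


for sample in ["some text", "some text>", "< >", "some text!"]:
    print(repr(sample), matches_strict(sample))
```

Under these assumptions only 'some text' is accepted; the samples containing > or < are now rejected along with the one containing !.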


I'll have a go...

I will guess that you meant to write

matches( 'some text' , '^[\w ]{3,200}$' )

The regex says to start at the beginning of the string (^), match at least 3 and at most 200 ({3,200}) word characters or spaces ([\w ]), and then expect the end of the string ($).

So 'some text' returns true: it is 9 characters long and consists only of letters and a space, all of which [\w ] accepts.

If the match is this, for example

matches( 'some text' , '^[\w ]{3,5}$' )

The result should be false, since the pattern only matches strings of length 3 to 5.
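
The effect of the {min,max} bounds can be sketched in Python. This is an illustration only: it uses Python's re module rather than an XPath engine, the helper name within_bounds is hypothetical, and the explicit class [A-Za-z0-9 ] is an ASCII stand-in for [\w ], because \w is defined differently in Python than in XPath.

```python
import re


def within_bounds(text: str, lo: int, hi: int) -> bool:
    # fullmatch plays the role of the ^ and $ anchors; the quantifier
    # {lo,hi} limits how many characters from the class may appear.
    pattern = r"[A-Za-z0-9 ]{%d,%d}" % (lo, hi)
    return re.fullmatch(pattern, text) is not None


print(within_bounds("some text", 3, 200))  # True: 9 characters fit 3..200
print(within_bounds("some text", 3, 5))    # False: 9 characters exceed 5
```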

Perhaps you think \w means whitespace or something else?


> is not a valid character in a string in this situation and needs to be replaced by its representation &gt;. I guess it is being silently dropped and therefore the regex matches.

See also w3schools.com: "XQuery is case-sensitive and XQuery elements, attributes, and variables must be valid XML names." - and > is not allowed inside XML attributes.
