XQuery on MarkLogic using OR

This is a newbie MarkLogic question. Imagine an XML structure like this, a condensed version of my real business problem:

<Person id="1">
  <Name>Bob</Name>
  <City>Oakland</City>
  <Phone>2122931022</Phone>
  <Phone>3123032902</Phone>
</Person>

Note that a document can and will have multiple Phone elements.

I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.

I have tried this:

let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383") 
let $c := cts:word-query("3933849383") 
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)

which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:

<results>
  <match id="18" phone="2123339494" name="bob" city="oakland"/>
  <match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>

So I think I need some form of cts:search that allows this boolean capability and also lets me specify which part of each document gets returned. I could then process the result further with XPath. I need to do this efficiently, so, for example, I think it would NOT be efficient to return a list of document URIs and then query for each document in a loop. Thanks!


Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.

First of all, you are better off using cts:element-value-query instead of cts:word-query. It lets you limit the searched values to a specific element. It does not require an element range index on that element; it can rely on the indexes that are always present.

Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query (as well as the related query constructors) accept multiple search strings as a single sequence argument, and those strings are automatically combined as an OR.
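
As a rough sketch of that equivalence, assuming Person documents shaped like the one in the question, the two queries below should select the same documents, so the two estimates should agree. The variable names are illustrative only.

```xquery
xquery version "1.0-ml";

(: Illustrative phone numbers, taken from the question. :)
let $numbers := ("3738494044", "2373839383", "3933849383")

(: Explicit form: one value query per number, combined with cts:or-query. :)
let $explicit :=
    cts:or-query(
        for $n in $numbers
        return cts:element-value-query(xs:QName("Phone"), $n)
    )

(: Compact form: the sequence of values acts as an implicit OR. :)
let $compact := cts:element-value-query(xs:QName("Phone"), $numbers)

(: xdmp:estimate counts matches from the indexes without fetching documents. :)
return (
    xdmp:estimate(cts:search(/Person, $explicit)),
    xdmp:estimate(cts:search(/Person, $compact))
)
```

The compact form is shorter and leaves less query structure to build by hand, which matters more as the list grows to a couple of dozen numbers.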

Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone elements are still aware of where they came from. You can easily use XPath to navigate to the parent and siblings.

Fourthly, there is nothing wrong with looping over the search results. It may sound a bit odd, but it costs very little extra; in MarkLogic Server the overhead is pretty much negligible. Most time is lost when you return many results (more than several thousand), because of the cost of serializing them all. If you are likely to handle lots of search results, it is wise to start using pagination straight away.
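
One possible way to paginate, sketched here under the assumption that a page number and page size are supplied by the caller ($page and $page-size are hypothetical names, not part of the question), is to wrap the search in fn:subsequence:

```xquery
xquery version "1.0-ml";

(: Hypothetical paging parameters; in practice they would come from the request. :)
let $page := 1
let $page-size := 25
let $start := ($page - 1) * $page-size + 1

let $query :=
    cts:element-value-query(
        xs:QName("Phone"),
        ("3738494044", "2373839383", "3933849383")
    )

(: Only the requested slice of the result sequence is returned and serialized. :)
return
    fn:subsequence(
        cts:search(doc()/Person/Phone, $query),
        $start,
        $page-size
    )
```

Each returned Phone element can still be navigated with XPath ($phone/..), so the same match-building loop applies to a single page of results.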

To get what you ask, you could use the following code:

<results>{
    for $phone in
        cts:search(
            doc()/Person/Phone,
            cts:element-value-query(
                xs:QName("Phone"),
                ("3738494044", "2373839383", "3933849383")
            )
        )
    return
        <match id="{data($phone/../@id)}" phone="{data($phone)}" name="{data($phone/../Name)}" city="{data($phone/../City)}"/>
}</results>

Best of luck.


Here's what I would do:

let $numbers := ("3738494044", "2373839383", "3933849383")
return
<results>{
    for $person in cts:search(/Person, cts:element-value-query(xs:QName("Phone"),$numbers))
    return
    <match id="{data($person/@id)}" name="{data($person/Name)}" city="{data($person/City)}">
      {
        for $phone in $person/Phone[cts:contains(., cts:word-query($numbers))]
        return element phone {$phone}
      }
    </match>

}</results>

First, there's an implicit OR when passing multiple values into word-query and value-query and their cousins, and this query is more efficiently resolved from the indexes, so do this when you can.

Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.

I would not create a range index for this - no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.

You could do all of this with the Search API and a little XSLT. That would make it easy to start combining names, numbers, and other conditions in a single query.
