
How can I use Lucene to search XML documents?

I'm using Lucene to search through an index of XML documents. I'm supposed to look for documents that have certain words inside certain tags. What would be the best way to go about this?

I tried to use RegexQuery with something like "tag.*?word.*?tag", but that returned no results.

To clarify, here is an example of the XML:

<?xml version="1.0" encoding="utf-8"?>
<Legislation>
    <ENTRY COLNAME="COL1">
    <LegBody_1_1 ID="KEY_3">
        <ParagraphNum REFID="284:1" JUMP_LINK_KEY="0">1. </ParagraphNum>In the following pragraphs - </LegBody_1_1>
        <LegBody_1_2 ID="KEY_4">
            <Term>"Legal Guardian" </Term>
            <Definition> - a person to whom legal title to property is entrusted to use for another's benefit; </Definition>
        </LegBody_1_2>
        <LegBody_1_2 ID="KEY_5">
            <Term>"Authority" </Term>
            <Definition> - Any civil servant appointed by the department head or minister; </Definition>
        </LegBody_1_2>

.... more tags..

</Legislation>

A search looking for the word "legal" in the tag "definition" ("definition.*?legal.*?definition") should return this document.

Any ideas?


I'd have a look at "Parsing, indexing, and searching XML with Digester and Lucene".
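As far as I know, the regex attempt fails because Lucene's regex queries are matched against individual indexed terms, not against the raw document text, so a pattern that spans a tag, a word, and a closing tag has nothing to match. The usual alternative is to parse the XML first and index the text of each tag under a field named after that tag, so that "legal inside definition" becomes an ordinary fielded query such as definition:legal. Below is a minimal sketch of the parsing half only, using the JDK's DOM parser. The class name XmlFieldExtractor and both methods are hypothetical, and fieldContainsWord is just a naive in-memory stand-in for the Lucene query, not Lucene itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XML document into "field name -> text values",
// where the field name is the lowercased tag name.
class XmlFieldExtractor {

    /** Parses the XML and returns the text found directly inside each element, keyed by tag name. */
    static Map<String, List<String>> extractFields(String xml) {
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> fields = new LinkedHashMap<>();
            collect(dom.getDocumentElement(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks the tree; each element contributes only its own text nodes,
    // so text inside <Term> is not also attributed to the enclosing <LegBody_1_2>.
    private static void collect(Element element, Map<String, List<String>> fields) {
        StringBuilder ownText = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                ownText.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, fields);
            }
        }
        String text = ownText.toString().trim();
        if (!text.isEmpty()) {
            String field = element.getTagName().toLowerCase(Locale.ROOT);
            fields.computeIfAbsent(field, k -> new ArrayList<>()).add(text);
        }
    }

    /** Naive stand-in for a fielded term query: true if any value of the field has the word as a token. */
    static boolean fieldContainsWord(Map<String, List<String>> fields, String tag, String word) {
        String wanted = word.toLowerCase(Locale.ROOT);
        List<String> values = fields.getOrDefault(tag.toLowerCase(Locale.ROOT), Collections.emptyList());
        for (String value : values) {
            // Split on anything that is not a letter or digit, roughly what a simple analyzer would do.
            for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.equals(wanted)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the sample above, extractFields would put the "Legal Guardian" definition text under the field "definition", and fieldContainsWord(fields, "definition", "legal") would report a hit. In a real setup you would add each map entry to a Lucene document as an analyzed text field and let Lucene run the fielded query instead of the in-memory check.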


I'd also explore native XML databases. eXist-db (http://exist-db.org) has Lucene built in, so you can keep your XML intact and query the structure with XQuery while applying Lucene indexes.
