Solr Scoring: Can I Extract the Hit Count Value from Solr?

My use for Solr is to build a primary search system: we feed a large set of documents in small batches and run a pre-specified query against them. Each document is scanned for this particular query and, if it is found, we need to store in the database the file index ID, the path, and the hit count of that string in that document. I have searched online for ways to extract hit-count values from Solr for each document, but all I have understood so far is that Solr automatically sorts its results on the basis of hit count and a variety of other factors, which you can adjust using boosts and function query parameters.

  1. Is there an established way of extracting the hit count from Solr?

  2. If not, is it possible to alter Solr's scoring formula so that it ONLY considers the hit count, and then ask Solr to return the score (which would essentially be the hit count in this case)?

(I'm sorry that my question appeared a little confusing. I only want the hit count returned from Solr for each document so that I can store it in my database. Is that directly possible through Solr? By hit count, I mean the number of occurrences of the keyword I'm searching for in the indexed fields of each document in the Solr index.)

Solr results are actually sorted on the basis of each document's relevancy score, right? That score includes term frequency and a lot of other smaller factors. I want only the hit count returned: I was wondering whether there is either a direct way to get the hit count, or a way to alter how Solr scores documents so that it scores only on the term-frequency factor, and then get the term-frequency value for each document in my Solr output.
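For context on the second option: assuming Solr is using Lucene's classic TF-IDF similarity (DefaultSimilarity, the default in Solr 3.x and 4.x), the score of a document d for a query q is built from the factors below. Only the tf factor depends on the number of occurrences, and it is the square root of that count rather than the count itself.

```latex
\text{score}(q,d) = \text{coord}(q,d)\,\text{queryNorm}(q) \sum_{t \in q} \text{tf}(t,d)\,\text{idf}(t)^{2}\,\text{boost}(t)\,\text{norm}(t,d)

\text{tf}(t,d) = \sqrt{\text{freq}(t,d)}, \qquad \text{idf}(t) = 1 + \ln\frac{\text{numDocs}}{\text{docFreq}(t) + 1}
```

So even a score reduced to the tf factor alone would be sqrt(freq), not the raw hit count, unless the similarity itself were also changed.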


"Can I Extract Hit Count Value from Solr?" ... "sorts its results on the basis of hit-count"

Your headline asks about "hit count", but from your text it seems you are interested in the Solr score, because by default Solr sorts by score. Is that what you mean by "hit count"?

"Is there an established way of extracting hit count from Solr?"

Yes, it is possible to get the "score" value of a matched document (by the way, it is also possible to get the hit count).

To get the score, simply extend the "field list" parameter (fl) with "score": http://wiki.apache.org/solr/CommonQueryParameters#fl
For example, if you have the fields DOCUMENT, ID and PATH, add score like this: http://localhost:8080/solr/select/?fl=DOCUMENT,ID,PATH,score

Example response header:

  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="start">0</str>
      <str name="fl">DOCUMENT,ID,PATH,score</str>
    </lst>
  </lst>
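A minimal Python sketch of the same request, assuming the JSON response writer (wt=json) and the field names DOCUMENT, ID and PATH from the example; the helper names build_select_url and scores_by_id are hypothetical. It only builds the URL and reads the "score" pseudo-field out of an already parsed response, so no HTTP client is involved.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields):
    """Build a Solr select URL that also asks for the relevance score."""
    # "score" is a pseudo-field: listing it in fl makes Solr return each doc's score.
    fl = ",".join(list(fields) + ["score"])
    params = {"q": query, "fl": fl, "wt": "json"}  # wt=json is an assumption
    return base_url.rstrip("/") + "/select/?" + urlencode(params)


def scores_by_id(response, id_field="ID"):
    """Map document ID -> score from a parsed JSON select response."""
    return {doc[id_field]: doc["score"] for doc in response["response"]["docs"]}
```

For example, build_select_url("http://localhost:8080/solr", "dell", ["DOCUMENT", "ID", "PATH"]) gives a URL with fl=DOCUMENT%2CID%2CPATH%2Cscore; the commas are percent-encoded, which Solr decodes.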

//Update:

"the number of occurrences"

How often does a (key)word exist in the index, or in a specific field? Faceted search will count that for you: http://wiki.apache.org/solr/SolrFacetingOverview
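A small Python sketch of field faceting, assuming wt=json and a hypothetical text field named "text"; facet_field_params and parse_facet_field are illustrative helpers, not part of any Solr client. Keep in mind that facet.field counts are document counts: for each term, how many matching documents contain it.

```python
from urllib.parse import urlencode


def facet_field_params(query, field, limit=-1):
    """Query string for counting terms in `field` across the matching documents."""
    return urlencode({
        "q": query,
        "rows": 0,              # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,   # -1 = return all terms
        "facet.mincount": 1,    # hide terms with a zero count
        "wt": "json",
    })


def parse_facet_field(response, field):
    """Turn Solr's flat [term1, count1, term2, count2, ...] list into a dict."""
    # This flat layout is the JSON default (json.nl=flat).
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```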

//Update 2:

If you want to count the occurrences of a keyword inside each document, so that you receive the document ID and the count for that document, you can combine field faceting with a range facet on the ID field. Example: look at all documents for the manufacturer "dell" and return the frequency of this keyword for every document ID:

ID  -> how many times does "dell" occur?
241 -> 2
242 -> 0
243 -> 5

For this, use the following search parameters:

<str name="q">dell</str>                      <!-- keyword to look for / count -->
<str name="facet">true</str>
<str name="facet.field">YOUR_TEXTFIELD</str>
<str name="facet.method">enum</str>
<str name="facet.range">ID</str>              <!-- ID = field with the document ID -->
<str name="f.ID.facet.range.start">0</str>    <!-- start ID for the faceted search -->
<str name="f.ID.facet.range.end">1000</str>   <!-- end ID for the faceted search -->
<str name="f.ID.facet.range.gap">1</str>      <!-- count IDs in steps of 1 -->
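The same parameters as the listing above, sketched in Python under these assumptions: wt=json, and ID is a numeric field (range faceting needs a numeric or date field). range_facet_params and parse_range_counts are hypothetical helper names. Solr reports range-facet counts as the number of matching documents in each bucket, so it is worth checking the output against a document whose keyword count you already know before storing these values.

```python
from urllib.parse import urlencode


def range_facet_params(keyword, text_field, id_field="ID", start=0, end=1000, gap=1):
    """Query string mirroring the facet.field + facet.range parameters."""
    prefix = "f.%s.facet.range." % id_field  # per-field range settings
    return urlencode([
        ("q", keyword),
        ("facet", "true"),
        ("facet.field", text_field),
        ("facet.method", "enum"),
        ("facet.range", id_field),
        (prefix + "start", start),
        (prefix + "end", end),
        (prefix + "gap", gap),
        ("wt", "json"),  # assumption: JSON response writer
    ])


def parse_range_counts(response, id_field="ID"):
    """Map bucket start (as int) -> count from facet_ranges.<field>.counts."""
    # counts is a flat list: [bucket_start_1, count_1, bucket_start_2, count_2, ...]
    flat = response["facet_counts"]["facet_ranges"][id_field]["counts"]
    return {int(b): c for b, c in zip(flat[0::2], flat[1::2])}
```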


It is relatively easy with Solr 4.0: just add a pseudo-field (a function query) to the fl parameter:

q=*:*&fl=*,termfreq(field,term)

You can name the returned value if you like too:

q=*:*&fl=*,tf:termfreq(field,term)
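A minimal Python sketch of that approach for the original use case (store ID, path and hit count per document), assuming wt=json and hypothetical field names ID, PATH and text. termfreq_params and hit_counts are illustrative helpers. The term argument of termfreq is probably matched against the indexed form of the term (for example lowercased), so pass it that way.

```python
from urllib.parse import urlencode


def termfreq_params(field, term, query="*:*", rows=100):
    """Query string asking Solr for the raw occurrence count of `term` per document."""
    # tf:termfreq(field,term) adds a pseudo-field named "tf" to every returned doc.
    return urlencode({
        "q": query,  # e.g. "text:dell" to return only documents that contain the term
        "fl": "ID,PATH,tf:termfreq(%s,%s)" % (field, term),
        "rows": rows,
        "wt": "json",
    })


def hit_counts(response, id_field="ID"):
    """Return (id, path, hit count) tuples ready to be written to a database."""
    return [
        (doc[id_field], doc.get("PATH"), doc["tf"])
        for doc in response["response"]["docs"]
    ]
```

Restricting q to documents that contain the term (instead of *:*) avoids storing rows whose count is 0.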
