How to index a URL field in Solr so I can boost results by website

I have thousands of documents indexed in Solr, representing data crawled from different websites. One of the fields of each document is SourceURL, which contains the URL of the web page that was crawled and indexed into that document.

I want to boost results from a specific website using a boost query. For example, I have 4 documents whose SourceURL fields contain the following values:

  1. https://meta.stackoverflow.com/page1
  2. http://www.stackoverflow.com/page2
  3. https://stackoverflow.com/page3
  4. https://stackexchange.com/page1

I want to boost all results from stackoverflow.com (including www.stackoverflow.com) but not from other subdomains such as meta.stackoverflow.com. In this case, that means results 2 and 3.

How can I index the URL field and then use a boost query to identify all the documents from a specific website, as in the case above?


One way would be to parse the URL before indexing and record whether it belongs to a primary domain (for example, in a primarydomain boolean field defined in your schema.xml file).
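A minimal sketch of that index-time parsing in Python, assuming two hypothetical fields: `site` (a non-tokenized `string` field holding the host with any leading `www.` removed) and `primarydomain` (a `boolean`). Treating `www.` as the bare domain and counting dots to decide what is "primary" are simplifying assumptions, not something Solr does for you.

```python
from urllib.parse import urlparse


def site_fields(source_url: str) -> dict:
    """Derive the hypothetical 'site' and 'primarydomain' fields from a SourceURL."""
    # .hostname drops the scheme, port and path, and is already lower-cased.
    host = (urlparse(source_url).hostname or "").lower()
    # Assumption: "www.example.com" is the same website as "example.com".
    if host.startswith("www."):
        host = host[len("www."):]
    # Naive assumption: a primary domain has exactly two labels, e.g. "stackoverflow.com".
    # This is wrong for suffixes like "co.uk"; a public-suffix list would be needed there.
    is_primary = host.count(".") == 1
    return {"site": host, "primarydomain": is_primary}


if __name__ == "__main__":
    urls = [
        "https://meta.stackoverflow.com/page1",
        "http://www.stackoverflow.com/page2",
        "https://stackoverflow.com/page3",
        "https://stackexchange.com/page1",
    ]
    for url in urls:
        # These values would be added to each document before sending it to Solr.
        print(url, site_fields(url))
```

Note that `primarydomain` alone is also true for `stackexchange.com` (result 4), so to favor one specific website the boost query should target the exact-match `site` field, e.g. `site:"stackoverflow.com"`, which matches only results 2 and 3.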

Then you can boost on the primarydomain field at query time. See the DisMaxQParserPlugin page on the Solr Wiki for an example of how to boost fields at query time.
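At query time, the DisMax `bq` (boost query) parameter adds an optional clause that raises the score of matching documents without filtering out the rest. A sketch that builds such a request URL, assuming a core at the hypothetical address `http://localhost:8983/solr/mycore/select`, hypothetical query fields `title` and `content`, and the hypothetical `site` field populated at index time:

```python
from urllib.parse import urlencode


def build_select_url(base: str, user_query: str, site: str, boost: float = 10.0) -> str:
    """Build a DisMax /select URL that boosts documents whose 'site' equals `site`."""
    params = {
        "defType": "dismax",            # use the DisMax query parser
        "q": user_query,                # the user's search terms
        "qf": "title content",          # hypothetical fields to search in
        # Boost query: documents from this site score higher; others still appear.
        "bq": f'site:"{site}"^{boost}',
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/mycore/select", "java", "stackoverflow.com"))
```

Using `bq` rather than a filter query (`fq`) is the design choice here: `fq` would drop results from other websites entirely, while `bq` only reorders them.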
