Keyword search engine that returns statistics instead of hits

2023-03-08 13:25 Asked by:

First post on StackOverflow, but I've always looked to this site as a great source of shared knowledge, and I'm excited to see what comes up from this question.

As I feel I have now reached the limits of what I can do with SQL indexes, statistics and full-text search, I'm currently looking for a search library that can provide us with the functionality we need. I'm not averse to writing it myself (and open-sourcing it if I can get the boss's approval), but I would prefer to find something open-source that already exists, natch.

What we're after is a search engine that can provide statistics about the results that are matched when a user searches for a specific keyword. Let's say, for example, that we were talking about a database of products in an online shop. We need to be able to return statistics about how many products there are that match a given set of keywords (and also be able to filter this result set by price, category, etc), as well as the total number of products in stock (assuming that this is stored in a field in the product table). All the search engines that I have found return the top n results, and if you want statistics about the size of the result set, you need to enumerate the whole set. Even if the engine did report the result count, you would still need to enumerate the set to retrieve the total number of products in stock.

Is there anything anyone knows of that is capable of this functionality? As I say, I'm happy to get my hands dirty and either build it myself, or modify the functionality of something like Lucene, but I have not been able to find anything appropriate on Google.

Thanks in advance guys!

You might take a look at Solr, which is a faceted search engine built on top of Lucene. Solr will count lots of different things for you, in addition to doing full-text search. It is good at handling combinations of structured and full-text data.
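To make that concrete, here is a minimal sketch of what such a request might look like. It only builds the query URL and does not contact a server. The core name `products` and the field names `price`, `category` and `stock` are assumptions for illustration; the parameters themselves (`rows`, `fq`, `facet`, `facet.field`, `stats`, `stats.field`) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_stats_query(base_url, keywords, price_min, price_max):
    """Build a Solr /select URL that asks for counts and stats, not hits."""
    params = [
        ("q", keywords),                                # full-text keywords
        ("fq", f"price:[{price_min} TO {price_max}]"),  # filter by price range
        ("rows", 0),                                    # return no documents at all
        ("facet", "true"),
        ("facet.field", "category"),                    # match count per category
        ("stats", "true"),
        ("stats.field", "stock"),                       # sum/min/max/mean of stock
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core; adjust host, core and field names to your schema.
url = build_stats_query("http://localhost:8983/solr/products", "red shoes", 10, 100)
print(url)
```

With `rows=0` the response should carry only the aggregate parts: the total match count (`numFound`), the per-category counts under `facet_counts`, and the stock figures under `stats`, so nothing has to be enumerated on the client side.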

Something to keep in mind here is that "enumerating all results" can mean very different things - select count(*) is very different from doing all the joins etc. required to actually get each object. This is true in Lucene as well as relational databases. So I wouldn't worry about the mere fact that the documentation says "we enumerate all results."
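A small illustration of that difference, using an in-memory SQLite table as a stand-in for the product table. The schema, the sample rows and the `LIKE` match are all assumptions made for the sketch; a real setup would use a proper full-text index. The point is that the engine walks the matching rows internally and hands back a single row of aggregates, so no product objects are ever materialised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT, category TEXT, price REAL, stock INTEGER)"
)
# Made-up sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO product (name, category, price, stock) VALUES (?, ?, ?, ?)",
    [
        ("red running shoes", "shoes", 59.0, 12),
        ("blue running shoes", "shoes", 65.0, 0),
        ("red scarf", "accessories", 15.0, 40),
        ("running shorts", "clothing", 25.0, 7),
    ],
)


def match_stats(conn, keyword, max_price):
    """Return match count and total stock without fetching the matching rows."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(stock), 0) "
        "FROM product WHERE name LIKE ? AND price <= ?",
        (f"%{keyword}%", max_price),
    ).fetchone()
    return {"matches": row[0], "total_stock": row[1]}


stats = match_stats(conn, "running", 60.0)
print(stats)
```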

It's been my experience that the standard faceting of Solr scales to what 99% of people need. If you are in that 1% (i.e. you have a huge database) then I can suggest some ways of guessing the results which can be quicker. But Solr will probably work for you.

"As I feel I have now reached the limits of what I can do with SQL indexes"

Are you sure? I ask because if you are using MySQL, you might want to look into the full-text search functionality of PostgreSQL, especially when you combine it with the btree_gin and pg_trgm (trigram) modules and the very capable EXPLAIN functionality, which lets you extract reasonable row estimates from highly complex queries.
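As a rough sketch of the row-estimate idea: `EXPLAIN (FORMAT JSON)` returns the query plan as a JSON array, and the top-level plan node carries the planner's estimate in `"Plan Rows"`. The table and column names below are assumptions, `%s` is a driver-style placeholder, and the sample plan is hand-written in the shape PostgreSQL emits rather than taken from a real server. Keep in mind the figure is an estimate based on table statistics, not an exact count.

```python
import json

# Hypothetical table and column; %s is a driver-style placeholder for the keyword.
EXPLAIN_SQL = (
    "EXPLAIN (FORMAT JSON) "
    "SELECT id FROM product "
    "WHERE to_tsvector('english', name) @@ plainto_tsquery('english', %s)"
)


def estimated_rows(explain_output):
    """Return the planner's row estimate from EXPLAIN (FORMAT JSON) output."""
    if isinstance(explain_output, str):
        plans = json.loads(explain_output)
    else:
        plans = explain_output  # some drivers hand back already-parsed JSON
    return plans[0]["Plan"]["Plan Rows"]


# Hand-written sample in the shape PostgreSQL emits; not real server output.
sample = '[{"Plan": {"Node Type": "Bitmap Heap Scan", "Plan Rows": 1840, "Plan Width": 4}}]'
print(estimated_rows(sample))
```

In practice you would execute `EXPLAIN_SQL` through your database driver and pass the single returned value to `estimated_rows`.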

