similarity match

2022-12-26 11:17 Q&A author:

Many search engines have a 'did you mean' feature.

Is there a simple way to use (N)Hibernate (e.g. ICriteria) to find an entity (e.g. a keyword) based on similarity? Please note that I do not mean Expression.Like or anything like it.

I hope this question makes sense.

Thanks.

Christian

PS:

In my case, similarity means (let us say) 70% of characters in common.

I envisaged implementing an extension method called bla that I could use in my criteria queries:

ICriteria Criteria = Session.CreateCriteria(typeof(xxx));
Criteria.Add(Expression.bla("name ", name));
return Criteria.List() as List;

It's out of scope for NHibernate. NHibernate is a data access layer; it can only do what the database does. You would have to determine similarity yourself, perhaps by maintaining a table of common mistypings. That's what search engines do anyway; they don't just magically determine what is a typo.

As others have said, this is generally out of scope for an RDBMS. Use Lucene.Net (possibly via NHibernate.Search) or Solr (possibly via SolrNet) instead. Solr even comes with spell checking out of the box, which you can use to implement "did you mean" functionality easily.

You can use the SOUNDEX function in SQL:

SELECT
    * 
FROM
    Products
WHERE
    SOUNDEX(ProductName) = SOUNDEX('beer')

This will return products whose names sound similar to "beer", that is, names with the same Soundex code.
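To make it clearer what such a query compares, here is a minimal sketch of the classic American Soundex encoding in Python. It is only an illustration: the function name `soundex` and the handling of edge cases are assumptions, and a database's built-in SOUNDEX may differ in details.

```python
# Hypothetical illustration of classic American Soundex; not the database's own code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]  # pad with zeros, then truncate to 4 characters


if __name__ == "__main__":
    for name in ("beer", "bear", "Robert", "Rupert"):
        print(name, soundex(name))
```

Words that receive the same code ("beer" and "bear", for instance) are what the equality test in the WHERE clause treats as a match.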

UPDATE:

SELECT
    * 
FROM
    Products
WHERE
    DIFFERENCE(ProductName, 'beer') IN (3, 4)

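This would also return products with similar-sounding names. DIFFERENCE (a SQL Server function) compares the Soundex codes of its two arguments and returns a value from 0 to 4, where 4 means the closest match.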

-Pavel

Hibernate won't make your database any smarter than it already is. "Did you mean" is a very tricky business; it is generally implemented by doing statistical analysis of words and n-grams (multi-word sequences) against the metadata of the search engine's inverted-file index structures and query logs.

As an example, if I type "exmaple code", the engine might scan the most common known words in the corpus, computing each word's edit distance from the term "exmaple". It will probably find "example" and thus suggest, "Did you mean example code?"

Similarity is hard to define and, IMHO, is defined differently for different use cases. Similarity can be phonetic (there are different algorithms, such as the Kölner Verfahren, i.e. Cologne phonetics, for German). For phonetic similarity, a function computes a phonetic string representation of each word. One could then use the Levenshtein distance to compare those representations. I don't know much about (N)Hibernate, but an extension method could be used to perform the comparison at the object level.
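The scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `levenshtein` and `did_you_mean`, the tiny vocabulary, and the `max_distance` cutoff are all hypothetical, and a real engine would also weigh word frequency and query logs.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def did_you_mean(term, vocabulary, max_distance=2):
    """Return the closest known word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda w: levenshtein(term, w), default=None)
    if best is None or best == term or levenshtein(term, best) > max_distance:
        return None
    return best


if __name__ == "__main__":
    known_words = ["example", "sample", "code", "exam"]  # assumed corpus vocabulary
    print(did_you_mean("exmaple", known_words))
```

Note that swapping two adjacent letters counts as two edits under plain Levenshtein distance, which is why the cutoff in this sketch is 2 rather than 1.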

-sa

I don't think NHibernate has built-in functionality for finding similar words.

You have to create a distance function that calculates the distance between words (how similar they are). Based on a threshold value, you can then accept all the words whose distance from your original word is below that threshold.

This distance function is the key, and there are many criteria on which you can base the distance calculation.
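As a rough sketch of the threshold idea, the following Python snippet filters candidate words in application code after they have been loaded. It uses the standard library's `difflib.SequenceMatcher`, which yields a similarity ratio between 0 and 1 rather than a distance, so the test is "at or above the threshold" instead of "below" it. The function names, the sample words, and the 0.7 threshold (borrowed from the 70% figure in the question) are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_words(target, candidates, threshold=0.7):
    """Return candidates whose similarity to target meets the threshold,
    most similar first."""
    scored = [(similarity(target, word), word) for word in candidates]
    return [word for score, word in sorted(scored, reverse=True)
            if score >= threshold]


if __name__ == "__main__":
    # Hypothetical keyword names already fetched from the database.
    names = ["keyword", "keywrod", "keyboard", "banana"]
    print(similar_words("keyword", names))
```

Swapping in a different measure (edit distance, a phonetic code, and so on) only means replacing `similarity`; the thresholding step stays the same.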
