similarity match

Many search engines have 'did you mean' functionality.

Is there a simple way to use (N)Hibernate (e.g. ICriteria) to find an entity (e.g. a keyword) based on similarity? Please note that I do not mean Expression.Like or anything like it.

I hope this question makes sense.

Thanks.

Christian

PS:

By similarity I mean, in my case, (let us say) 70% of characters in common.

I envisaged implementing an extension method called bla that I could use in my criteria queries:

ICriteria Criteria = Session.CreateCriteria(typeof(xxx));
Criteria.Add(Expression.bla("name ", name));
return Criteria.List() as List;


It's out of scope for NHibernate. NHibernate is a data access layer; it can only do what the database does. You would have to determine similarities yourself, perhaps by maintaining a table of common mistypings. That's what search engines do anyway; they don't just magically determine what's a typo.


As others said, it's generally out of scope for an RDBMS. Use Lucene.Net (possibly via NHibernate.Search) or Solr (possibly via SolrNet) instead. Solr even comes with spell checking out of the box, which you can use to easily implement "did you mean" functionality.


You can use the SOUNDEX function in SQL:

SELECT
    * 
FROM
    Products
WHERE
    SOUNDEX(ProductName) = SOUNDEX('beer')

This will return products whose names sound similar to "beer".

UPDATE:

SELECT
    * 
FROM
    Products
WHERE
    DIFFERENCE(ProductName, 'beer') IN (3, 4)

This would also return products with similar names...
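To give a feel for what SOUNDEX compares, here is a rough Python sketch of the classic American Soundex rules. It is only an illustration: the names `CODES` and `soundex` are made up for this example, and the database's built-in function may behave differently in edge cases.

```python
# Consonant groups used by classic American Soundex (illustrative sketch).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character phonetic code, e.g. 'beer' -> 'B600'."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    last_code = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != last_code:
            result.append(code)
        last_code = code  # a vowel resets this, so a repeated code is kept
    # Pad with zeros and cut to one letter plus three digits.
    return ("".join(result) + "000")[:4]


for name in ("beer", "bear", "bier", "wine"):
    print(name, soundex(name))
```

"beer", "bear" and "bier" all collapse to the same code, which is why the first query treats them as matches, while "wine" gets a different code.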

-Pavel


Hibernate won't make your database any smarter than it already is. "Did you mean" is a very tricky business; it is generally implemented by doing statistical analysis of words and n-grams (multi-word sequences) against the metadata of the search engine's inverted-file index structures and query logs.

As an exmaple, if I type exmaple code, the engine might do a scan of the most common known words in the corpus, computing each word's edit distance from the term exmaple. It will probably find example and thus suggest, "Did you mean example code".
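A minimal sketch of that edit-distance scan, in Python for brevity. The function names, the tiny vocabulary and the `max_distance` cutoff are assumptions made up for illustration; a real engine would work against its index and query logs rather than a list.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def did_you_mean(term, vocabulary, max_distance=2):
    """Return the closest known word within max_distance edits, else None."""
    best = min(vocabulary, key=lambda w: levenshtein(term, w), default=None)
    if best is None or levenshtein(term, best) > max_distance:
        return None
    return best


known_words = ["example", "code", "sample", "exam"]  # hypothetical corpus words
print(did_you_mean("exmaple", known_words))
```

Plain Levenshtein counts the swapped letters in "exmaple" as two substitutions, so the cutoff has to allow at least two edits for the suggestion to appear.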


Similarity is hard to define and IMHO is defined differently in different use cases. Similarity can be phonetic (there are various algorithms, such as Kölner Phonetik for German). In the phonetic case, a function computes a phonetic string representation of each word, and one could then use the Levenshtein distance to compare those representations. I don't know much about (N)Hibernate, but an extension method could be used to do the comparison at the object level.

-sa


I don't think NHibernate has any built-in functionality that gives you similar words.

You have to create a distance function that calculates the distance between two words (how similar they are). Then, based on a threshold value, you accept all the words whose distance from your original word is below that threshold.

This distance function is the key, and you can base the distance calculation on many different criteria.
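As a rough sketch of that threshold idea, done in memory after the candidate names have been loaded, the following uses Python's standard `difflib.SequenceMatcher`. Its ratio is only one possible reading of "70% of characters in common"; the helper names and the sample data are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score from 0.0 (nothing in common) to 1.0 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar(name, candidates, threshold=0.7):
    """Return (candidate, score) pairs scoring at least threshold, best first."""
    scored = [(c, similarity(name, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


keywords = ["keyword", "keyboard", "banana"]  # hypothetical entity names
for candidate, score in find_similar("keywrod", keywords):
    print(candidate, round(score, 2))
```

Swapping in a different scoring function (edit distance, a phonetic code, n-gram overlap) changes what counts as similar without touching the filtering step.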
