Free text (natural language) query parsing with Solr
I'm trying to build a query parsing algorithm for a local search site that can classify a free text search query (single input text box) into the various types of searches possible on the site.
For example, the user could type chinese restaurants near xyz. How should I go about breaking it down to cuisine:"chinese", locality:"xyz", given that
- there could be spelling mistakes
- keywords may match in different columns e.g. a restaurant may have "chinese" in its name
This is not really a natural language parsing problem, since we're trying to search in a very limited set of possibilities.
My initial thought is to dump all values of a particular type from the database into a field and match the user's query against all those fields. Then, based on the score (and a predefined confidence level), divide the query into the 3-4 search fields like name/cuisine/locality.
Is there a better/standard way of doing this?
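To make the idea concrete, here is a minimal sketch of that classification step in plain Python, outside Solr. The vocabularies, the stopword list, and the 0.8 threshold are all made-up placeholders, and difflib's similarity ratio stands in for whatever score the search engine would return:

```python
from difflib import SequenceMatcher

# Hypothetical vocabularies; in practice these would be dumped from the database.
FIELD_VALUES = {
    "cuisine": ["chinese", "italian", "indian"],
    "locality": ["xyz", "downtown"],
}
# Assumed filler words that carry no field information.
STOPWORDS = {"restaurant", "restaurants", "near", "in", "at"}
CONFIDENCE = 0.8  # assumed confidence level, to be tuned on real queries


def best_match(token):
    """Return (field, value, score) for the known value closest to token."""
    best = (None, None, 0.0)
    for field, values in FIELD_VALUES.items():
        for value in values:
            score = SequenceMatcher(None, token, value).ratio()
            if score > best[2]:
                best = (field, value, score)
    return best


def classify(query):
    """Split a free text query into recognised fields and leftover tokens."""
    fields, leftover = {}, []
    for token in query.lower().split():
        if token in STOPWORDS:
            continue
        field, value, score = best_match(token)
        if score >= CONFIDENCE:
            fields[field] = value
        else:
            leftover.append(token)
    return fields, leftover


print(classify("chinse food near xyz"))
```

Tokens that clear the threshold are assigned to the best-scoring field; everything else is left over and could be treated as a name search. This sketch only handles single-word values, so multi-word localities or cuisines would need n-gram matching on top.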
For spelling mistakes, you have to check the query terms against a dictionary/thesaurus. This can be part of your pre-processing and normalization step.
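A rough sketch of such a normalization pass, done in Python before the query reaches the search engine. The vocabulary and synonym table are invented for illustration, and the 0.8 cutoff is an assumption; a real dictionary would be built from the indexed data:

```python
from difflib import get_close_matches

# Hypothetical dictionary of correctly spelled terms.
VOCABULARY = ["chinese", "italian", "restaurant", "near"]
# Hypothetical thesaurus mapping variants to one canonical term.
SYNONYMS = {"restaurants": "restaurant", "eatery": "restaurant"}


def normalize(query, cutoff=0.8):
    """Lowercase the query, apply synonyms, then fix near-miss spellings."""
    normalized = []
    for token in query.lower().split():
        token = SYNONYMS.get(token, token)
        # Closest dictionary entry, if any is at least `cutoff` similar.
        matches = get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
        normalized.append(matches[0] if matches else token)
    return " ".join(normalized)


print(normalize("Chinse eatery near xyz"))
```

Terms with no close dictionary entry, such as xyz here, pass through unchanged, so unknown names and localities are not corrupted by the correction step.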
To query across multiple columns, you can OR the fields together: cuisine:chinese OR restaurant_name:chinese
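You can also weight one of the two with a boost: cuisine:chinese^0.8 OR restaurant_name:chinese. The default boost is 1, so a value below 1 makes that clause count for less and a value above 1 makes it count for more.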
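If the query string is assembled in application code, a small helper along these lines could build it. This is only a sketch: build_query is a hypothetical name, the field names come from the example above, and only quotes and backslashes are escaped:

```python
def build_query(term, boosts):
    """Build an OR query for one term over several fields with optional boosts."""
    # Escape backslashes and double quotes so the term is safe inside a phrase.
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    clauses = []
    for field, boost in boosts.items():
        clause = f'{field}:"{safe}"'
        if boost != 1.0:  # 1.0 is the default, so leave it implicit
            clause += f"^{boost}"
        clauses.append(clause)
    return " OR ".join(clauses)


print(build_query("chinese", {"cuisine": 0.8, "restaurant_name": 1.0}))
# cuisine:"chinese"^0.8 OR restaurant_name:"chinese"
```

Quoting the term keeps multi-word values together as a phrase and avoids having to escape the other query syntax characters individually.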