Consolidating / Clustering Terms and Phrases
Our application allows a user to enter the names of companies that their organization works with. A current issue is that the way a company name is entered varies from user to user. We need to consolidate this data. Are there any proven approaches for tackling this problem?
The general problem of data quality is usually referred to as data cleansing. There are many methods and tools in this area.
Which one is best for you will depend on the extent of your problem and on the technologies you use. If I understand correctly, though, the stored data is fine, and the problem is that users search against it with misspelled input? In that case, fuzzy searching could help.
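To give an idea of what fuzzy matching can look like, here is a minimal sketch in Python using only the standard library's difflib. It is an illustration under assumptions, not a proven recipe: the suffix list, the 0.85 threshold, and the function names normalize and cluster_names are all made up for this example, and you would need to tune them against your own data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation",
            "co", "company", "ltd", "limited", "llc"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    kept = [w for w in words if w not in SUFFIXES]
    # Fall back to all words if the name consisted only of suffixes.
    return " ".join(kept or words)


def cluster_names(names, threshold=0.85):
    """Greedily group names whose normalized forms are similar enough.

    The threshold is an assumed starting point, not a recommended value.
    """
    clusters = []  # list of (normalized representative, [original names])
    for name in names:
        key = normalize(name)
        for rep, members in clusters:
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((key, [name]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    sample = ["Acme Inc.", "ACME, Inc", "Acme Incorporated",
              "Globex Corp", "Globex Corporation", "Initech"]
    for group in cluster_names(sample):
        print(group)
```

The greedy pass compares each name only with the first member of each cluster, so it is quadratic in the worst case and sensitive to input order. For a large data set, or for matching at search time, a search engine or database with built-in fuzzy matching is probably a better fit.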