开发者

I need some sort of full text search on mysql database

I've stuck with one quite tricky problem.

I have list of products from different warehouses, where each product have: Brand and Model plus some extra details. Model could be quite different from different warehouses for the same product, but Brand is always the same.

All list of products I store in one table, let's say it will be Product table. Then I have another table - Model, with CORRECT Model Name, Brand and additional details like image, description etc. Plus 开发者_开发技巧I have keywords column where I try to add all keywords manually.

And here is the problem, I need to associate each product that I receive from warehouse with one record from my Model table. Right now I'm using full text search in boolean mode, but that's quite painful and does not work very well. I need to do a lot of manual work.

Here are just few examples of names that I have:

  • WINT.SPORT3D
  • WINT.SPORT3D XL
  • WINT.SPORT 3D
  • WINT.SPORT3D MO
  • WINTER SPORT 3D

The correct name for all of these items would be: WINTER SPORT 3D, so they should all be assigned to the same model.

So, is there any way to improve full text search or some other technique to solve my problem?

Database that I'm using is MySQL, I would prefer not to change it.


I'll start by putting together a more formal definition of the tables:

warehouse:
    warehouse_id,
    warehouse_product_id,
    product_brand,
    product_name,
    local_id

Here I'd using local_id as a foreign key to your 'Model' table - but to avoid further confusion, I'll call it 'local'

local:
    id,
    product_brand,
    product_name

It seems like the table you describe as 'product' is redundant.

Obviously until the data is cross referenced, local_id will be null. But after it is populated it won't have to change, and given a warehouse_id, a band and a product, you can find your local descriptor easily:

SELECT local.*
FROM local, warehouse
WHERE local.id=warehouse.local_id
AND warehouse.product_brand=local.product_brand
AND warehouse_id=_____
AND warehouse.product_brand=____
AND warehouse.product_name=____

So all you need to do is populate the links. Soundex is a rather crude tool - a better solution for this would be the Levenstein distance algorithm. There's a mysql implementation here

Given a set of rows in the warehouse table which need to be populated:

SELECT w.*
FROM warehouse w
WHERE w.local_id IS NULL;

...for each row identify the best match as (using the values from the previous query as w.*)....

SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1

But this will find the best match, even if the 2 strings are completely different! Hence....

SELECT local.id
FROM local
WHERE local.product_brand=w.product_brand
AND levenstein(local.product_name, w.product_name)<
    (IF LENGTH(local.product_name)<LENGTH(w.product_name),
          LENGTH(local.product_name), LENGTH(w.product_name))/2
ORDER BY levenstein(local.product_name, w.product_name) ASC
LIMIT 0,1

...requires at least half the string to match.

So this can be implemented in a single update statement:

UPDATE warehouse w
SET local_id=(
   SELECT local.id
   FROM local
   WHERE local.product_brand=w.product_brand
   AND levenstein(local.product_name, w.product_name)<
    (IF LENGTH(local.product_name)<LENGTH(w.product_name),
          LENGTH(local.product_name), LENGTH(w.product_name))/2
   ORDER BY levenstein(local.product_name, w.product_name) ASC
   LIMIT 0,1
 )
 WHERE local_id IS NULL;


Try Soundex. All of your examples resolve to W532 while the last one resolves to W536. So, you could:

  1. Add a column to PRODUCT and MODEL called SoundexValue and calculate the Soundex value for each product and model
  2. Compare the Soundex values in the PRODUCT table to the ones in the Model Table. You may have to use a range (+/- 5) to get a higher rate of matching.
  3. Follow the 80/20 rule. That is, spend 80% of your manual effort on the 20% that don't easily fall out.
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜