开发者

How to avoid Cartesian product in an INNER JOIN query?

I have 6 tables, let's call them a,b,c,d,e,f. Now I want to search all the colums (except the ID columns) of all tables for a certain word, let's say 'Joe'. What I did was, I made INNER JOINS over all the tables and then used LIKE to search the columns.

INNER JOIN
...
ON
INNER JOIN
...
ON.......etc.
WHERE a.firstname 
~* 'Joe' 
OR a.lastname 
~* 'Joe' 
OR b.favorite_food 
~* 'Joe'
OR c.job
~* 'Joe'.......etc.

The results are correct, I get all the colums I was looking for. But I also get some kind of cartesian product, I get 2 or more lines with almost the same results.

How can i avoid this? I want so have each line only once, since the results should appear on a web search.

UPDATE

I first tried to figure out if the SELECT DISTINCT thing would work by using this 开发者_C百科statement: pastie.org/970959 But it still gives me a cartesian product. What's wrong with this?


try SELECT DISTINCT?


On what condition do you JOIN this tables? Do you have foreign keys or something?

Maybe you should find that word on each table separately?


What kind of server are you using? Microsoft SQL Server has a full-text index feature (I think others have something like this too) which lets you search for keywords in a much less resource-intensive way.

Also consider using UNION instead of joining the tables.


Without seeing your tables, I can only really assume what's going on here is you have a one-to-many relationship somewhere. You probably want to do everything in a subquery, select out the distinct IDs, then get the data you want to display by ID. Something like:

SELECT a.*, b.*
FROM (SELECT DISTINCT a.ID
      FROM ...
      INNER JOIN ...
      INNER JOIN ...
      WHERE ...) x
INNER JOIN a ON x.ID = a.ID
INNER JOIN b ON x.ID = b.ID

A couple of things to note, however:

  • This is going to be sloooow and you probably want to use full-text search instead (if your RDBMS supports it).

  • It may be faster to search each table separately rather than to join everything in a Cartesian product first and then filter with ORs.


If your tables are entity type tables, for example a being persons and b being companies, I don't think you can avoid a cartesian product if you search for the results in this way (single query).

You say you want to search all the tables for a certain word, but you probably want to separate the results into the corresponding types. Right? Otherwise a web search would not make much sense. So if you seach for 'Joe', you want to see persons containing the name 'Joe' and for example the company named 'Joe's gym'. Since you are searching for different entities so you should split the search into different queries.

If you really want to do this in one query, you will have to change your database structure to accommodate. You will need some form of 'search table' containing an entity ID (PK) and entity type, and a list of keywords you want that entity to be found with. For example:

EntityType, EntityID, Keywords
------------------------------
Person,     4,        'Joe', 'Doe'
Company,    12,       'Joe''s Gym', 'Gym'

Something like that?

However it's different when your search returns only one type of entity, say a Person, and you want to return the Persons for which you get a hit on that keyword (in any related table to that Person). Then you will need to select all the fields you want to show and group by them, leaving out the fields in which you are searching. Including them inevitably leads to a cartesian product.

I'm just brainstorming here, by the way. It hope it's helpful.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜