How best to explain which fields a user should join on?
I need to explain to somebody how they can determine what fields from multiple tables/views they should join on. Any suggestions? I know how to do it but am having difficulty trying to explain it.
One of the issues they have is that they will take two fields from two tables that hold the same kind of value (zip code) and join on those, when in reality they should be joining on ID columns. When they choose the wrong column to join on, the number of records returned increases.
Should I work in PK and FK somewhere?
While it is indeed typical to join a PK to an FK, any conversation about JOIN clauses that revolves only around PKs and FKs is fairly limited.
For example, I had this FROM clause in a recent SQL answer I gave:
FROM
    YourTable firstNames
    LEFT JOIN YourTable lastNames
        ON firstNames.Name = lastNames.Name
        AND lastNames.NameType = 2
        AND firstNames.FrequencyPercent < lastNames.FrequencyPercent
The table referenced on each side of the join is the same table (a self join), and the join includes three conditions, one of which is an inequality. Furthermore, there would never be an FK here because it joins on a field that is, by design, not a candidate key.
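A minimal runnable sketch of that self join, using Python's built-in sqlite3 module. The sample rows, the meaning of NameType (1 = first name, 2 = last name), the SELECT list, and the WHERE clause are all assumptions added for illustration; only the FROM clause comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Name TEXT, NameType INTEGER, FrequencyPercent REAL);
    -- Assumed encoding: NameType 1 = first name, 2 = last name
    INSERT INTO YourTable VALUES
        ('James',  1, 3.3),
        ('James',  2, 0.1),
        ('Taylor', 1, 0.2),
        ('Taylor', 2, 0.3),
        ('Mary',   1, 2.6);
""")

# Same table on both sides, three conditions, one of them an inequality.
# The WHERE clause is an assumption so that only first names drive the result.
rows = conn.execute("""
    SELECT firstNames.Name,
           firstNames.FrequencyPercent,
           lastNames.FrequencyPercent
    FROM YourTable firstNames
    LEFT JOIN YourTable lastNames
           ON firstNames.Name = lastNames.Name
          AND lastNames.NameType = 2
          AND firstNames.FrequencyPercent < lastNames.FrequencyPercent
    WHERE firstNames.NameType = 1
    ORDER BY firstNames.Name
""").fetchall()

for row in rows:
    print(row)
```

With this sample data only 'Taylor' finds a matching last-name row; the other first names come back with NULL (None in Python) on the right side, because a LEFT JOIN keeps unmatched left rows.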
Also, you don't even have to join one table to another. You can join inline queries to each other, which of course can't possibly have a key.
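As a rough illustration of joining inline queries, the sketch below joins two aggregated subqueries to each other. The orders and payments tables and their columns are hypothetical; neither subquery has a declared key, yet the join condition still works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, for illustration only
    CREATE TABLE orders   (customer_id INTEGER, amount INTEGER);
    CREATE TABLE payments (customer_id INTEGER, amount INTEGER);
    INSERT INTO orders   VALUES (1, 50), (1, 25), (2, 40);
    INSERT INTO payments VALUES (1, 75), (2, 10);
""")

# Each side of the JOIN is an inline query, not a table.
rows = conn.execute("""
    SELECT o.customer_id, o.ordered, p.paid
    FROM (SELECT customer_id, SUM(amount) AS ordered
          FROM orders
          GROUP BY customer_id) AS o
    JOIN (SELECT customer_id, SUM(amount) AS paid
          FROM payments
          GROUP BY customer_id) AS p
      ON o.customer_id = p.customer_id
    ORDER BY o.customer_id
""").fetchall()

for row in rows:
    print(row)
```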
So in order to properly understand JOIN, you just need to understand that it combines the records from two relations (tables, views, inline queries) wherever some condition evaluates to true. This means you need to understand boolean logic, the database, and the data in the database.
If your user is having a problem with a specific JOIN, ask them to SELECT some rows from one table and some from the other, and then ask them under what conditions they would want to combine the rows.
You don't need to talk in terms of a primary key of a table but you should point to it and explain that it uniquely identifies a given row and that you must join to related tables using it or you could get duplicated results.
Give them examples of joining with it and joining without it.
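One possible pair of examples, sketched with Python's built-in sqlite3 module. The customer and purchase tables are hypothetical; the point is only that joining on zip code pairs every customer with every purchase in the same zip code, while joining on the ID returns one row per purchase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, for illustration only
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, zip TEXT);
    CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                           customer_id INTEGER, zip TEXT, item TEXT);
    INSERT INTO customer VALUES (1, 'Ann', '10001'),
                                (2, 'Bob', '10001'),
                                (3, 'Cy',  '94105');
    INSERT INTO purchase VALUES (10, 1, '10001', 'lamp'),
                                (11, 2, '10001', 'desk'),
                                (12, 3, '94105', 'chair');
""")

# Joining on the key: each purchase matches exactly one customer.
by_id = conn.execute("""
    SELECT c.name, p.item
    FROM customer c
    JOIN purchase p ON c.customer_id = p.customer_id
    ORDER BY c.name, p.item
""").fetchall()

# Joining on zip: Ann and Bob share a zip, so each is paired with
# both purchases made in that zip code.
by_zip = conn.execute("""
    SELECT c.name, p.item
    FROM customer c
    JOIN purchase p ON c.zip = p.zip
    ORDER BY c.name, p.item
""").fetchall()

print(len(by_id), by_id)
print(len(by_zip), by_zip)
```

Showing the two row counts side by side (3 versus 5 here) tends to make the "extra records" problem concrete.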
An ER diagram showing all of the tables they use and their key relationships would help ensure that they always use the correct keys.
It sounds to me like neither you nor the person you are trying to help understands how this particular database is constructed, and perhaps neither of you really understands basic database fundamentals like PKs and FKs. Most often a PK from one table is joined to an FK in another table.
Assuming the database has the proper PK's and FK's in place, it would probably help a great deal to generate an ER diagram. That would make the joining concept much easier to grasp.
Another approach you could take is to find someone who does understand these things and create some views for this person to use. This way he doesn't need to understand how to join the tables together.
A user shouldn't typically be doing joins. A user should have an interface that lets them get the data that they need in the way that they need it. If you don't have the developer resources to do that then you're going to be stuck with this problem of having to teach a user technical details. You also need to be very careful about what kind of damage the user can do. Do they have update rights on the data? I hope they don't accidentally do a DELETE FROM Table with no WHERE clause. Even if you restrict their permissions, a poorly written query can crush the database server or block resources, causing problems for other users (and more work for you).
If you have no choice, then I think that you need to certainly teach them about primary and foreign keys, even if you don't call them that. Point out that the id on your table (or whatever your PK is) identifies a row. Then explain how the id appears in other tables to show the relationship. For example, "See, in the address table we have a person_id which tells us who that address belongs to."
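The person/address relationship described above might look like the following sketch. The column names other than person_id, and the sample rows, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,   -- identifies one person row
        name TEXT
    );
    CREATE TABLE address (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),  -- who the address belongs to
        city      TEXT
    );
    INSERT INTO person  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO address VALUES (100, 1, 'Boston'),
                               (101, 1, 'Denver'),
                               (102, 2, 'Austin');
""")

# Follow person_id back to the person it points at.
rows = conn.execute("""
    SELECT person.name, address.city
    FROM person
    JOIN address ON address.person_id = person.id
    ORDER BY person.name, address.city
""").fetchall()

for row in rows:
    print(row)
```

Ann appears twice because she has two addresses, which is the expected result of a one-to-many relationship rather than a duplication bug.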
After that, expect to spend a large portion of your time with that user as they make mistakes or come up with other things that they want to get from the database, but which they can't figure out how to get.
In theory, and ideally, you should define primary keys on all tables, and join tables using a primary key to the matching field or fields (the foreign key) in the other table.
Even if the join fields are not defined as primary keys, you need to make sure that they uniquely identify the records in the table and that they are properly indexed.
For example, let's say the 'person' table has a SSN and a driver's license field. The SSN could be considered and flagged as the 'primary key', but if you join that table to a 'drivers' table which might not have the SSN, but does have the driver's license #, you could join them by the driver's license field (even if it's not flagged as primary key), but you need to make sure that the field is properly indexed in both tables.
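A sketch of that scenario, assuming hypothetical column names (ssn, license_no, state) and placeholder values. A unique index on the license column in each table is one way to get both the indexing and the uniqueness guarantee mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; all values are placeholders
    CREATE TABLE person  (ssn TEXT PRIMARY KEY, license_no TEXT, name TEXT);
    CREATE TABLE drivers (license_no TEXT, state TEXT);

    -- Not primary keys, but indexed and guaranteed unique
    CREATE UNIQUE INDEX idx_person_license  ON person  (license_no);
    CREATE UNIQUE INDEX idx_drivers_license ON drivers (license_no);

    INSERT INTO person  VALUES ('000-00-0001', 'D100', 'Ann'),
                               ('000-00-0002', 'D200', 'Bob');
    INSERT INTO drivers VALUES ('D100', 'NY'), ('D200', 'CA');
""")

# Join on the license number instead of the primary key.
rows = conn.execute("""
    SELECT person.name, drivers.state
    FROM person
    JOIN drivers ON drivers.license_no = person.license_no
    ORDER BY person.name
""").fetchall()
print(rows)

# The unique index rejects a second row with the same license number,
# so the join cannot silently start multiplying rows.
try:
    conn.execute("INSERT INTO drivers VALUES ('D200', 'TX')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```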
...explain to somebody how they can determine what fields from multiple tables/views they should join on.
Simply put, look for the columns with values that match between the tables/views. Preferably, match exactly but some massaging might be necessary.
The existence of foreign key constraints would help to know what matches to what, but the constraint might not be directly to the table/view that is to be joined.
The existence of a primary key doesn't mean it is the criterion the query needs, so I would overlook this detail (depending on the audience).
I would recommend attacking the desired result set by starting with the columns desired, and working back from there. If the result set contains columns from more than one table, focus first on the table whose columns should return distinct results, and then gradually add joins, checking the result set after each added JOIN to confirm the results are still the same. If they are not, review the JOIN, or consider whether a JOIN is actually necessary versus IN or EXISTS.
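The check described above could be sketched as follows, with hypothetical customer and purchase tables. The base query returns one row per customer; adding a JOIN changes the rows returned, which is the signal to review it, and EXISTS answers "which customers have a purchase" without multiplying rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, for illustration only
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                           customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES (10, 1, 'lamp'), (11, 1, 'desk'), (12, 2, 'chair');
""")

# Step 1: the table whose rows should stay distinct.
base = conn.execute(
    "SELECT name FROM customer ORDER BY name"
).fetchall()

# Step 2: add a JOIN and compare. Ann now appears twice, Cy disappears.
joined = conn.execute("""
    SELECT c.name
    FROM customer c
    JOIN purchase p ON p.customer_id = c.customer_id
    ORDER BY c.name
""").fetchall()

# Alternative: EXISTS filters customers without duplicating them.
with_exists = conn.execute("""
    SELECT c.name
    FROM customer c
    WHERE EXISTS (SELECT 1 FROM purchase p
                  WHERE p.customer_id = c.customer_id)
    ORDER BY c.name
""").fetchall()

print(base)
print(joined)
print(with_exists)
```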
I did this when I first started out. It comes from thinking of joins as just linking tables together, so I linked at all possible points.
Once you think of joins as a way to combine AND filter the data it becomes easier to understand them.
Writing out your request as a sentence is helpful too, "I want to see all the times Table A interacted with Table B". Then build a query from that using only the ID, noting that if you wanted to know "All the times Table A was in the same zip code as Table B" then you would join by zip code.