SQL Joins: Future of the SQL ANSI Standard (where vs join)?
We are developing ETL jobs and our consultant has been using "old style" SQL when joining tables
select a.attr1, b.attr1
from table1 a, table2 b
where a.attr2 = b.attr2
instead of using the inner join clause
select a.attr1, b.attr1
from table1 as a inner join tab开发者_运维问答le2 as b
on a.attr2 = b.attr2
My question is that in the long run, is there a risk for using the old "where join"? How long this kind of joins are supported and kept as ANSI standard? Our platform is SQL Server and my primary cause is that in the future these "where joins" are no longer supported. When this happens, we have to modify all our ETL jobs using "inner join" style of joins.
Rather than worry about some possible risk in the future, why not worry about the risk you face right now?
In addition to Mark's points:
- The code is harder to read (and thus understand the purpose of) when the ON clauses are disconnected (sometimes by many lines) from the joined tables. This increases the likelihood of errors when modifying the code.
- Determining what kind of JOIN is being done is harder - you have to go surfing through the WHERE clause and hope that what you see is right.
- Finding missing JOIN clauses is much harder, increasing the risk of an inadvertent Cartesian join - when you use ANSI syntax, the ON clauses line up nicely, making this trivial.
I doubt that "where joins" would ever be unsupported. It's just not possible to not support them, because they are based on Cartesian products and simple filtering. They actually aren't joins.
But there are many reasons to use the newer join syntax. Among others:
- Readability
- Maintainability
- Easier change to outer joins
There are lots of reasons to avoid implicit joins. The big ones are:
- It can't easily be changed to an outer join.
- It's easier to forget the join condition with an implicit join.
- If you mix both implicit and explicit joins you get problems with confusing precedence. Here's an example from a few hours ago: MySQL Syntax error
I don't think they will be removed any time soon, but there are plenty of other reasons to stop using them.
Both syntaxes are supported by the latest versions of ISO SQL (2003,2008). Using commas to specify a cross join in the FROM clause is perfectly standard SQL and is supported by all SQL produts that I have come across. It seems highly unlikely that it ever would or even could be deprecated or desupported within SQL.
As long as they're not using ***= and =* for their join syntax (which has been deprecated as of SQL Server 2008 R2), I can't see it going away in the long run, but as Mark Byers says, there are plenty of reasons not to do it.
My biggest concern would be that if they're writing joins like this, what else are they doing that's unconventional?
People have had some good points but so far there are two big ones that haven't been mentioned:
Whether or not the old style *= and =* outer joins gave correct results, they also cannot properly denote certain joins. Consider the following query. We want to show all customers who have not placed an order over $100:
SELECT FROM Customer C LEFT JOIN Order O ON C.OrderID = O.OrderID AND O.OrderTotal >= 100.0; WHERE O.OrderID IS NULL;
Try and express that the old way... if you can. Without using derived tables.
To me the great value of using proper join clauses is the separation of standard join conditions (which would be applied in almost every query involving those two tables) from the special filters for this query that will return the desired rows:
SELECT C.FullName, C.CustomerCode, O.OrderDate, O.OrderTotal, OD.ExtendedShippingNotes FROM Customer C INNER JOIN Order O ON C.CustomerID = O.CustomerID INNER JOIN OrderDetail OD ON O.OrderID = OD.OrderID WHERE C.CustomerStatus = 'Preferred' AND O.OrderTotal > 1000.0;
This separation means that the developer looking at the query doesn't have to deal with a bunch of clutter while scanning for this query's distinctives. If he's familiar with the tables he can skip the FROM clause entirely and just read the WHERE clause to get all the information he needs. It's faster. And if you don't care about faster, even when just scanning queries with your eyeballs, I don't want to work with you.
Now for those who think there's something special about the location of everything when using JOIN syntax, you're wrong. The following query works just as fine as the upper one:
SELECT C.FullName, C.CustomerCode, O.OrderDate, O.OrderTotal, OD.ExtendedShippingNotes FROM Customer C CROSS JOIN Order O INNER JOIN OrderDetail OD ON C.CustomerID = O.CustomerID AND C.CustomerStatus = 'Preferred' AND O.OrderTotal > 1000.0 WHERE O.OrderID = OD.OrderID;
This query probably even has the exact same execution plan. Surprised? Don't be. Just like the old-style syntax, the optimizer is the one that's responsible for figuring out how to join your tables based on the conditions you give. It doesn't really matter where the conditions are as long as they don't refer to a table that hasn't been mentioned yet.
So what is the big difference between the two styles? If you think the second mixed-up query above is hard to understand and would be a crazy way to write, then, well, you naturally think the old style of query is lame. Because, frankly, putting all the conditions haphazardly into any old place is disorganized. The organization system of JOINs makes sense. If you are used to the old style and don't really like the new style, that's probably because change is unpleasant (for all of us). But once you use it for a while, I'm sure it will grow on you. At least, if it doesn't, I can't possibly understand why.
It is difficult to argue about elegance or ugliness of a certain syntax construction. You just see it or don't. Comma separated join syntax reflects the fundamental feature of Relational Algebra, which asserts the normal form for select-project-join queries. The only kind of join that escapes it (and therefore, warrants a dedicated syntax) is the outer join. Accidental mistakes of missing equality predicates that render join graph disjoint is just a matter of how sophisticated your front-end SQL programming tool is (does it display join graph at all?).
It is not only aesthetics. It is common for production databases to have columns like CREATED_ON, or COMMENTS across many tables. In this case the NATURAL JOIN syntax is plain dangerous.
As Anthony Molinaro (author of popular "SQL Cookbook") eloquently put it: “Old style is short and sweet and perfect. ANSI dumbed it down, and for people who've been developing for sometime, it's wholly unnecessary".
The right and left join implied syntax *= and =* is deprecated and for good reason, it currently does not return correct results at all times. If they have used this, these MUST be fixed now as they currently put you at risk for incorrect results. This is not an option. The other syntax will continue to work but should be replaced as well for several reasons. First it is very easy to get accidental cross joins which can return bad results or which are fixed by using distinct which can create performance issues.
Another issue is maintenance. If people later add other joins to the query and start mixing implied and explicit joins again you can get wrong results and not even know it. It is VERY VERY bad to leave this kind of crappy code in your codebase. Implied joins are also harder to understand and because they are often written by developers who don't understand joins, they may not be what you need anyway. And if there is a cross join in the query how is the maintainer to know if it was a bug (and accidental cross join) or a deliberate cross join (we do occasionally really need them). I would not accept this code as written. I would insist that the incompetent who wrote it, fix it at no additional charge.
If you are worried they will be removed from the Standard or from SQL products then worry not. It is not likely to happen ever.
This 'old style' of join is just that: merely a question of style. Those who favour it swear there are situations where 'old style' joins are easier to understand. While I'm not entirely convinced myself, one thing is for sure: the old style joins will be encountered from time to time and re-engineering code to suit your own personal style isn't always appropriate. Therefore, get used to the style and learn to work with it.
精彩评论