Finding unused join in a SQL query
I am currently maintaining a significant number of SQL queries. Some of them are created by copy/paste operations, then removing unnecessary fields and sometimes forgetting to remove the tables where these fields come from.
I am l开发者_开发百科ooking for a tool (or anything apart from eyes+brain) that, given a SQL query, would analyze which of the joined tables have no field selected in the SELECT part.
Do you know of such a tool?
Thank you
Hypothetically a tool could exist but it would only be guaranteed to be correct if all the following criteria where met for said join
- Its A LEFT or OUTER JOIN or an INNER JOIN where the cardinality where known to be 1-1 And...
- Its Not Referenced in a SELECT, HAVING, GROUP BY or WHERE and...
- It is not a JOIN to function that has a side effect...
Probably why there's no deterministic warnings in SQL parsers the way there is for let's say an unused variable in C#. But it might be worth while to create a SQL checker that looks for some of these conditions and lets the user know that there's a possibility for optimization here.
Just because no fields are referenced in the SELECT, that does not mean that the join is not important to the logic of the query and results could change if the join is removed.
Consider this simple example: Return the name of all customers who purchased an item in 2011.
SELECT DISTINCT c.CustomerName
FROM Customer c
INNER JOIN Sales s
ON c.CustomerID = s.CustomerID
AND s.SalesDate >= '2011-01-01'
No columns from the Sales table are returned in the SELECT, yet the join is critical to returning the correct result set.
Bottom line: I think you'll need a human eye/brain code review to clean things up properly.
below function replaces all select fields with count(*) and 2nd part removes unnecessary joins. This function works only with tables that has aliases and should be tested for very complex queries and wont work if there is inner queries in join condition.
function sql_query_count($sql) {
//replace select fields with count(*)
$a = true;
$b = 0;
$first_select = stripos($sql, 'select ');
$last_from = 0;
$i = 0;
while($a){
$i++;
$b = stripos($sql, ' from ',$last_from);
$c = strripos(substr($sql, $last_from, $b), 'select ');
if ($c == $first_select || $c === false || $i>100) $a = false;
$last_from = $b+6;
}
if (stripos($sql, 'order by') !== false)
$sql = substr($sql, 0, stripos($sql, 'order by'));
$sql1 = 'select count(*) as c ' . substr($sql, $b);
//remove unnecessary joins
$joins = preg_split("/ join /i", $sql1);
$join_count = count($joins);
$join_type = '';
if (count($joins)>1){
for ($index = 0; $index < $join_count+2; $index++) {
$sql_new = '';
$where = '';
$i = 0;
foreach ($joins as $key => $value) { $i++;
$parts = preg_split("/ where /i", trim($value));
$value = $parts[0];
unset($parts[0]);
$where = implode(' where ', $parts);
$occurence_count = 0;
if ($i > 1) {
$a = explode(' on ', $value);
$c = preg_replace('!\s+!', ' ', trim($a[0]));
$c = explode(' ', $c);
$occurence_count = substr_count($sql1, ' '.$c[1].'.')+substr_count($sql1, '='.$c[1].'.');
}
$t = explode(' ', $value);
$j = '';
if (trim(strtolower($t[count($t) - 1])) == 'inner'){
$j = 'inner';
unset($t[count($t) - 1]);
} else if (trim(strtolower($t[count($t) - 2])).' '.trim(strtolower($t[count($t) - 1])) == 'left outer'){
$j = 'left outer';
unset($t[count($t) - 1]);
unset($t[count($t) - 1]);
}
if ($occurence_count == 0 || $occurence_count > 1) $sql_new.= ' '.$join_type.(($join_type!='')?' join ':'').implode(' ', $t);
$join_type = $j;
}
$sql_new .= ' where '.$where;
$sql1 = $sql_new;
$joins = preg_split("/ join /i", $sql1);
}
}
return $sql1;
}
As mentioned above, identifying redundant INNER JOINs will be a problem, as sometimes they have an impact on the returned data, even if no data is actually selected from those tables.
Said that, identifying redundant LEFT JOINs is possible. I'm using this automatic query optimizer to optimize SQL queries automatically. Among other things, it can identify redundant left joins.
精彩评论