GROUP BY with aggregate and an INNER JOIN
I tried to narrow down the problem as much as possible, it is still quite something. This is the query that doesn't work the way I want it:
SELECT *, MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
(SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2
ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY id_edge
Here is what I get:
|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|2 | 23 | 2 | 23 | 35 | 1 | 9 |
|4 | 24 | 5 | 24 | 46 | 1 开发者_如何学运维 | 9 |
------------------------------------------------------------------
and this is what I would want:
|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|3 | 23 | 9 | 23 | 35 | 1 | 9 |
|5 | 24 | 9 | 24 | 46 | 1 | 9 |
------------------------------------------------------------------
But let me elaborate a bit...
I have a graph, let's say as such:
node1
|
node2
/ \
node3 node4
| |
node5 node6
Therefore I have a table I call tbl_edges like this:
| id | nodeA | node B |
------------------------
| 12 | 1 | 2 |
| 23 | 2 | 3 |
| 24 | 2 | 4 |
| 35 | 3 | 5 |
| 46 | 4 | 6 |
------------------------
Now each edge
has "stop_over
s" at a certain distance (to nodeA
). Therefore I have a table tbl_stopover like this:
| id | edge | dist |
------------------------
| 1 | 12 | 5 |
| 2 | 23 | 2 |
| 3 | 23 | 9 |
| 4 | 24 | 5 |
| 5 | 24 | 9 |
| 6 | 35 | 5 |
| 7 | 46 | 5 |
------------------------
Why this query?
Let's assume I want to calculate the distance between thestop_over
s. Within one edge that is no problem. Across edges it gets more difficult. But if I have two edges that are connected and there is no other connection I can also calculate the distance. Here an example assuming all edges have a length
of 10. :
edge23
has a stop_over(id=3
) at dist=9, edge35
has a stop_over(id=6
) at dist=5. Therefore the distance between these two stop_over
s is:
dist = (length - dist_id3) + dist_id5 = (10-9) + 5
I am not sure if I made my self clear. If this is not understandable, feel free to ask question and I will do my best to make this more understandable.
MySQL allows you to do something silly - display fields in an aggregate query that are not a part of the GROUP BY
or an aggregate function like MAX
. When you do this, you get random (as you said) results for the remaining fields.
In your query you are doing this twice - once in your inner query (id2
is not part of a GROUP BY
or aggregate) and once in the outer.
Prepare for random results!
To fix it, try something like this:
SELECT tbl_stopover.id,
tbl_stopover.dist,
tbl_conn.id1,
tbl_conn.id2,
tbl_conn.numConn,
MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
(SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2
ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id, edges2.id
HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.id,
tbl_stopover.dist,
tbl_conn.id1,
tbl_conn.id2,
tbl_conn.numConn
The major changes are the explicit field list (note that I removed the id_edge
since you are joining on id1
and already have that field), and addition of additional fields to both the inner and outer GROUP BY
clauses.
If this gives you more rows than you want then you may need to explain more about your desired result set. Something like this is the only way to ensure you get appropriate groupings.
Okay. This seems to be the answer to my question. I will do some further "investigation" though, because I'm not sure if this is reliable. If anybody has some though on this, please leave a comment.
SELECT tbl.id, tbl.dist, tbl.id1, tbl.id2, MAX(dist) maxDist
FROM
(
SELECT tbl_stopover.id,
tbl_stopover.dist,
tbl_conn.id1,
tbl_conn.id2,
tbl_conn.numConn
FROM tbl_stopover
INNER JOIN
(SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2
ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id
HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.dist, tbl_conn.id1
ORDER BY dist DESC) AS tbl
GROUP BY tbl.id1, tbl.id2
Thanks to JNK
(my colleague at work) without whom I wouldn't have gotten this far.
精彩评论