开发者

Modelling country adjacency in SQL

I'm trying to model which countries border each other in MySQL. I have three tables:

nodes
-----
node_id MEDIUMINT

countries
---------
country_id MEDIUMINT (used as a foreign key for nodes.node_id)
country CHAR(64)
iso_code CHAR(2)

node_adjacency
--------------
node_id_1 MEDIUMINT (used as a foreign key for nodes.node_id)
node_id_2 MEDIUMINT (used as a foreign key for nodes.node_id)

I appreciate the nodes table is redundant in this example, but this is part of a larger architecture where nodes can represent many other items other than countries.

Here's some data (IDs (which appear in all three tables) and countries)

59  Bosnia and Herzegovina
86  Croatia
130 Hungary
178 Montenegro
227 Serbia
232 Slovenia

Croatia is bordered by all the other countries, and this is represented in the node_adjacency table as:

59  86
86  130
86  178
86  227
86  232

So Serbia's ID may appear as a node_id_1 or a node_id_2. The data in this table is essentially non directed graph data.

Questions:

Given the name 'Croatia', what SQL should I use to retrieve its neighbours?

Bosnia and Herzegovina
Hungary
Montenegro
Serbia
Slovenia

Would there be any retrieval efficiency gains in storing the adjace开发者_JS百科ncy information as directed graph data? E.g. Croatia borders Hungary, and Hungary borders Croatia, essentially duplicating storage of the relationships:

86  130
130 86


This is just off the top of my head, so I don't know if it's the most performant solution and it may need a tweak, but I think it should work:

SELECT
     BORDER.country
FROM
     Countries AS C
LEFT OUTER JOIN Node_Adjacency NA1 ON
     NA1.node_id_1 = C.country_id OR
     NA1.node_id_2 = C.country_id
INNER JOIN Countries AS BORDER ON
     (
     BORDER.country_id = NA1.node_id_1 OR
     BORDER.country_id = NA1.node_id_2
     ) AND
     BORDER.country_id <> C.country_id
 WHERE
     C.country = 'CROATIA'        

Since your graph is not directed, I don't think that it makes sense to store it as a directed graph. You might also want to Google "Celko SQL Graph" as he has done a lot of advanced work on trees, graphs, and hierarchies in SQL and has an excellent book devoted to the subject.


I would store both relations (Hungary borders Croatia, Croatia borders Hungary) so that you only ever need to query one column.

SELECT c.country FROM countries AS c 
INNER JOIN node_adjacency AS n 
ON n.node_id_1 = c.countryID
WHERE c.countryID = 86


To do both columns, simply union two queries together (borrowing from Matthew Jones):

SELECT c.country FROM countries AS c 
INNER JOIN node_adjacency AS n 
ON n.node_id_1 = c.countryID
WHERE c.countryID = 86
UNION
SELECT c.country FROM countries AS c 
INNER JOIN node_adjacency AS n 
ON n.node_id_2 = c.countryID
WHERE c.countryID = 86

If you do it this way, you duplicate your query instead of your data (use 50% less space), and it's still really simple.


You can create a union view to avoid duplication:

CREATE VIEW adjacency_view (node_id_1, node_id_2)
AS
SELECT node_id_1, node_id_2 FROM node_adjacency
UNION ALL
SELECT node_id_2, node_id_1 FROM node_adjacency

So your query becomes quite straightforward:

SELECT c1.country
FROM adjacency_view
INNER JOIN countries AS c1 on c1.country_id = adjacency_view.node_id_1
INNER JOIN countries AS c2 on c2.country_id = adjacency_view.node_id_2
WHERE c2.country = 'CROATIA'


If you are duplicating relationships (i.e. country A shares border with B, and B shares border with A) you can get a way with a simple select. If you store only one relationship per pair of countries you will need to search by both columns in node_adjacency table (running two select statements and performing a union).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜