Modelling country adjacency in SQL

2022-12-10 23:32 Q&A author:

I'm trying to model which countries border each other in MySQL. I have three tables:

nodes
-----
node_id MEDIUMINT

countries
---------
country_id MEDIUMINT (used as a foreign key for nodes.node_id)
country CHAR(64)
iso_code CHAR(2)

node_adjacency
--------------
node_id_1 MEDIUMINT (used as a foreign key for nodes.node_id)
node_id_2 MEDIUMINT (used as a foreign key for nodes.node_id)

I appreciate the nodes table is redundant in this example, but this is part of a larger architecture where nodes can represent many items other than countries.
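For reference, the three tables described above could be declared roughly as follows. This is only a sketch: the column names and types come from the question, while the NOT NULL constraints, the primary keys and the InnoDB engine are assumptions added for illustration.

```sql
-- Sketch of the schema from the question (MySQL).
-- Assumed, not stated in the question: NOT NULL, primary keys, InnoDB.
CREATE TABLE nodes (
    node_id MEDIUMINT NOT NULL,
    PRIMARY KEY (node_id)
) ENGINE=InnoDB;

CREATE TABLE countries (
    country_id MEDIUMINT NOT NULL,
    country    CHAR(64)  NOT NULL,
    iso_code   CHAR(2)   NOT NULL,
    PRIMARY KEY (country_id),
    FOREIGN KEY (country_id) REFERENCES nodes (node_id)
) ENGINE=InnoDB;

CREATE TABLE node_adjacency (
    node_id_1 MEDIUMINT NOT NULL,
    node_id_2 MEDIUMINT NOT NULL,
    -- Assumed: each (node_id_1, node_id_2) pair is stored at most once.
    PRIMARY KEY (node_id_1, node_id_2),
    FOREIGN KEY (node_id_1) REFERENCES nodes (node_id),
    FOREIGN KEY (node_id_2) REFERENCES nodes (node_id)
) ENGINE=InnoDB;
```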

Here's some data: IDs (which appear in all three tables) and country names.

59  Bosnia and Herzegovina
86  Croatia
130 Hungary
178 Montenegro
227 Serbia
232 Slovenia

Croatia is bordered by all the other countries, and this is represented in the node_adjacency table as:

So Serbia's ID may appear as a node_id_1 or a node_id_2. The data in this table is essentially undirected graph data.

Questions:

Given the name 'Croatia', what SQL should I use to retrieve its neighbours?

Bosnia and Herzegovina
Hungary
Montenegro
Serbia
Slovenia

Would there be any retrieval efficiency gains in storing the adjacency information as directed graph data? E.g. Croatia borders Hungary, and Hungary borders Croatia, essentially duplicating storage of the relationships:

86  130
130 86

This is just off the top of my head, so I don't know if it's the most performant solution and it may need a tweak, but I think it should work:

SELECT
     BORDER.country
FROM
     countries AS C
INNER JOIN node_adjacency AS NA1 ON
     NA1.node_id_1 = C.country_id OR
     NA1.node_id_2 = C.country_id
INNER JOIN countries AS BORDER ON
     (
     BORDER.country_id = NA1.node_id_1 OR
     BORDER.country_id = NA1.node_id_2
     ) AND
     BORDER.country_id <> C.country_id
WHERE
     C.country = 'Croatia'

Since your graph is not directed, I don't think that it makes sense to store it as a directed graph. You might also want to Google "Celko SQL Graph" as he has done a lot of advanced work on trees, graphs, and hierarchies in SQL and has an excellent book devoted to the subject.

I would store both relations (Hungary borders Croatia, Croatia borders Hungary) so that you only ever need to query one column.

SELECT c.country FROM countries AS c
INNER JOIN node_adjacency AS n
ON n.node_id_2 = c.country_id
WHERE n.node_id_1 = 86

To do both columns, simply union two queries together (borrowing from Matthew Jones):

SELECT c.country FROM countries AS c
INNER JOIN node_adjacency AS n
ON n.node_id_2 = c.country_id
WHERE n.node_id_1 = 86
UNION
SELECT c.country FROM countries AS c
INNER JOIN node_adjacency AS n
ON n.node_id_1 = c.country_id
WHERE n.node_id_2 = 86

If you do it this way, you duplicate your query instead of your data (the adjacency table holds half as many rows), and it's still really simple.
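The question starts from the name 'Croatia' rather than the ID, so the same two-sided union can be keyed on the name instead. A minimal sketch, assuming each border is stored once in either column order; the aliases c, n and nb are just illustrative:

```sql
-- Neighbours where the named country sits in node_id_1 ...
SELECT nb.country
FROM countries AS c
INNER JOIN node_adjacency AS n ON n.node_id_1 = c.country_id
INNER JOIN countries AS nb ON nb.country_id = n.node_id_2
WHERE c.country = 'Croatia'
UNION
-- ... and where it sits in node_id_2.
SELECT nb.country
FROM countries AS c
INNER JOIN node_adjacency AS n ON n.node_id_2 = c.country_id
INNER JOIN countries AS nb ON nb.country_id = n.node_id_1
WHERE c.country = 'Croatia'
-- A trailing ORDER BY applies to the whole UNION result.
ORDER BY country;
```

UNION (rather than UNION ALL) also removes duplicates if a border happens to be stored in both directions.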

You can create a union view to avoid duplication:

CREATE VIEW adjacency_view (node_id_1, node_id_2)
AS
SELECT node_id_1, node_id_2 FROM node_adjacency
UNION ALL
SELECT node_id_2, node_id_1 FROM node_adjacency

So your query becomes quite straightforward:

SELECT c1.country
FROM adjacency_view
INNER JOIN countries AS c1 on c1.country_id = adjacency_view.node_id_1
INNER JOIN countries AS c2 on c2.country_id = adjacency_view.node_id_2
WHERE c2.country = 'Croatia'

If you are duplicating relationships (i.e. country A shares a border with B, and B shares a border with A) you can get away with a simple select. If you store only one relationship per pair of countries you will need to search by both columns in the node_adjacency table (running two select statements and performing a union).
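If you keep only one row per pair, two additions may help; both are sketches under stated assumptions rather than requirements. The constraint and index names are made up for illustration. The CHECK assumes MySQL 8.0.16 or later (earlier versions parse it but do not enforce it), existing rows that already satisfy it, no ON UPDATE/ON DELETE actions on the foreign keys for these columns, and that the pair (node_id_1, node_id_2) is already unique, for example as the primary key.

```sql
-- Store each border with the smaller ID first, so the same border
-- cannot be entered a second time in the reverse order.
ALTER TABLE node_adjacency
    ADD CONSTRAINT chk_node_adjacency_order CHECK (node_id_1 < node_id_2);

-- Searching by node_id_2 needs an index that starts with that column.
-- InnoDB creates one automatically when a FOREIGN KEY is declared on
-- node_id_2, so check SHOW INDEX FROM node_adjacency before adding this.
CREATE INDEX idx_node_adjacency_node_2 ON node_adjacency (node_id_2);
```

With both in place, each half of the two-select union can use an index: the pair's unique key for node_id_1 and the second index for node_id_2.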
