How to store a tree in SQL database

2023-01-06 02:15 问答作者：

I have to store a tree in a database, so what is the best way to do this? Show the method you use and name its pros开发者_如何学运维 and cons. (I am using SQL Server 2005)

I found the discussion in the SQL Anti-patterns very helpfull, as it also focuses on the drawbacks of every implementation.

Also, The slides 48-77 in this presentation reiterate that analisys.

Bottom line, there is no such thing as a generic tree, and no silver bullet for SQL trees. You'll have to ask yourself about the data, how and how much will they be selected, modified, will branches be moved around, etc, and based on those answers implement a suitable solution.

Well, the easiest way would be for a record to have a ParentID column so that it knows which record is its parent. This is a pretty standard practice. For example, an online store might have a hierarchy of product categories. Each category will have a ParentID. Example: The "Jeans" category in an clothing database might have "Pants" as the parent category. It's a bit harder if you want a record to indicate which are its children, unless you restrict the number of children. If you want a binary tree, you could have LeftChildID and RightChildID columns. If you allow any number of children, you could have a Children column with IDs delimited by commas (such as 1,4,72,19) but that will make querying rather hard. If your database allows for array types in columns, you can probably use an array instead of a delimited string, which would be easy to query - but I'm not sure if MS SQL Server supports that or not.

Other than that, it depends on what kind of data you are modelling, and also what sort of operations you plan to do with this tree.

I have done this in the past by storing data as xml in SQL.

TL;DR: Adjacency List is great for finding one level up or down, and being able to edit nodes in the middle. Path Enumeration allows you to have unlimited depth and traverse quickly, but makes editing nodes in the middle difficult. A Closure Table allows for unlimited depth traversal and also editing middle nodes, but takes more space and is more complex to develop.

There are three basic solutions to store a tree or hierarchal structure in a database.

Adjacency list: store each node's parent
Path enumeration: store the entire path for each node in text
Closure table: create an additional table with all ancestor/descendant relationships

Each solution has pros and cons:

	Adjacency List	Path Enumeration	Closure Table
Storage cost	Low: O(n)	Low: O(n)	High: O(n²)
Find first child	Easy/fast	Easy/fast	Easy/fast
Find all children	Hard/slow	Easy/medium	Easy/fast
Find direct parent	Easy/fast	Easy/fast	Easy/fast
Find whole path	Hard/medium	Easy/fast	Easy/fast
Add children	Easy/fast	Easy/fast	Easy/fast
Modify parent	Easy/fast	Hard/slow	Easy/fast
Contradiction prone?	No	Yes	No
Development complexity	Easy	Easy	Hard

1. Adjacency List

CREATE TABLE family_tree_adjacency_list (
  PersonId int,
  Name varchar(255),
  ParentId int

PersonId	Name	ParentId
1	John	NULL
2	Jane	1
3	Mark	2
4	Mary	2
5	Beth	4

This is the simplest tree structure in SQL. Easy to create, easy to conceptualize. Since each node in a tree has only one parent, you can just store each node's parent and you've stored the whole tree. Any nodes with a NULL parent are the top level nodes. This can be a good structure if you don't need to find the top level ancestors or bottom level descendants of nodes.

Pros:

Only 1 table required, no extra storage needs.
Can easily find any node's direct child or parent (SELECT * FROM family_tree WHERE parent_id = 47).
Adding more children is easy
If you need to modify a node in the middle of the tree to change its parent, no additional changes are needed.

Cons:

You can't find all children at all depths. If you're using trees to categorize, you'll either have to do recursive programming to find all the children, or set a limit on number of total descendants. This is complex and slow.
You can't find the top level parent. If you need to traverse the tree all the way to the top, you'll need to use recursive programming (or set a limit on how many fields deep you can go).

2. Path Enumeration

CREATE TABLE family_tree_path_enumeration (
  PersonId int,
  Name varchar(255),
  Path varchar(255)
)

PersonId	Name	Path
1	John	1
2	Jane	1/2
3	Mark	1/2/3
4	Mary	1/2/4
5	Beth	1/2/4/5

This is a good compromise structure. Don't over-engineer your solution! This solution is easy to implement, easy to conceptualize, and usually decently fast. To find all children of a particular node, you'll need to use LIKE with a % wildcard, which is slower than =, but that's usually fine for most cases. The only real downside to this solution is editing the table. Reorganizing a mid-tree node is a very complex endeavor, so if that's something you do often, don't use path enumeration.

Pros:

Low storage costs still
Fast to search children and parents, even all the way to the top, regardless of how many levels there are.
Easy to add children
Easy to develop

Cons:

Modifying a parent requires finding all children and editing each child path as well, which will be an expensive operation. If this is a common task, don't use Path Enumeration. Messing up could also cause contradictions within the database
Finding all children of a node is slightly slower, since it uses LIKE %. This could make a difference in a HUGE database.

-- find all sub-children of #1
SELECT * from family_tree_path_enumeration WHERE Path like '%1%

3. Closure Table

CREATE TABLE family_tree_closure_table (
  PersonId int,
  Name varchar(255)
)

CREATE TABLE family_tree_relationships (
  AncestorId int,
  DescendantId int,
  Depth int  -- this is a helper field to make our lives easier when finding parents and children
)

PersonId	Name
1	John
2	Jane
3	Mark
4	Mary
5	Beth

AncestorId	DescendantId	Depth
1	2	1
2	3	1
1	3	2
2	4	1
1	4	2
4	5	1
2	4	2
1	4	3

This solution is a bit more complex, but it will be fast even with large datasets, and still allows for modification without running into problems. The downside here is that development is slightly more complex, and it can use exponentially more storage space (although in practice that's not usually the case). This can be a good structure if you need to find the top ancestor and bottom descendant, and also need to edit the tree regularly.

Pros:

Expandable and fast
Can quickly find the entire parent path, or all descendant nodes
Easily add children
Somewhat easily modify a middle node (depending on what you're doing, you might need to modify Depth for descendants)

Cons:

Complex to develop (needs 2 separate tables)
Uses more space (needs 2 separate tables)
You might be over-engineering your solution!

Conclusion

Each table has its tradeoffs. The simple Adjacency List is nice if you want a simple solution that's easy to reorganize and easy to find one level of parents or children with. A Path Enumeration table is better if you need to find the full path (up or down), but don't need to edit nodes in the middle of your table very often. If you need to find all nodes in a category, but also need the ability to edit the nodes in the middle, you'll probably want to use a Closure Table.

继续阅读：hierarchical-data sql sql-server tsql

How to store a tree in SQL database

1. Adjacency List

2. Path Enumeration

3. Closure Table

Conclusion

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

1. Adjacency List

2. Path Enumeration

3. Closure Table

Conclusion

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？