开发者

How to store a tree in SQL database

I have to store a tree in a database, so what is the best way to do this? Show the method you use and name its pros开发者_如何学运维 and cons. (I am using SQL Server 2005)


I found the discussion in the SQL Anti-patterns very helpfull, as it also focuses on the drawbacks of every implementation.

Also, The slides 48-77 in this presentation reiterate that analisys.

Bottom line, there is no such thing as a generic tree, and no silver bullet for SQL trees. You'll have to ask yourself about the data, how and how much will they be selected, modified, will branches be moved around, etc, and based on those answers implement a suitable solution.


Well, the easiest way would be for a record to have a ParentID column so that it knows which record is its parent. This is a pretty standard practice. For example, an online store might have a hierarchy of product categories. Each category will have a ParentID. Example: The "Jeans" category in an clothing database might have "Pants" as the parent category. It's a bit harder if you want a record to indicate which are its children, unless you restrict the number of children. If you want a binary tree, you could have LeftChildID and RightChildID columns. If you allow any number of children, you could have a Children column with IDs delimited by commas (such as 1,4,72,19) but that will make querying rather hard. If your database allows for array types in columns, you can probably use an array instead of a delimited string, which would be easy to query - but I'm not sure if MS SQL Server supports that or not.

Other than that, it depends on what kind of data you are modelling, and also what sort of operations you plan to do with this tree.


I have done this in the past by storing data as xml in SQL.


TL;DR: Adjacency List is great for finding one level up or down, and being able to edit nodes in the middle. Path Enumeration allows you to have unlimited depth and traverse quickly, but makes editing nodes in the middle difficult. A Closure Table allows for unlimited depth traversal and also editing middle nodes, but takes more space and is more complex to develop.


There are three basic solutions to store a tree or hierarchal structure in a database.

  1. Adjacency list: store each node's parent
  2. Path enumeration: store the entire path for each node in text
  3. Closure table: create an additional table with all ancestor/descendant relationships

Each solution has pros and cons:

Adjacency List Path Enumeration Closure Table
Storage cost Low: O(n) Low: O(n) High: O(n2)
Find first child Easy/fast Easy/fast Easy/fast
Find all children Hard/slow Easy/medium Easy/fast
Find direct parent Easy/fast Easy/fast Easy/fast
Find whole path Hard/medium Easy/fast Easy/fast
Add children Easy/fast Easy/fast Easy/fast
Modify parent Easy/fast Hard/slow Easy/fast
Contradiction prone? No Yes No
Development complexity Easy Easy Hard

1. Adjacency List

CREATE TABLE family_tree_adjacency_list (
  PersonId int,
  Name varchar(255),
  ParentId int 
PersonId Name ParentId
1 John NULL
2 Jane 1
3 Mark 2
4 Mary 2
5 Beth 4

This is the simplest tree structure in SQL. Easy to create, easy to conceptualize. Since each node in a tree has only one parent, you can just store each node's parent and you've stored the whole tree. Any nodes with a NULL parent are the top level nodes. This can be a good structure if you don't need to find the top level ancestors or bottom level descendants of nodes.

Pros:

  • Only 1 table required, no extra storage needs.
  • Can easily find any node's direct child or parent (SELECT * FROM family_tree WHERE parent_id = 47).
  • Adding more children is easy
  • If you need to modify a node in the middle of the tree to change its parent, no additional changes are needed.

Cons:

  • You can't find all children at all depths. If you're using trees to categorize, you'll either have to do recursive programming to find all the children, or set a limit on number of total descendants. This is complex and slow.
  • You can't find the top level parent. If you need to traverse the tree all the way to the top, you'll need to use recursive programming (or set a limit on how many fields deep you can go).

2. Path Enumeration

CREATE TABLE family_tree_path_enumeration (
  PersonId int,
  Name varchar(255),
  Path varchar(255)
)
PersonId Name Path
1 John 1
2 Jane 1/2
3 Mark 1/2/3
4 Mary 1/2/4
5 Beth 1/2/4/5

This is a good compromise structure. Don't over-engineer your solution! This solution is easy to implement, easy to conceptualize, and usually decently fast. To find all children of a particular node, you'll need to use LIKE with a % wildcard, which is slower than =, but that's usually fine for most cases. The only real downside to this solution is editing the table. Reorganizing a mid-tree node is a very complex endeavor, so if that's something you do often, don't use path enumeration.

Pros:

  • Low storage costs still
  • Fast to search children and parents, even all the way to the top, regardless of how many levels there are.
  • Easy to add children
  • Easy to develop

Cons:

  • Modifying a parent requires finding all children and editing each child path as well, which will be an expensive operation. If this is a common task, don't use Path Enumeration. Messing up could also cause contradictions within the database
  • Finding all children of a node is slightly slower, since it uses LIKE %. This could make a difference in a HUGE database.
-- find all sub-children of #1
SELECT * from family_tree_path_enumeration WHERE Path like '%1%

3. Closure Table

CREATE TABLE family_tree_closure_table (
  PersonId int,
  Name varchar(255)
)

CREATE TABLE family_tree_relationships (
  AncestorId int,
  DescendantId int,
  Depth int  -- this is a helper field to make our lives easier when finding parents and children
)
PersonId Name
1 John
2 Jane
3 Mark
4 Mary
5 Beth
AncestorId DescendantId Depth
1 2 1
2 3 1
1 3 2
2 4 1
1 4 2
4 5 1
2 4 2
1 4 3

This solution is a bit more complex, but it will be fast even with large datasets, and still allows for modification without running into problems. The downside here is that development is slightly more complex, and it can use exponentially more storage space (although in practice that's not usually the case). This can be a good structure if you need to find the top ancestor and bottom descendant, and also need to edit the tree regularly.

Pros:

  • Expandable and fast
  • Can quickly find the entire parent path, or all descendant nodes
  • Easily add children
  • Somewhat easily modify a middle node (depending on what you're doing, you might need to modify Depth for descendants)

Cons:

  • Complex to develop (needs 2 separate tables)
  • Uses more space (needs 2 separate tables)
  • You might be over-engineering your solution!

Conclusion

Each table has its tradeoffs. The simple Adjacency List is nice if you want a simple solution that's easy to reorganize and easy to find one level of parents or children with. A Path Enumeration table is better if you need to find the full path (up or down), but don't need to edit nodes in the middle of your table very often. If you need to find all nodes in a category, but also need the ability to edit the nodes in the middle, you'll probably want to use a Closure Table.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜