Any tips of how to handle hierarchical trees in relational model?
I have a tree structure that can be n-levels deep, without restriction. That means that each node can have another n nodes.
What is the best way to retrieve a tree like that without issuing thousands of queries to the database?
I looked at a few other models, like flat table model, Preorder Tree Traversal Algorithm, and so.
Do you guys have any tips or suggestions of how to implement a efficient tree model? My objective in the rea开发者_JAVA技巧l end is to have one or two queries that would spit the whole tree for me.
With enough processing i can display the tree in dot net, but that would be in client machine, so, not much of a big deal.
Oh just dropping some more info here:
environment: Oracle or PostgreSQL, C# and Winforms.
Thanks for the attention
There is no efficient tree model.
SQL 2008 - hierarchyid. There is a new data type for handling hiearchies, but it gets large over time.
Or: usual. Parent field in table, pointing to the ID of the parent table.
For queries...
- You can do that a little more optimal with better SQL (one statement PER LEVEL, plus use of a temp table). This is the really tricky part here.
- What we did in a CMS where we had the same was that every node knew its parent, the top node (for multiple graphs) and the next higher container node. Some nodes were marked as containers - they did contain sub-items, but these belonged logically to anothe containe (like: website, with folders, then document with resources like images).
I noticed that you listed your DBs as Oracle or PostgreSQL. If you can stick to Oracle, you can use the start with
and connect by
commands to easily do what you need. There are plenty of options that will also gracefully handle if there are loops, or if you need to filter out a branch based on some criteria. I've personally used this in a production system that had both heavy reads and writes without any performance issues.
A quick search shows that you can do something similar (although more complex sql wise) with postgreSQL using the with recursive
command. I've not personally used this method, so I can't give you any more information then that.
Nested Sets - slow for updating but fast for querying.
We used a model with great success where we for each node stored a string containing each node id from top to that node, separated by '.'. Retrieving a tree becomes super efficient, only sort on the string.
This model have a limitation of course as to how deep the hierarchy can go. But this is primarily limited by the max size of the id string column. In SQL Server using the varchar(max) the limit is 2^31-1 bytes which makes for pretty deep hierarchies.
To me this depends on the size of your tree and the type of a query you want to run.
From what I can tell you have two data items. MyId and MyParent. If you created a simply table with just those two things you could ask and answer who are my children and who are my parents.
If you wanted to build a complete tree for intensive analysis, I would query all data and them build the tree myself. The database acting only as information storage at this point.
If my tree was large or took more than a half a second to create I would maintain it on my server and make it accessible with ajax calls to the client. This works best if the tree is static for all clients.
If the tree is dynamic I would insist that the client waits while the data is retrieved from the server and the tree is built locally. This will increase usage speed.
If you provided more information on what you ment when you said "split the tree" I might be able to offer a better opinion. But in general I have not found an ideal tree in a database. However OLAP might have such a tool that I don't know about.
Assuming your only concern is selects and not inserts/updates/deletes, this depends on needs, such as:
- do you need to know what level each node is at?
- do you need to know if each node has any children while rendering it?
- do you need to sort siblings by name?
However, if there really are minimal changes to the tree being done, then it almost doesn't matter what scheme you use, as you can do all the work in the application layer and cache the output.
Edit:
When order matters, I usually go for the materialized path method, and add an additional column SortPath. This way you can get your results sorted by sibling, which is a denormalization that makes rendering the tree in HTML extremely easy, as you can write out the entire tree (or any portion) in exactly the order you receive the records using a single query. This is optimal for speed, and is the easiest way to sort more than one level at a time.
E.g.,
CREATE TABLE [dbo].[MatPath](
[ID] [int] NULL,
[Name] [varchar](50) NULL,
[Path] [varchar](max) NULL,
[SortPath] [varchar](max) NULL
)
insert into MatPath (ID, Name, Path, SortPath) values (1, 'Animal', '1', 'Animal-1')
insert into MatPath (ID, Name, Path, SortPath) values (2, 'Dog', '1.2', 'Animal-1|Dog-2')
insert into MatPath (ID, Name, Path, SortPath) values (3, 'Horse', '1.3', 'Animal-1|Horse-3')
insert into MatPath (ID, Name, Path, SortPath) values (4, 'Beagle', '1.2.4', 'Animal-1|Dog-2|Beagle-4')
insert into MatPath (ID, Name, Path, SortPath) values (5, 'Abyssinian', '1.3.5', 'Animal-1|Horse-3|Abyssinian-5')
insert into MatPath (ID, Name, Path, SortPath) values (6, 'Collie', '1.2.6', 'Animal-1|Dog-2|Collie-6')
select *
from MatPath
order by SortPath
Output:
ID Name Path SortPath
------ --------------- ----------- --------------------------------
1 Animal 1 Animal-1
2 Dog 1.2 Animal-1|Dog-2
4 Beagle 1.2.4 Animal-1|Dog-2|Beagle-4
6 Collie 1.2.6 Animal-1|Dog-2|Collie-6
3 Horse 1.3 Animal-1|Horse-3
5 Abyssinian 1.3.5 Animal-1|Horse-3|Abyssinian-5
(6 row(s) affected)
You can determine the level of each node by counting pipes (or periods) in the application layer, or in SQL using whatever built-int or custom functions that can count occurrences of a string.
Also, you'll notice I append the ID
to the Name
when building SortPath
. This is to ensure that two sibling nodes with the same name will always get returned in the same order.
You could number the nodes from 1..M as you build the tree, and store children references in terms of these indexes, Now just write the tree. You can retrieve all the nodes in a a single read. The numbering scheme contains the tree information,
Joe Celko has a solution for this in his SQL for Smarties book (chapter 29). Inserts, updates, and deletes are a little complicated, but selects are very fast. I have been using it for a few years now and it works very well.
精彩评论