What is the most space efficient way to store an N-ary tree while preserving hierarchy traversal?

2023-04-01 13:13 问答作者：

I read this paper.

But I'd love to avoid a ton of research to solve this problem if someone has already done it. I need this space efficient tree for a reasonably (conceptually) simple GUI control: TreeDataGridView with virtual mode

An example tree might look like:

(RowIndexHierarchy),ColumnIndex
(0),0                          
  (0,0),0                      
    (0,0,0),0                  
      (0,0,0,0),0              
        (0,0,0,0,0),0          
      (0,0,0,1),0              
      (0,0,0,2),0              
    (0,0,1),0                  
    (0,0,2),0                  
      (0,0,2,0),0              
      (0,0,2,1),0              
      (0,0,2,2),0              
  (0,1),0                      
  (0,2),0                      
    (0,2,0),0                  
      (0,2,0,0),0              
      (0,2,0,1),0              
      (0,2,0,2),0              
    (0,2,1),0                  
    (0,2,2),0                  
    (0,2,2,0),0                
      (0,2,2,1),0              
      (0,2,2,2),开发者_如何学运维0              
(1),0

I need operations like "find flat row index from row hierarchy" and "find row hierarchy from flat row index". Also, to support expand/collapse, I need "find next node with the same or less depth".

For read-only tree you can store sorted array of nodes by its parent index.

0 a
1 (a/)b
1 (a/)c
2 (a/b/)d
2 (a/b/)e
2 (a/b/)f
3 (a/c/)c

Each time you'll need to find child nodes you can use binary search to find upper and lower boundaries of nodes range.

I'm not sure if I am following your needs exactly, but we access a database that has a tree-view UI. The tree runs from

Top Level (the user's company); Direct Client Company; Office Location; Employee; Indirect Client Company; Proposal; Specific Vendor Bid; Detail Financials (invoices, adjustments, etc)

As we have thousands of direct clients and the tree branches pretty heavily at each tier, so we don't load the entire data-set at any time. Instead we only load Type, Guid, DisplayName and some administrative data for each child and load a "details" pane for the currently focused item. Unexplored paths through the tree simply don't exist in memory. To avoid loading the full lists of names, we have "rolodex" level that just divides the dataset into batches of 100 records or less. (So "Z" stands alone, but "Sa-St" subdivides S). These auto-insert when a subset grows beyond the 100 record threshold.

Periodically (when the system is idle) we check the loaded count and if it exceeds a threshold we drop the least recently used nodes until we are below the threshold.

The actual data access is done when the user navigates: we access the database and refresh the subset they are navigating through. We do have the advantage that Type determines the table to query (both for that level and the children) and thus we can keep the individual record types indexed and accessible.

This has given the user the ability to navigate through the data in any way they want, while keeping the in-memory retained data minimized. Of course we off standard search modes and a menu of "recently used history" (for each type) as well, but often the work they do requires moving up and down a narrow chain of nodes, so the tree view keeps it all in front of them while working with a given client and the subset.

With that background, I become curious as to the nature of data that would have such undifferentiated levels that such a tier by tier data store wouldn't be appropriate. The advantage that tier by tier storage has is that all I need is the current node's Guid and I can search the child table on that as the foreign key (which is indexed, so quick to return the subset).

I guess this amounts to "unasking the question", but it seems that most tree structures have distinct data at each level, so it would seem far easier to work with something established (like a table query on an indexed field, which keeps the whole affair out of memory in the first place) than making a custom structure.

For example, I have never asked for the "next node at the current level" except within a given parent (because leaving a given parent takes me to another context). But within a parent I already have the children and their order.

Perhaps it is because of the space I'm in, but I find a tree control that knows how to bind to different tables based on parent->child relationships of tables more useful, which is why I wrote one. I also find lazy loading of data and aggressive dismissal of data keep the memory footprint minimized. Finally I find the programming model incredibly simple: just create a new "treenode" subclass for any table I want to access and make the treenode responsible for loading their children.

(Clarifying, due to question below:)

In my implementation each TreeNode is actually a SpecificTreeNode, derived from BaseNode, which in turn is derived from TreeNode. Being inherited from TreeNode, they can be used directly by the tree, but because they have overrides of the BaseNode properties such as LoadChildren and display properties, the display and retrieval any given set of data is implied by the type of node (and the Guid that the item represents).

This means that as the user navigates the tree, the SpecificTreeNode generates the required ORM query on the fly. For performance, child tables have any parent IDs as indexes, so navigating down the tree (even by multiple layers, if using a SpecificTreeNode that does rollups) is just a quick index lookup.

In this way, I keep very little of the data in memory at any time, pulling only what we need from the database. Likewise, queries against the tree are converted to ORM queries against our database, pulling only the results and limiting the amount any query can pull (if you are using a Tree UI and you pull over 100 records at once, the UI isn't really the optimal place for whatever you are doing).

When your dataset is hundreds of GB in size, it seems the only reasonable recourse. The advantage I feel this has is that the Tree itself has no idea that different levels and paths render and query differently... it just asks the BaseNode (from its perspective) to do something, and the overrides on SpecificTreeNode actually do the lifting. Thus, the "data structure" is simply the way the tree works already combined with data queries on my tables and views.

(End of clarification.)

Meanwhile all the tree controls on the market seem to miss that and go with something far more complex.

The most space-efficient way to store a balanced N-ary tree is in an array... zero space-overhead! And actually very efficient to traverse ... just some simple math required to compute your parent index from your index... or your N children's indices from your index.

To find some code for this, look up heapsort... the heap structure (nothing to do with memory heap) is a balanced binary tree... and most people implement it in an array. Although it is binary, the same thing can be done for N-ary.

If your N-ary tree is not kept balanced, but tends to be fairly dense, then the array implementation will still be more space-efficient than most... the empty nodes being the only space overhead. However, if your trees are ever highly imbalanced, then the array implementation can be very space-inefficient.

继续阅读：data-structures optimization tree

What is the most space efficient way to store an N-ary tree while preserving hierarchy traversal?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

Easiest way to get words of one line from istream into a vector?

抽烟只抽炫赫门？

性激素六项检查的最佳时间是多久？多少钱？？

Infinite gtk warnings when I right click on the icon