Creating a flattened table/view of a hierarchically-defined set of data

2023-01-09 18:26 问答作者：

I have a table containing hierarchical data. There are currently ~8 levels in this hierarchy.

I really like the way the data is structured, but performance is dismal when I need to know if a record at level 8 is a child of a record at level 1.

I have PL/SQL stored functions which do these lookups for me, each having a select * from tbl start with ... connect by... statement. This works fine when I'm querying a handful of records, but I'm in a situation now where I need to query ~10k records at once and for each of them run this function. It's taking 2-3 minutes where I need it to run in just a few seconds.

Using some heuristics based on my knowledge of the current data, I can get rid of the lookup function and just do childrecord.key || '%' LIKE parentrecord.key but that's a really dirty hack and will not always work.

So now I'm thinking that for this hierarchically-defined table I need to have a separate parent-child table, which will contain every relationship...for a hierarchy going from level 1-8 there would be 8! records, associating 1 with 2, 1 with 3,...,1 with 8 and 2 with 3, 2 with 4,...,2 with 8. And so forth.

My thought is that I would need to have an insert trigger where it will basically run the connect by query and for every match going up the hierarchy i开发者_运维百科t will insert a record in the lookup table. And to deal with old data I'll just set up foreign keys to the main table with cascading deletes.

Are there better options than this? Am I missing another way that I could determine these distant ancestor/descendant relationships more quickly?

EDIT: This appears to be exactly what I'm thinking about: http://evolt.org/working_with_hierarchical_data_in_sql_using_ancestor_tables

So what you want is to materialize the transitive closures. That is, given this application table ...

 ID   | PARENT_ID
------+----------
    1 | 
    2 |         1
    3 |         2
    4 |         2
    5 |         4

... the graph table would look like this:

 PARENT_ID | CHILD_ID
-----------+----------
         1 |        2
         1 |        3
         1 |        4
         1 |        5
         2 |        3
         2 |        4
         2 |        5
         4 |        5

It is possible to maintain a table like this in Oracle, although you will need to roll your own framework for it. The question is whether it is worth the overhead. If the source table is volatile then keeping the graph data fresh may cost more cycles than you will save on the queries. Only you know your data's profile.

I don't think you can maintain such a graph table with CONNECT BY queries and cascading foreign keys. Too much indirect activity, too hard to get right. Also a materialized view is out, because we cannot write a SQL query which will zap the 1->5 record when we delete the source record for ID=4.

So what I suggest you read a paper called Maintaining Transitive Closure of Graphs in SQL by Dong, Libkin, Su and Wong. This contains a lot of theory and some gnarly (Oracle) SQL but it will give you the grounding to build the PL/SQL you need to maintain a graph table.

"can you expand on the part about it being too difficult to maintain with CONNECT BY/cascading FKs? If I control access to the table and all inserts/updates/deletes take place via stored procedures, what kinds of scenarios are there where this would break down?"

Consider the record 1->5 which is a short-circuit of 1->2->4->5. Now what happens if, as I said before, we delete the the source record for ID=4? Cascading foreign keys could delete the entries for 2->4 and 4->5. But that leaves 1->5 (and indeed 2->5) in the graph table although they no longer represent a valid edge in the graph.

What might work (I think, I haven't done it) would be to use an additional synthetic key in the source table, like this.

 ID   | PARENT_ID | NEW_KEY
------+-----------+---------
    1 |           | AAA
    2 |         1 | BBB
    3 |         2 | CCC
    4 |         2 | DDD
    5 |         4 | EEE

Now the graph table would look like this:

 PARENT_ID | CHILD_ID | NEW_KEY
-----------+----------+---------
         1 |        2 | BBB
         1 |        3 | CCC
         1 |        4 | DDD
         1 |        5 | DDD
         2 |        3 | CCC
         2 |        4 | DDD
         2 |        5 | DDD
         4 |        5 | DDD

So the graph table has a foreign key referencing the relationship in the source table which generated it, rather than linking to the ID. Then deleting the record for ID=4 would cascade deletes of all records in the graph table where NEW_KEY=DDD.

This would work if any given ID can only have zero or one parent IDs. But it won't work if it is permissible for this to happen:

 ID   | PARENT_ID
------+----------
    5 |         2
    5 |         4

In other words the edge 1->5 represents both 1->2->4->5 and 1->2->5. So, what might work depends on the complexity of your data.

继续阅读：database-design hierarchical-data oracle query-optimization sql

Creating a flattened table/view of a hierarchically-defined set of data

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？