Recursive subdirectory SQL problem

2022-12-21 17:26 问答作者：

This is a mental excercise that has been bothering me for a while. What strategy would you use to solve this sort of problem?

Let's consider the following simple database structure. We have directories, obviously a tree of them. Also we have content items, which always reside in some directories.

create table directory ( 
 directoryId integer generated always as identity primary key,
 parentId integer default null,
 directoryName varchar(100)
);

create table content (
 contentId integer generated always as identity primary key,
 directory integer references directory(directoryId),
 contentTitle varchar(100),
 contentText varchar(32000)
);

Now let's assume that our directory tree is massive and the amount of content is massive. The solution must scale well.

The main problem: How to efficiently retrieve all content items that are found from the specified directory and its subdirectories?

The way I see it SQL can not be used to get easily all the directoryIds for a subselect. Am I correct?

One could solve this at application side with simple recursive loop. That might become actually very heavy though and require tricky caching, especially to quarantee reasonable first access times.

One could also perhaps build a materialized query table and add multi-dimensional indexes dynamically for it. Possible but an implementation mess. Too complex.

My far most favorite solution would be prob开发者_StackOverflow中文版ably to add a new table like

create table subdirectories (
 directoryId integer,
 subdirectoryId integer,
 constraint thekey primary key (directoryId,subdirectoryId)
)

and make sure I would always update it manually when directories are being moved/deleted/created. Thus I could always do a select with the directoryId and get all Ids for subdirectories, including as a subselect for more complex queries. I also like the fact that the rdbms is able to optimize the queries well.

What do you guys think?

In SQL Server 2005, PostgreSQL 8.4 and Oracle 11g:

WITH    
        -- uncomment the next line in PostgreSQL
        -- RECURSIVE
        q AS
        (
        SELECT  directoryId
        FROM    directories
        WHERE   directoryId = 1
        UNION ALL
        SELECT  d.directoryId 
        FROM    q
        JOIN    directories
        WHERE   parentId = q.directoryId
        )
SELECT  c.*
FROM    q
JOIN    content c
ON      c.directory = q.directoryId

In Oracle before 11g:

SELECT  c.*
FROM    (
        SELECT  directoryId
        FROM    directories
        START WITH
                directoryId = 1
        CONNECT BY
                parent = PRIOR directoryID
        ) q
JOIN    content c
ON      c.directory = q.directoryId

For PostgreSQL 8.3 and below see this article:

Hierarchical queries in PostgreSQL

For MySQL, see this article:

Hierarchical queries in MySQL

This is a standard -- and well understood -- "hard problem" in SQL.

All arc-node graph theory problems are hard because they involve transitive relationships.

There are standard solutions.

Loops with an explicit stack using to manage a list of unvisited nodes of the tree.
Recursion. This is remarkably efficient. It doesn't "require tricky caching" It's really simple and really effective. The recursion stack is the list of unvisited nodes.
Creating a "transitive closure" of the directory tree.
SQL extensions to handle transitive relationships like a directory tree.

继续阅读：model recursion sql subdirectory

Recursive subdirectory SQL problem

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？