Cycle detection with recursive subquery factoring

2022-12-11 04:41 问答作者：

Oracle SQL can do hierarchical queries since v2, using their proprietary CONNECT BY syntax. In their latest 11g release 2, they added recursive subquery factoring, also known as the recursive with clause. This is the ANSI standard, and if I understand correctly, this one has been implemented by other RDBMS vendors as well.

When comparing the connect-by with the recursive with, I noticed a difference in the result set when using cycle detection. The connect by results are more intuitive to me, so I'm wondering if Oracle's implementation contains a bug, or if this is standard ANSI and expected behaviour. Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and开发者_如何转开发 others. Provided those databases support the recursive with clause of course.

Here is how it works on Oracle 11.2.0.1.0

SQL> select *
  2    from t
  3  /

        ID  PARENT_ID
---------- ----------
         1          2
         2          1

2 rows selected.

The query using CONNECT BY syntax:

SQL>  select id
  2        , parent_id
  3        , connect_by_iscycle
  4     from t
  5  connect by nocycle parent_id = prior id
  6    start with id = 1
  7  /

        ID  PARENT_ID CONNECT_BY_ISCYCLE
---------- ---------- ------------------
         1          2                  0
         2          1                  1

2 rows selected.

Which looks intuitive to me. However, using the new ANSI syntax it returns one more row:

SQL> with tr (id,parent_id) as
  2  ( select id
  3         , parent_id
  4      from t
  5     where id = 1
  6     union all
  7    select t.id
  8         , t.parent_id
  9      from t
 10           join tr on t.parent_id = tr.id
 11  ) cycle id set is_cycle to '1' default '0'
 12  select id
 13       , parent_id
 14       , is_cycle
 15    from tr
 16  /

        ID  PARENT_ID I
---------- ---------- -
         1          2 0
         2          1 0
         1          2 1

3 rows selected.

This is the script you can use to check:

create table t
( id        number
, parent_id number
);
insert into t values (1, 2);
insert into t values (2, 1);
commit;
with tr (id,parent_id) as
( select id
       , parent_id
    from t
   where id = 1
   union all
  select t.id
       , t.parent_id
    from t
         join tr on t.parent_id = tr.id
) cycle id set is_cycle to '1' default '0'
select id
     , parent_id
     , is_cycle
  from tr;

From documentation on CONNECT_BY_ISCYCLE:

The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor

and that on CYCLE:

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

In your example, row 2 does have a child which is also its ancestor, but its id has not been returned yet.

In other words, CONNECT_BY_ISCYCLE checks the children (which are yet to be returned), while CYCLE checks the current row (which is already returned).

CONNECT BY is row based, while recursive CTE's are set-based.

Note that Oracle's documentation on CYCLE mentions an "ancestor row". However, generally speaking, there is no concept of an "ancestor row" in a recursive CTE. It's a set based operation which can yield results completely out of the tree. Generally speaking, the anchor part and the recursive part can even use the different tables.

Since recursive CTE's are usually used to build hierarchy trees, Oracle decided to add a cycle check. But due the set-based way the recursive CTE's operate, it's generally impossible to tell will the next step generate a cycle or not, because without a clear definition of the "ancestor row" cycle condition cannot be defined either.

To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we just need to have the results of the "next" operation.

It's not a problem if the current set always consists of a single row (like in CONNECT BY), but it is a problem if the recursive operation defined on a set as a whole.

Didn't look into Oracle 11 yet, but SQL Server implements recursive CTE's by just hiding a CONNECT BY behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).

PostgreSQL's implementation, on the other hand, is truly set-based: you can do any operation with the anchor part in the recursive part. It does not have any means to detect cycles, though, because cycles are not defined in the first place.

As was mentioned before, MySQL does not implement CTE's at all (it does not implement HASH JOIN's or MERGE JOINs as well, only the nested loops, so don't be surprised much).

Ironically, I received a letter today on this very subject, which I will cover in my blog.

Update:

Recursive CTE's in SQL Server are no more than CONNECT BY in disguise. See this article in my blog for shocking details:

SQL Server: are the recursive CTE’s really set-based?

PostgreSQL supports WITH-style hierarchical queries, but doesn't have any automatic cycle detection. This means that you need to write your own and the number of rows returned depends on the way you specify join conditions in the recursive part of the query.

Both examples use an array if IDs (called all_ids) to detect loops:

WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, t.id = ANY(all_ids)
    FROM t
    JOIN tr ON t.parent_id = tr.id AND NOT cycle)
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | f
  1 |         2 | t


WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, (EXISTS(SELECT 1 FROM t AS x WHERE x.id = t.parent_id))
    FROM t
    JOIN tr ON t.parent_id = tr.id
    WHERE NOT t.id = ANY(all_ids))
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | t

AFAIK:

MySQL doesn't support recursive CTE's
SQL Sever does not support cycle detection in recursive CTE's

Results of the connect by may not always be intuitive.

Below queries demonstrate different approaches to detect cycles starting with id = 3 for the graph on the picture.

create table graph (id, id_parent) as
(select 2, 1 from dual
union all select 3, 1 from dual
union all select 4, 3 from dual
union all select 5, 4 from dual
union all select 3, 5 from dual)

Cycle detection with recursive subquery factoring

SQL> select level lvl, graph.*, connect_by_iscycle cycle
  2    from graph
  3   start with id = 3
  4  connect by nocycle prior id = id_parent;

       LVL         ID  ID_PARENT      CYCLE
---------- ---------- ---------- ----------
         1          3          1          0
         2          4          3          0
         3          5          4          1
         1          3          5          0
         2          4          3          0
         3          5          4          1

6 rows selected.

SQL> select level lvl, graph.*, connect_by_iscycle cycle
  2    from graph
  3   start with id = 3
  4  connect by nocycle prior id = id_parent
  5         and prior id_parent is not null;

       LVL         ID  ID_PARENT      CYCLE
---------- ---------- ---------- ----------
         1          3          1          0
         2          4          3          0
         3          5          4          0
         4          3          5          1
         1          3          5          0
         2          4          3          0
         3          5          4          1

7 rows selected.

SQL> with t(id, id_parent) as
  2   (select *
  3      from graph
  4     where id = 3
  5    union all
  6    select g.id, g.id_parent
  7      from t
  8      join graph g
  9        on t.id = g.id_parent)
 10  search depth first by id set ord
 11  cycle id set cycle to 1 default 0
 12  select * from t;

        ID  ID_PARENT        ORD C
---------- ---------- ---------- -
         3          1          1 0
         4          3          2 0
         5          4          3 0
         3          5          4 1
         3          5          5 0
         4          3          6 0
         5          4          7 0
         3          5          8 1

8 rows selected.

Node with id = 3 has two parents so Oracle traverses two cycles in this example.

(1, 3) -> (3, 4) -> (4, 5) -> (5, 3)

and

(5, 3) -> (3, 4) -> (4, 5)

Edge (5, 3) is missing from the result of the first query and first cycle. At the same time edge (5, 3) appears in the result for the third query and second cycle twice.

Why so? You can check description of the cycle detection logic in the answer provided by Quassnoi. In plain English it means that

(1) connect by detects a cycle if child ID for current row is part of IDs visited so far

(2) rec with detects a cycle if ID for current row is part of IDs visited so far

Result of the second query looks the most natural although there is additional predicate and prior id_parent is not null. In this case

(3) it detects a cycle if ID for current row is part of parent IDs visited so far

All this conditions are implemented in columns cnt1, cnt2, cnt3 in below query.

SQL> with t(id, id_parent, path_id, path_id_parent, cnt1, cnt2, cnt3) as
  2   (select g.*,
  3           cast('->' || g.id as varchar2(4000)),
  4           cast('->' || g.id_parent as varchar2(4000)),
  5           0,
  6           0,
  7           0
  8      from graph g
  9     where id = 3
 10    union all
 11    select g.id,
 12           g.id_parent,
 13           t.path_id || '->' || g.id,
 14           t.path_id_parent || '->' || g.id_parent,
 15           regexp_count(t.path_id || '->', '->' ||
 16            (select id from graph c where c.id_parent = g.id) || '->'),
 17           regexp_count(t.path_id || '->', '->' || g.id || '->'),
 18           regexp_count(t.path_id_parent || '->', '->' || g.id || '->')
 19      from t
 20      join graph g
 21        on t.id = g.id_parent
 22    -- and t.cnt1 = 0
 23    -- and t.cnt2 = 0
 24    -- and t.cnt3 = 0
 25    )
 26  search depth first by id set ord
 27  cycle id set cycle to 1 default 0
 28  select * from t;

        ID  ID_PARENT PATH_ID         PATH_ID_PARENT  CNT1 CNT2 CNT3        ORD C
---------- ---------- --------------- --------------- ---- ---- ---- ---------- -
         3          1 ->3             ->1                0    0    0          1 0
         4          3 ->3->4          ->1->3             0    0    0          2 0
         5          4 ->3->4->5       ->1->3->4          1    0    0          3 0
         3          5 ->3->4->5->3    ->1->3->4->5       1    1    1          4 1
         3          5 ->3             ->5                0    0    0          5 0
         4          3 ->3->4          ->5->3             0    0    0          6 0
         5          4 ->3->4->5       ->5->3->4          1    0    1          7 0
         3          5 ->3->4->5->3    ->5->3->4->5       1    1    1          8 1

8 rows selected.

If you uncomment filter by cnt1/cnt2/cnt3 and remove "cycle id set cycle to 1 default 0" then query will return result as corresponding query above. In other words you can avoid cycle clause and implement whatever cycle detection logic you find more intuitive.

Additional details about traversing hierarchies and cycle detection can be found in the book Oracle SQL Revealed.

MySQL Server version 5.0.45 didn't like with:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'with tr (id, parent_id) as (select id, parent_id from t where id = 1 union all s' at line 1.

WITH RECURSIVE s (master, slave, all_ids, cycle) AS
(
    SELECT master, slave, ARRAY[master], false FROM binding WHERE master=3477

    UNION ALL

    SELECT d.master, d.slave, all_ids || d.master, d.slave = ANY(all_ids)
    FROM
        binding AS d
    JOIN
        s
    ON (d.master = s.slave)
    WHERE NOT d.master = ANY(all_ids)
)
SELECT *
FROM s;

I think better is this condition d.slave = ANY(all_ids)

"Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and others"

MariaDB 10.5.2 and newer support cycle detection:

WITH

The CYCLE clause enables CTE cycle detection, avoiding excessive or infinite loops, MariaDB supports a relaxed, non-standard grammar.
WITH RECURSIVE ... (
 ...
)
CYCLE <cycle column list> RESTRICT

Example:

CREATE TABLE t(id INT, parent_id INT);
INSERT INTO t(id, parent_id) VALUES (1, NULL),(2,1),(3,2),(1,3);

WITH RECURSIVE cte AS (
  SELECT id, parent_id, 0 lvl 
  FROM t WHERE parent_id IS NULL
  UNION ALL
  SELECT t.id, t.parent_id, lvl + 1 AS lvl
  FROM cte c1
  JOIN t ON c1.id = t.parent_id
)
CYCLE id, parent_id RESTRICT 
SELECT * FROM cte ORDER BY lvl;

db<>fiddle demo

继续阅读：db2 hierarchical-query oracle sql

Cycle detection with recursive subquery factoring

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？