开发者

How can I find multiple line matches in SQL?

With the following table

CREATE TABLE T1 (
    A varchar(2),
    B varchar(2)
);

INSERT INTO T1 VALUES 
    ('aa', 'm'), ('aa', 'n'),
    ('bb', 'n'), ('bb', 'o'),
    ('cc', 'n'), ('cc', 'o'),
    ('dd', 'c'), ('dd', 'a'), ('dd', 'r'),
    ('ee', 'a'), ('ee', 'c'), ('ee', 'r')

A   | B
----+----
aa  | m
aa  | n
bb  | n
bb  | o
cc  | n
cc  | o
dd  | c
dd  | a
dd  | r
ee  | a
ee  | c
ee  | r

How can I select and group the values in A that match all the coresponding values in B. For example bb and cc ma开发者_开发百科ke up a group because they both contain 'n' and 'o'.

So the result would be

Group | A
----------
1     | bb
1     | cc
2     | dd
2     | ee


Here's one approach: it first calculates the matching "sets", where a set is a group of two A's which match. Then it calculate the "head", or the lowest A for sets in the same group. Using dense_rank you can number the heads, and then join back on the list of sets to create a list of all set members.

Query at SE Data.

; with  groups  as
        (
        select  distinct A
        from    @t
        )
,       vals as
        (
        select  distinct B
        from    @t
        )
,       sets as
        (
        select  g1.A as g1
        ,       g2.A as g2
        from    groups g1
        join    groups g2
        on      g1.A < g2.A
        cross join
                vals v
        left join
                @t v1
        on      g1.A = v1.A
                and v.B = v1.B
        left join
                @t v2
        on      g2.A = v2.A
                and v.B = v2.B
        group by
                g1.A
        ,       g2.A
        having  count(case when isnull(v1.B,'') <> isnull(v2.B,'') then 1 end) = 0
        )
,       heads as
        (
        select  s1.g1
        ,       s1.g2
        ,       head.head
        from    sets s1
        cross apply
                (
                select  min(g1) as head
                from    sets s2
                where   s1.g2 = s2.g2
                ) head
        )
select  distinct dense_rank() over (order by h.head)
,       g.g
from    (
        select  distinct head
        from    heads
        ) h
left join
        (
        select  g1 as g
        ,       head
        from    heads
        union all
        select  g2
        ,       head
        from    heads
        ) g
on      h.head = g.head


SQL Server 2008 has the EXCEPT and INTERSECT functions, which could be used. This isn't in exactly the format that you wanted and I can't speak to performance against large data sets, but perhaps it will give you a starting point.

SELECT DISTINCT
    T1.A,
    T2.A
FROM
    T1 AS T1
INNER JOIN T1 AS T2 ON T2.A > T1.A
WHERE
    NOT EXISTS
    (
    SELECT
        B
    FROM
        T1 AS T3
    WHERE
        T3.A = T1.A
    EXCEPT
    SELECT
        B
    FROM
        T1 AS T4
    WHERE
        T4.A = T2.A
    ) AND
    NOT EXISTS
    (
    SELECT
        B
    FROM
        T1 AS T3
    WHERE
        T3.A = T2.A
    EXCEPT
    SELECT
        B
    FROM
        T1 AS T4
    WHERE
        T4.A = T1.A
    )

Depending on your data, you could also generate some concatenated strings with delimiters and specific ordering within the string and then compare those.


The relational operator you require is division, popularly known as "the supplier who supplies all parts".

Actually, division comes in about eight flavours and the SQL language has implemented none of them directly. However, they can all be recreated using existing SQL constructs: see this article for the more popular ones. Things to consider include: exact division or with remainder; how to handle an empty divisor.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜