
Contiguous sequential numbers in MySQL

I have the following table and data.

CREATE TABLE county_zip_code (
  id_county INT UNSIGNED,
  from_zip_code INT UNSIGNED,
  to_zip_code INT UNSIGNED
);

INSERT INTO county_zip_code 
(id_county, from_zip_code, to_zip_code)
    VALUES
(12,          2580288,          0),
(12,          2580289,          0),
(12,          2580290,          0),
(12,          2580291,          0),
(12,          2580292,          0),
(15,          8670418,          0),
(15,          8670420,          0),
(15,          8670430,          0),
(16,          7600070,          0),
(16,          7600071,          0),
(16,          7600072,          0),
(16,          7600073,          0)
;

I would like to create a new table called "county_zip_code1" with the contiguous sequences replaced with their endpoints:

id_county | from_zip_code | to_zip_code
       12 |       2580288 |     2580292
       15 |       8670418 |           0
       15 |       8670420 |           0
       15 |       8670430 |           0
       16 |       7600070 |     7600073

How can I do this?

I've read over some of the answers to the SO question "Evaluate sequential pairs of rows in SQLite", but this kind of SQL is too advanced for me.


It's a little messy to do in SQL. It might be better to handle it with a program.

Step back from SQL and relational databases for a second. To make things simple, start with a set of numbers S. Say you wish to find all contiguous sequences in S; how would you go about it? If you could sort and loop (such as in a program), you'd check whether adjacent values in the sorted sequence differ by 1. If you're limited to set operations, you could perform a similar task by taking adjacent pairs from the cross product:

C = {(n,n+1) : n ∈ S ∧ n+1 ∈ S}

To get the endpoints, you could take the transitive closure of C. However, transitive closures in SQL are inelegant; they require a procedural approach, rather than the declarative approach SQL normally affords.
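For comparison, the sort-and-loop idea is short in a program. Here is a minimal Python sketch, not a definitive implementation: the function name collapse_runs is made up for illustration, and it follows the question's convention of writing 0 as the end of a lone ZIP code.

```python
def collapse_runs(values):
    """Collapse integers into (start, end) runs; lone values get end = 0."""
    ordered = sorted(set(values))
    runs = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next sorted value is exactly one greater.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        # Mirror the question's convention: 0 marks a lone ZIP code.
        runs.append((start, end if end != start else 0))
        i += 1
    return runs


# County 15 from the question, plus the extra disjoint runs used further down.
county_15 = [8670418, 8670420, 8670430,
             8670424, 8670425, 8670426,
             8670450, 8670451]
for run in collapse_runs(county_15):
    print(run)
```

You would call this once per county, for example after fetching the rows ordered by id_county.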

To find contiguous sequences in SQL, self-join the table so that each row is paired with the row holding the next value in the sequence. An inner join filters out the lone items, since they have no next value. MIN and MAX then give the endpoints of the sequence; because they are aggregates, you also need to group by the county ID.

SELECT czc.id_county, 
       MIN(czc.from_zip_code) AS from_zip_code,
       MAX(czc_n.from_zip_code) AS to_zip_code
  FROM county_zip_code AS czc
    JOIN county_zip_code AS czc_n 
      ON     czc.id_county = czc_n.id_county 
         AND czc.from_zip_code = czc_n.from_zip_code-1
  GROUP BY czc.id_county

Note that this solution doesn't cover all cases. If there are disjoint sequences for a county, this will combine them. Add the following to the sample data:

INSERT INTO county_zip_code
   VALUES
(15, 8670424, 0),
(15, 8670425, 0),
(15, 8670426, 0),
(15, 8670450, 0),
(15, 8670451, 0)
;

and the query will result in:

+-----------+---------------+-------------+
| id_county | from_zip_code | to_zip_code |
+-----------+---------------+-------------+
|        12 |       2580288 |     2580292 |
|        15 |       8670424 |     8670451 |
|        16 |       7600070 |     7600073 |
+-----------+---------------+-------------+

Getting the lone items is a little more involved. From the set S, you want items that have no previous or next value:

{(n-1, n, n+1) : n ∈ S ∧ n-1 ∉ S ∧ n+1 ∉ S}
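As a quick sanity check outside SQL, both set expressions (the adjacent pairs C and the lone items) can be evaluated directly with Python set comprehensions. The set S below is only an assumed sample, taken from some of county 15's values.

```python
# Assumed sample: a few of county 15's ZIP codes.
S = {8670418, 8670420, 8670424, 8670425, 8670426, 8670430}

# C: adjacent pairs (n, n+1) where both members are in S.
C = {(n, n + 1) for n in S if n + 1 in S}

# Lone items: neither the previous nor the next value is in S.
lone = {n for n in S if n - 1 not in S and n + 1 not in S}

print(sorted(C))     # [(8670424, 8670425), (8670425, 8670426)]
print(sorted(lone))  # [8670418, 8670420, 8670430]
```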

In SQL, you again use a self-join, but this time you select the items that have no previous or next value. Here you need outer (left or right) joins to get those items: "a ∉ A" translates to a NULL value in the joined table's columns.

SELECT czc.id_county, 
       czc.from_zip_code AS from_zip_code,
       NULL AS to_zip_code
  FROM county_zip_code AS czc
    LEFT JOIN county_zip_code AS czc_p
      ON     czc.id_county = czc_p.id_county 
         AND czc.from_zip_code = czc_p.from_zip_code+1
    LEFT JOIN county_zip_code AS czc_n
      ON     czc.id_county = czc_n.id_county 
         AND czc.from_zip_code = czc_n.from_zip_code-1
WHERE czc_p.from_zip_code IS NULL AND czc_n.from_zip_code IS NULL

Take the union of the two queries and (if desired) sort.

(
  SELECT czc.id_county, 
         MIN(czc.from_zip_code) AS from_zip_code,
         MAX(czc_n.from_zip_code) AS to_zip_code
    FROM county_zip_code AS czc
      JOIN county_zip_code AS czc_n 
        ON     czc.id_county = czc_n.id_county 
           AND czc.from_zip_code = czc_n.from_zip_code-1
    GROUP BY czc.id_county
) UNION (
  SELECT czc.id_county, 
         czc.from_zip_code AS from_zip_code,
         NULL AS to_zip_code
    FROM county_zip_code AS czc
      LEFT JOIN county_zip_code AS czc_p
        ON     czc.id_county = czc_p.id_county 
           AND czc.from_zip_code = czc_p.from_zip_code+1
      LEFT JOIN county_zip_code AS czc_n
        ON     czc.id_county = czc_n.id_county 
           AND czc.from_zip_code = czc_n.from_zip_code-1
  WHERE czc_p.from_zip_code IS NULL AND czc_n.from_zip_code IS NULL
)
ORDER BY id_county, from_zip_code
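
The union above still merges disjoint sequences within a county. One possible way around that, offered as a sketch rather than a tested MySQL solution, is to pair each sequence start (a row with no previous value) with the nearest sequence end (a row with no next value) at or after it. The query below uses only joins, NOT EXISTS, GROUP BY and CASE, so it should carry over to MySQL, but here it is run against an in-memory SQLite database from Python purely as an assumption-laden stand-in. The aliases s, e, p and n are made up for illustration, and lone items get 0 as to_zip_code to match the question.

```python
import sqlite3

QUERY = """
SELECT s.id_county,
       s.from_zip_code,
       CASE WHEN MIN(e.from_zip_code) = s.from_zip_code
            THEN 0 ELSE MIN(e.from_zip_code) END AS to_zip_code
  FROM county_zip_code AS s
  JOIN county_zip_code AS e
    ON     e.id_county = s.id_county
       AND e.from_zip_code >= s.from_zip_code
 WHERE NOT EXISTS (SELECT 1 FROM county_zip_code AS p
                    WHERE p.id_county = s.id_county
                      AND p.from_zip_code + 1 = s.from_zip_code)
   AND NOT EXISTS (SELECT 1 FROM county_zip_code AS n
                    WHERE n.id_county = e.id_county
                      AND n.from_zip_code = e.from_zip_code + 1)
 GROUP BY s.id_county, s.from_zip_code
 ORDER BY s.id_county, s.from_zip_code
"""

# Sample data from the question plus the extra disjoint runs for county 15.
zips = {
    12: [2580288, 2580289, 2580290, 2580291, 2580292],
    15: [8670418, 8670420, 8670430,
         8670424, 8670425, 8670426, 8670450, 8670451],
    16: [7600070, 7600071, 7600072, 7600073],
}

# SQLite stands in for MySQL here (an assumption made for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE county_zip_code "
             "(id_county INT, from_zip_code INT, to_zip_code INT)")
conn.executemany("INSERT INTO county_zip_code VALUES (?, ?, 0)",
                 [(county, z) for county, zs in zips.items() for z in zs])

runs = conn.execute(QUERY).fetchall()
for row in runs:
    print(row)
```

To build the new table in MySQL, the SELECT could be prefixed with CREATE TABLE county_zip_code1 AS. Note that the self-join on >= can get slow on a large table without an index on (id_county, from_zip_code).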


Try this:

CREATE TABLE county_zip_code1 AS
SELECT id_county, MIN(from_zip_code) AS from_zip_code, MAX(from_zip_code) AS to_zip_code
  FROM county_zip_code
GROUP BY id_county