Contiguous sequential numbers in MySQL
I have the following table and data.
CREATE TABLE county_zip_code (
id_county INT UNSIGNED,
from_zip_code INT UNSIGNED,
to_zip_code INT UNSIGNED
);
INSERT INTO county_zip_code
(id_county, from_zip_code, to_zip_code)
VALUES
(12, 2580288, 0),
(12, 2580289, 0),
(12, 2580290, 0),
(12, 2580291, 0),
(12, 2580292, 0),
(15, 8670418, 0),
(15, 8670420, 0),
(15, 8670430, 0),
(16, 7600070, 0),
(16, 7600071, 0),
(16, 7600072, 0),
(16, 7600073, 0)
;
I would like to create a new table called "county_zip_code1" with the contiguous sequences replaced with their endpoints:
id_county | from_zip_code | to_zip_code
       12 |       2580288 |     2580292
       15 |       8670418 |           0
       15 |       8670420 |           0
       15 |       8670430 |           0
       16 |       7600070 |     7600073
How can I do this?
I've read over some of the answers to the SO question "Evaluate sequential pairs of rows in SQLite", but this kind of SQL is too advanced for me.
It's a little messy to do in SQL. It might be better to handle it with a program.
Step back from SQL and relational databases for a second. To make things simple, start with a set of numbers S. Say you wish to find all contiguous sequences in S; how would you go about it? If you could sort and loop (such as in a program), you'd check whether adjacent values in the sorted sequence differ by 1. If you're limited to set operations, you could perform a similar task by taking adjacent pairs from the cross product:
C = {(n,n+1) : n ∈ S ∧ n+1 ∈ S}
To get the endpoints, you could take the transitive closure of C. However, transitive closures in SQL are inelegant; they require a procedural approach, rather than the declarative approach SQL normally affords.
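As a rough illustration of the sort-and-loop idea and of the pair set C, here is a minimal Python sketch. The function names `find_runs` and `adjacent_pairs` are made up for this example, and a lone value n is reported as the run (n, n).

```python
def find_runs(numbers):
    """Collapse numbers into (start, end) pairs of contiguous runs.

    A lone value n is reported as (n, n).
    """
    runs = []
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][1] + 1:
            # n extends the current run: move its end point forward.
            runs[-1] = (runs[-1][0], n)
        else:
            # First value, or a gap: start a new run.
            runs.append((n, n))
    return runs


def adjacent_pairs(numbers):
    """Build the set C = {(n, n+1) : n in S and n+1 in S}."""
    s = set(numbers)
    return {(n, n + 1) for n in s if n + 1 in s}
```

For county 12 in the sample data, `find_runs` yields the single run (2580288, 2580292); for county 15 it yields three one-element runs.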
To find a contiguous sequence in SQL, you can perform a self-join on a column with the next value in the sequence. An inner join will filter out the lone items since they won't have a next value. A MIN and MAX will get you the endpoints of the sequence, which also requires grouping by the county ID.
SELECT czc.id_county,
MIN(czc.from_zip_code) AS from_zip_code,
MAX(czc_n.from_zip_code) AS to_zip_code
FROM county_zip_code AS czc
JOIN county_zip_code AS czc_n
ON czc.id_county = czc_n.id_county
AND czc.from_zip_code = czc_n.from_zip_code-1
GROUP BY czc.id_county
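Note that this solution doesn't cover all cases. Because the query groups only by county, several disjoint sequences within one county are merged into a single range running from the start of the first sequence to the end of the last. Add the following to the sample data: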
INSERT INTO county_zip_code
VALUES
(15, 8670424, 0),
(15, 8670425, 0),
(15, 8670426, 0),
(15, 8670450, 0),
(15, 8670451, 0)
;
and the query will result in:
+-----------+---------------+-------------+
| id_county | from_zip_code | to_zip_code |
+-----------+---------------+-------------+
|        12 |       2580288 |     2580292 |
|        15 |       8670424 |     8670451 |
|        16 |       7600070 |     7600073 |
+-----------+---------------+-------------+
Getting the lone items is a little more involved. From the set S, you want items that have no previous or next value:
{(n-1, n, n+1) : n ∈ S ∧ n-1 ∉ S ∧ n+1 ∉ S}
In SQL, again you use a self join, but you select the items that have no previous or next value. Here, you need partial (left or right) joins to get those items ("a ∉ A" can be translated to a null value in joined tables).
SELECT czc.id_county,
czc.from_zip_code AS from_zip_code,
NULL AS to_zip_code
FROM county_zip_code AS czc
LEFT JOIN county_zip_code AS czc_p
ON czc.id_county = czc_p.id_county
AND czc.from_zip_code = czc_p.from_zip_code+1
LEFT JOIN county_zip_code AS czc_n
ON czc.id_county = czc_n.id_county
AND czc.from_zip_code = czc_n.from_zip_code-1
WHERE czc_p.from_zip_code IS NULL AND czc_n.from_zip_code IS NULL
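The same "no previous, no next" test can be sketched outside SQL with set lookups, which play the role of the two left joins with IS NULL checks. This is only an illustration; the function name `lone_rows` and the (id_county, zip_code) tuple layout are assumptions made for the example.

```python
def lone_rows(rows):
    """Return the (id_county, zip_code) pairs with no neighbour in the same county.

    A pair is kept only if neither zip_code - 1 nor zip_code + 1 is present
    for that county.
    """
    present = set(rows)
    return sorted(
        (county, zip_code)
        for county, zip_code in present
        if (county, zip_code - 1) not in present
        and (county, zip_code + 1) not in present
    )
```

With the sample rows for counties 12 and 15, only the three county 15 zip codes come back, since every county 12 zip code has at least one neighbour.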
Take the union of the two queries and (if desired) sort.
(
SELECT czc.id_county,
MIN(czc.from_zip_code) AS from_zip_code,
MAX(czc_n.from_zip_code) AS to_zip_code
FROM county_zip_code AS czc
JOIN county_zip_code AS czc_n
ON czc.id_county = czc_n.id_county
AND czc.from_zip_code = czc_n.from_zip_code-1
GROUP BY czc.id_county
) UNION (
SELECT czc.id_county,
czc.from_zip_code AS from_zip_code,
NULL AS to_zip_code
FROM county_zip_code AS czc
LEFT JOIN county_zip_code AS czc_p
ON czc.id_county = czc_p.id_county
AND czc.from_zip_code = czc_p.from_zip_code+1
LEFT JOIN county_zip_code AS czc_n
ON czc.id_county = czc_n.id_county
AND czc.from_zip_code = czc_n.from_zip_code-1
WHERE czc_p.from_zip_code IS NULL AND czc_n.from_zip_code IS NULL
)
ORDER BY id_county, from_zip_code
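The union above still merges disjoint runs within a county, as noted earlier. One possible way around that, sketched here under assumptions, is to treat every zip code without a predecessor as the start of a run and to look up the nearest zip code at or after it that has no successor as the end. The sketch runs the query through Python's sqlite3 module only so that it works without a server; that MySQL accepts the same query unchanged is an assumption, not something verified here. The names `RUNS_QUERY` and `collapse_ranges` are invented for the example.

```python
import sqlite3

# Each run starts at a zip code with no predecessor in the same county and
# ends at the first zip code at or after it that has no successor.
RUNS_QUERY = """
SELECT DISTINCT
       s.id_county,
       s.from_zip_code,
       (SELECT MIN(e.from_zip_code)
          FROM county_zip_code AS e
         WHERE e.id_county = s.id_county
           AND e.from_zip_code >= s.from_zip_code
           AND NOT EXISTS (SELECT 1
                             FROM county_zip_code AS n
                            WHERE n.id_county = e.id_county
                              AND n.from_zip_code = e.from_zip_code + 1)
       ) AS run_end
  FROM county_zip_code AS s
 WHERE NOT EXISTS (SELECT 1
                     FROM county_zip_code AS p
                    WHERE p.id_county = s.id_county
                      AND p.from_zip_code + 1 = s.from_zip_code)
 ORDER BY s.id_county, s.from_zip_code
"""


def collapse_ranges(conn):
    """Return (id_county, from_zip_code, to_zip_code) rows, one per run.

    Lone zip codes get to_zip_code = 0, matching the layout in the question.
    """
    rows = conn.execute(RUNS_QUERY).fetchall()
    return [(county, start, end if end != start else 0)
            for county, start, end in rows]
```

The rows returned by `collapse_ranges` could then be inserted into county_zip_code1, for example with `executemany`. Lone items and runs come out of a single query here, so no union is needed.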
Try this:
CREATE TABLE county_zip_code1 AS
SELECT id_county, MIN(from_zip_code) AS from_zip_code, MAX(from_zip_code) AS to_zip_code
FROM county_zip_code
GROUP BY id_county