How do I create a period date range from a mysql table grouping every common sequence of value in a column
My goal is to return a start and end date having same value in a column. Here is my table. The (*) have been marked to give you the idea of how I want to get "EndDate" for every similar sequence value of A & B columns
ID | DayDate | A | B
-----------------------------------------------
1 | 2010/07/1 | 200 | 300
2 | 2010/07/2 | 200 | 300 *
3 | 2010/07/3 | 150 | 250
4 | 2010/07/4 | 150 | 250 *
8 | 2010/07/5 | 150 | 350 *
9 | 2010/07/6 | 200 | 300
10 | 2010/07/7 | 200 | 300 *
11 | 2010/07/8 | 100 | 200
12 | 2010/07/9 | 100 | 200 *
and I want to get the following result table from the above table
| DayDate |EndDate | A | B
-----------------------------------------------
| 2010/07/1 |2010/07/2 | 200 | 300
| 2010/07/3 |2010/07/4 | 150 | 250
| 2010/07/5 |2010/07/5 | 150 | 350
| 2010/07/6 |2010/07/7 | 200 | 300
| 2010/07/8 |2010/07/9 | 100 | 200
UPDATE:
Thanks Mike, The approach of yours seems to work in your perspective of considering the following row as a mistake.
8 | 2010/07/5 | 150 | 350 *
However it is not a mistake. The challenge I am faced with this type of data is like a scenario of logging a market price change with date. The real problem in mycase is to select all rows with the beginning and ending date if both A & B matches in all these rows. Also to select the rows which are next to previously selected, and so on like that no data is left out in the table.
I can explain a real world scenario. A Hotel with Room A and B has room rates for each day entered in to table as explained in my question. Now the hotel needs to get a report to show the price calendar in a shorter way using start and end date, instead of listing all the dates entered. For example, on 2010/07/01 to 2010/07/02 the price of A is 200 and B is 300. This price is changed from 3rd to 4th and on 5th there开发者_如何学Go is a different price only for that day where the Room B is price is changed to 350. So this is considered as a single day difference, thats why start and end dates are same.
I hope this explained the scenario of the problem. Also note that this hotel may be closed for a specific time period, lets say this is an additional problem to my first question. The problem is what if the rate is not entered on specific dates, for example on Sundays the hotel do not sell these two rooms so they entered no price, meaning the row will not exist in the table.
Creating related tables allows you much greater freedom to query and extract relevant information. Here's a few links that you might find useful:
You could start with these tutorials:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
http://net.tutsplus.com/tutorials/databases/sql-for-beginners/
There are also a couple of questions here on stackoverflow that might be useful:
Normalization in plain English
What exactly does database normalization do?
Anyway, on to a possible solution. The following examples use your hotel rooms analogy.
First, create a table to hold information about the hotel rooms. This table just contains the room ID and its name, but you could store other information in here, such as the room type (single, double, twin), its view (ocean front, ocean view, city view, pool view), and so on:
CREATE TABLE `room` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `name_UNIQUE` (`name` ASC) )
ENGINE = InnoDB;
Now create a table to hold the changing room rates. This table links to the room
table through the room_id
column. The foreign key constraint prevents records being inserted into the rate
table which refer to rooms that do not exist:
CREATE TABLE `rate` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`room_id` INT UNSIGNED NOT NULL,
`date` DATE NOT NULL,
`rate` DECIMAL(6,2) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `fk_room_rate` (`room_id` ASC),
CONSTRAINT `fk_room_rate`
FOREIGN KEY (`room_id` )
REFERENCES `room` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
Create two rooms, and add some daily rate information about each room:
INSERT INTO `room` (`id`, `name`) VALUES (1, 'A'), (2, 'B');
INSERT INTO `rate` (`id`, `room_id`, `date`, `rate`) VALUES
( 1, 1, '2010-07-01', 200),
( 2, 1, '2010-07-02', 200),
( 3, 1, '2010-07-03', 150),
( 4, 1, '2010-07-04', 150),
( 5, 1, '2010-07-05', 150),
( 6, 1, '2010-07-06', 200),
( 7, 1, '2010-07-07', 200),
( 8, 1, '2010-07-08', 100),
( 9, 1, '2010-07-09', 100),
(10, 2, '2010-07-01', 300),
(11, 2, '2010-07-02', 300),
(12, 2, '2010-07-03', 250),
(13, 2, '2010-07-04', 250),
(14, 2, '2010-07-05', 350),
(15, 2, '2010-07-06', 300),
(16, 2, '2010-07-07', 300),
(17, 2, '2010-07-08', 200),
(18, 2, '2010-07-09', 200);
With that information stored, a simple SELECT
query with a JOIN
will show you the all the daily room rates:
SELECT
room.name,
rate.date,
rate.rate
FROM room
JOIN rate
ON rate.room_id = room.id;
+------+------------+--------+
| A | 2010-07-01 | 200.00 |
| A | 2010-07-02 | 200.00 |
| A | 2010-07-03 | 150.00 |
| A | 2010-07-04 | 150.00 |
| A | 2010-07-05 | 150.00 |
| A | 2010-07-06 | 200.00 |
| A | 2010-07-07 | 200.00 |
| A | 2010-07-08 | 100.00 |
| A | 2010-07-09 | 100.00 |
| B | 2010-07-01 | 300.00 |
| B | 2010-07-02 | 300.00 |
| B | 2010-07-03 | 250.00 |
| B | 2010-07-04 | 250.00 |
| B | 2010-07-05 | 350.00 |
| B | 2010-07-06 | 300.00 |
| B | 2010-07-07 | 300.00 |
| B | 2010-07-08 | 200.00 |
| B | 2010-07-09 | 200.00 |
+------+------------+--------+
To find the start and end dates for each room rate, you need a more complex query:
SELECT
id,
room_id,
MIN(date) AS start_date,
MAX(date) AS end_date,
COUNT(*) AS days,
rate
FROM (
SELECT
id,
room_id,
date,
rate,
(
SELECT COUNT(*)
FROM rate AS b
WHERE b.rate <> a.rate
AND b.date <= a.date
AND b.room_id = a.room_id
) AS grouping
FROM rate AS a
ORDER BY a.room_id, a.date
) c
GROUP BY rate, grouping
ORDER BY room_id, MIN(date);
+----+---------+------------+------------+------+--------+
| id | room_id | start_date | end_date | days | rate |
+----+---------+------------+------------+------+--------+
| 1 | 1 | 2010-07-01 | 2010-07-02 | 2 | 200.00 |
| 3 | 1 | 2010-07-03 | 2010-07-05 | 3 | 150.00 |
| 6 | 1 | 2010-07-06 | 2010-07-07 | 2 | 200.00 |
| 8 | 1 | 2010-07-08 | 2010-07-09 | 2 | 100.00 |
| 10 | 2 | 2010-07-01 | 2010-07-02 | 2 | 300.00 |
| 12 | 2 | 2010-07-03 | 2010-07-04 | 2 | 250.00 |
| 14 | 2 | 2010-07-05 | 2010-07-05 | 1 | 350.00 |
| 15 | 2 | 2010-07-06 | 2010-07-07 | 2 | 300.00 |
| 17 | 2 | 2010-07-08 | 2010-07-09 | 2 | 200.00 |
+----+---------+------------+------------+------+--------+
You can find a good explanation of the technique used in the above query here:
http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
- My general approach is to join the table onto itself based on DayDate = DayDate+1 and the A or B values not being equal
- This will find the end dates for each period (where the value is going to be different on the following day)
- The only problem is, that won't find an end date for the final period. To get around this, I selct the max date from the table and union that into my list of end dates
- Once you have the list of end dates defined, you can join them to the original table based on the end date being greater than or equal to the original date
From this final list, select the minimum daydate grouped by the other fields
select min(DayDate) as DayDate,EndDate,A,B from (SELECT DayDate, A, B, min(ends.EndDate) as EndDate FROM yourtable LEFT JOIN (SELECT max(DayDate) as EndDate FROM yourtable UNION SELECT t1.DayDate as EndDate FROM yourtable t1 JOIN yourtable t2 ON date_add(t1.DayDate, INTERVAL 1 DAY) = t2.DayDate AND (t1.A<>t2.A OR t1.B<>t2.B)) ends ON ends.EndDate>=DayDate GROUP BY DayDate, A, B) x GROUP BY EndDate,A,B
I think I have found a solution which does produce the table desired.
SELECT
a.DayDate AS StartDate,
( SELECT b.DayDate
FROM Dates AS b
WHERE b.DayDate > a.DayDate AND (b.B = a.B OR b.B IS NULL)
ORDER BY b.DayDate ASC LIMIT 1
) AS StopDate,
a.A as A,
a.B AS B
FROM Dates AS a
WHERE Coalesce(
(SELECT c.B
FROM Dates AS c
WHERE c.DayDate <= a.DayDate
ORDER BY c.DayDate DESC LIMIT 1,1
), -99999
) <> a.B
AND a.B IS NOT NULL
ORDER BY a.DayDate ASC;
is able to generate the following table result
StartDate StopDate A B
2010-07-01 2010-07-02 200 300
2010-07-03 2010-07-04 150 250
2010-07-05 NULL 150 350
2010-07-06 2010-07-07 200 300
2010-07-08 2010-07-09 100 200
But I need a way to replace the NULL with the same date of the start date.
精彩评论