SQL: Get all missing date records from database
I have a DB table with the following structure:
id | dateCreated | numOfUsers
A typical row is 1, '2011-10-13 12:00:00', 4
My table contains data for the last 4 months, but quite a few days are missing, and I'd like to find the missing days using SQL. Any ideas how I'd go about writing this query?
I suspect you'd do this by somehow generating a list of days and comparing it against the values in the database.
I know you can do this in PHP or another programming language using multiple queries, but I'd like to do it at the database level if possible.
Thanks in advance
For PostgreSQL generating the "list of dates" on the fly is quite easy using the generate_series() function:
with all_dates as (
select cast(current_date - interval '4' month as date) + i as the_date
from generate_series(0, current_date - cast(current_date - interval '4' month as date)) as i
)
select t.the_date,
y.id,
y.numOfUsers
from all_dates t
left join your_table y ON cast(y.dateCreated as date) = t.the_date -- dateCreated carries a time of day
where y.id is null; -- keep only the days that have no row
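To try the same idea locally, here is a sketch that uses Python's built-in sqlite3 module. SQLite has no generate_series() by default, so a recursive CTE builds the list of days instead. The table name your_table, the sample rows and the missing_days() helper are hypothetical.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's table (id, dateCreated, numOfUsers).
ROWS = [
    (1, "2011-10-13 12:00:00", 4),
    (2, "2011-10-14 09:30:00", 2),
    (3, "2011-10-17 18:00:00", 5),
    (4, "2011-10-18 08:00:00", 1),
    (5, "2011-10-21 10:00:00", 3),
]

# A recursive CTE plays the role of PostgreSQL's generate_series():
# it yields one row per day from :start to :end inclusive.
MISSING_DAYS_SQL = """
WITH RECURSIVE all_dates(the_date) AS (
    SELECT date(:start)
    UNION ALL
    SELECT date(the_date, '+1 day') FROM all_dates
    WHERE the_date < date(:end)
)
SELECT t.the_date
FROM all_dates AS t
LEFT JOIN your_table AS y ON date(y.dateCreated) = t.the_date  -- compare days, not timestamps
WHERE y.id IS NULL                                              -- no row on that day
ORDER BY t.the_date
"""


def missing_days(conn, start, end):
    """Return the ISO dates in [start, end] that have no row in your_table."""
    return [row[0] for row in conn.execute(MISSING_DAYS_SQL, {"start": start, "end": end})]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, dateCreated TEXT, numOfUsers INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
print(missing_days(conn, "2011-10-12", "2011-10-21"))
# ['2011-10-12', '2011-10-15', '2011-10-16', '2011-10-19', '2011-10-20']
```

ISO date strings sort in date order, so the `<` comparison that stops the recursion works on plain text.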
AFAIK there is no single statement that works for all databases. For Oracle you can do this (MyTable is the DB table you want to check for missing dates):
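SELECT * FROM
(
SELECT A.MinD + MyList.L TheDate FROM
(SELECT TRUNC(MIN(dateCreated)) MinD FROM MyTable) A,
(SELECT LEVEL - 1 L FROM DUAL CONNECT BY LEVEL <= (SELECT TRUNC(MAX(dateCreated)) - TRUNC(MIN(dateCreated)) + 1 FROM MyTable)) MyList
) D WHERE D.TheDate NOT IN (SELECT TRUNC(dateCreated) FROM MyTable T WHERE dateCreated IS NOT NULL)
-- TRUNC drops the time of day so whole days are compared; the IS NOT NULL guard matters because NOT IN returns nothing if the list contains a NULL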
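Unlike the PostgreSQL version, this takes the range from the table's own MIN and MAX instead of a fixed window. A sketch of that variant in SQLite via Python's sqlite3, with a recursive CTE standing in for CONNECT BY LEVEL. The table my_table, the sample rows and the helper name are hypothetical.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's table.
ROWS = [
    (1, "2011-10-13 12:00:00", 4),
    (2, "2011-10-14 09:30:00", 2),
    (3, "2011-10-17 18:00:00", 5),
    (4, "2011-10-18 08:00:00", 1),
    (5, "2011-10-21 10:00:00", 3),
]

# Days run from the earliest to the latest dateCreated; NOT IN drops days that have rows.
MISSING_IN_RANGE_SQL = """
WITH RECURSIVE bounds(min_d, max_d) AS (
    SELECT date(MIN(dateCreated)), date(MAX(dateCreated)) FROM my_table
),
days(the_date) AS (
    SELECT min_d FROM bounds WHERE min_d IS NOT NULL  -- empty table: no days at all
    UNION ALL
    SELECT date(days.the_date, '+1 day') FROM days, bounds
    WHERE days.the_date < bounds.max_d
)
SELECT the_date FROM days
WHERE the_date NOT IN (
    SELECT date(dateCreated) FROM my_table WHERE dateCreated IS NOT NULL  -- NOT IN + NULL matches nothing
)
ORDER BY the_date
"""


def missing_between_min_and_max(conn):
    """Return the days between the first and last recorded day that have no row."""
    return [row[0] for row in conn.execute(MISSING_IN_RANGE_SQL)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, dateCreated TEXT, numOfUsers INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
print(missing_between_min_and_max(conn))
# ['2011-10-15', '2011-10-16', '2011-10-19', '2011-10-20']
```

Because the bounds come from the data, days missing before the first row or after the last row are never reported.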
Assuming MySQL, you can use a variable to carry state through each row in the query results:
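SET @last := 'date you want to start with';
-- MySQL's function is DATEDIFF; filter in an outer query because HAVING cannot follow ORDER BY
SELECT id, dateCreated, diff
FROM (
SELECT id, dateCreated, DATEDIFF(dateCreated, @last) AS diff, @last := dateCreated AS last_seen
FROM yourtable
ORDER BY dateCreated ASC
) AS ordered_rows
WHERE diff > 1;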
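Note that this won't return the actual missing days, but it WILL return the row AFTER each gap, with diff holding the number of days since the previous row (so diff - 1 days are missing before it). Also be aware that MySQL does not guarantee the evaluation order of expressions that assign user variables, and setting user variables inside statements other than SET is deprecated as of MySQL 8.0, where the LAG() window function is the safer way to compare each row with the previous one.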
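Where window functions are available, LAG() can replace the @last variable. Here is a sketch in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The table name, sample rows and rows_after_gaps() helper are hypothetical. Unlike the variable version, there is no starting date: the first row has no predecessor and is never reported.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's table.
ROWS = [
    (1, "2011-10-13 12:00:00", 4),
    (2, "2011-10-14 09:30:00", 2),
    (3, "2011-10-17 18:00:00", 5),
    (4, "2011-10-18 08:00:00", 1),
    (5, "2011-10-21 10:00:00", 3),
]

# LAG() fetches the previous row's dateCreated; the day difference gives the gap size.
GAPS_SQL = """
SELECT id, dateCreated, gap_days - 1 AS missing_before
FROM (
    SELECT id, dateCreated,
           CAST(julianday(date(dateCreated))
                - julianday(date(LAG(dateCreated) OVER (ORDER BY dateCreated)))
                AS INTEGER) AS gap_days
    FROM your_table
) AS g
WHERE gap_days > 1
ORDER BY dateCreated
"""


def rows_after_gaps(conn):
    """Return (id, dateCreated, number of missing days just before this row)."""
    return conn.execute(GAPS_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, dateCreated TEXT, numOfUsers INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
print(rows_after_gaps(conn))
# [(3, '2011-10-17 18:00:00', 2), (5, '2011-10-21 10:00:00', 2)]
```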
There is a way to do it without a date table, start and end dates, or any other form of iteration (this is SQL Server syntax):
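SELECT DISTINCT DATEADD(day, 1, CAST(l.dateCreated AS date)) AS MissingDates
FROM dbo.MyTable AS l
LEFT OUTER JOIN dbo.MyTable AS r
ON DATEADD(day, 1, CAST(l.dateCreated AS date)) = CAST(r.dateCreated AS date)
WHERE r.dateCreated IS NULL
-- the day after the most recent row is not a real gap, so skip it
AND CAST(l.dateCreated AS date) < (SELECT CAST(MAX(dateCreated) AS date) FROM dbo.MyTable)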
This returns the first date of each run of missing dates. To get the last date of each run as well, repeat the self-join subtracting one day instead of adding one.
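The self-join trick can be sketched in SQLite via Python's sqlite3, with date(x, '+1 day') playing the role of DATEADD. The table name, sample rows and missing_spans() helper are hypothetical. One query finds the first missing day of each run and the other finds the last, and the two lists are then paired up.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's table.
ROWS = [
    (1, "2011-10-13 12:00:00", 4),
    (2, "2011-10-14 09:30:00", 2),
    (3, "2011-10-17 18:00:00", 5),
    (4, "2011-10-18 08:00:00", 1),
    (5, "2011-10-21 10:00:00", 3),
]

# First missing day of a run: the day after a row, when that day has no row.
# The latest row is skipped, since the day after it is outside the data.
FIRST_MISSING_SQL = """
SELECT DISTINCT date(l.dateCreated, '+1 day') AS first_missing
FROM your_table AS l
LEFT JOIN your_table AS r ON date(r.dateCreated) = date(l.dateCreated, '+1 day')
WHERE r.id IS NULL
  AND date(l.dateCreated) < (SELECT date(MAX(dateCreated)) FROM your_table)
ORDER BY first_missing
"""

# Last missing day of a run: the same self-join, subtracting a day instead.
LAST_MISSING_SQL = """
SELECT DISTINCT date(l.dateCreated, '-1 day') AS last_missing
FROM your_table AS l
LEFT JOIN your_table AS r ON date(r.dateCreated) = date(l.dateCreated, '-1 day')
WHERE r.id IS NULL
  AND date(l.dateCreated) > (SELECT date(MIN(dateCreated)) FROM your_table)
ORDER BY last_missing
"""


def missing_spans(conn):
    """Pair each run's first and last missing day; both lists are sorted, so they line up."""
    firsts = [row[0] for row in conn.execute(FIRST_MISSING_SQL)]
    lasts = [row[0] for row in conn.execute(LAST_MISSING_SQL)]
    return list(zip(firsts, lasts))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, dateCreated TEXT, numOfUsers INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
print(missing_spans(conn))
# [('2011-10-15', '2011-10-16'), ('2011-10-19', '2011-10-20')]
```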
Easily the most effective way (in my opinion) is, as you say, to start with a table of all dates. You would have to create that yourself; assuming you have, here are a couple of options:
SELECT
*
FROM
calendar -- Your manually created table of dates
LEFT JOIN
yourTable
ON yourTable.DateField = calendar.DateField
WHERE
yourTable.DateField IS NULL
AND calendar.DateField >= @reportFirstDate
AND calendar.DateField <= @reportLastDate
Or...
SELECT
*
FROM
calendar -- Your manually created table of dates
WHERE
NOT EXISTS (SELECT * FROM yourTable WHERE yourTable.DateField = calendar.DateField)
AND calendar.DateField >= @reportFirstDate
AND calendar.DateField <= @reportLastDate
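Here is a minimal sketch of the calendar-table approach in SQLite via Python's sqlite3. It assumes a hypothetical yourTable whose DateField holds a date-time string and a hypothetical build_calendar() helper. The calendar's PRIMARY KEY supplies the index, and the named parameters mirror @reportFirstDate and @reportLastDate.

```python
import sqlite3
from datetime import date, timedelta


def build_calendar(conn, first, last):
    """Hypothetical helper: one row per day from first to last (datetime.date values)."""
    conn.execute("CREATE TABLE IF NOT EXISTS calendar (DateField TEXT PRIMARY KEY)")
    days = (first + timedelta(days=i) for i in range((last - first).days + 1))
    conn.executemany("INSERT OR IGNORE INTO calendar (DateField) VALUES (?)",
                     ((d.isoformat(),) for d in days))


# The NOT EXISTS variant; date() strips the time of day from yourTable.DateField.
REPORT_SQL = """
SELECT calendar.DateField
FROM calendar
WHERE NOT EXISTS (
    SELECT 1 FROM yourTable WHERE date(yourTable.DateField) = calendar.DateField
)
  AND calendar.DateField >= :reportFirstDate
  AND calendar.DateField <= :reportLastDate
ORDER BY calendar.DateField
"""


def report_missing(conn, first, last):
    """Return calendar days in [first, last] (ISO strings) with no row in yourTable."""
    params = {"reportFirstDate": first, "reportLastDate": last}
    return [row[0] for row in conn.execute(REPORT_SQL, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, DateField TEXT)")
conn.executemany("INSERT INTO yourTable (DateField) VALUES (?)",
                 [(s,) for s in ("2011-10-13 12:00:00", "2011-10-14 09:30:00",
                                 "2011-10-17 18:00:00", "2011-10-18 08:00:00",
                                 "2011-10-21 10:00:00")])
build_calendar(conn, date(2011, 10, 1), date(2011, 10, 31))
print(report_missing(conn, "2011-10-12", "2011-10-22"))
# ['2011-10-12', '2011-10-15', '2011-10-16', '2011-10-19', '2011-10-20', '2011-10-22']
```

Wrapping yourTable.DateField in date() keeps the sketch short, but it prevents an ordinary index on that column from being used. Storing the day in its own indexed column would let both sides of the comparison use an index.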
EDIT
Although maintaining this list of dates can feel 'untidy', it has massive performance benefits for this type of query.
With a dates table, the database only has to compare two indexes and check what exists in one but not the other.
Without the dates table you have a much more complex approach...
1. Take every record in the table
2. Self-join it to the next record in the table
3. If they are consecutive dates, discard them (keeping records with a gap between them)
4. For every remaining pair, loop through, filling in the missing dates
5. Deal with dates missing at the start of your reporting period (no Date1 in the pairs)
6. Deal with dates missing at the end of your reporting period (no Date2 in the pairs)
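The steps above can be sketched in plain Python to show how much work is involved without a dates table. The function name and the use of datetime.date values are assumptions. Adding sentinel days just outside the reporting period lets steps 5 and 6 reuse the same pairing logic.

```python
from datetime import date, timedelta


def missing_dates(existing, first, last):
    """Return every date in [first, last] that is absent from `existing`.

    `existing` is an iterable of datetime.date values; duplicates are allowed.
    """
    # Step 1: take every record (deduplicated, sorted, clipped to the period).
    days = sorted({d for d in existing if first <= d <= last})
    # Steps 5 and 6: sentinels just outside the period turn gaps at the start
    # and end into ordinary gaps between a pair.
    bounded = [first - timedelta(days=1)] + days + [last + timedelta(days=1)]
    missing = []
    # Step 2: pair each record with the next one.
    for prev, nxt in zip(bounded, bounded[1:]):
        # Steps 3 and 4: consecutive days give an empty range; otherwise fill
        # in every day strictly between the pair.
        gap = (nxt - prev).days
        missing.extend(prev + timedelta(days=i) for i in range(1, gap))
    return missing


print(missing_dates([date(2011, 10, 13), date(2011, 10, 16)], date(2011, 10, 12), date(2011, 10, 17)))
# [datetime.date(2011, 10, 12), datetime.date(2011, 10, 14), datetime.date(2011, 10, 15), datetime.date(2011, 10, 17)]
```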
It's probably actually faster to create a temp table with the dates, do it the straightforward way, and then drop the table again. And if that's the case, why wouldn't you just maintain the dates table?
- Just have a 100-year table and forget about it
- Or have a small piece of code that keeps the dates table topped up
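Here is one way to create and top up such a table, sketched in SQLite via Python's sqlite3. The calendar table and ensure_calendar() helper are hypothetical. A recursive CTE generates the days, and INSERT OR IGNORE makes it safe to run the helper again with a later end date.

```python
import sqlite3

# Recursive CTE that yields one row per day from :first to :last inclusive.
FILL_CALENDAR_SQL = """
WITH RECURSIVE days(d) AS (
    SELECT date(:first)
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < date(:last)
)
INSERT OR IGNORE INTO calendar (DateField)
SELECT d FROM days
"""


def ensure_calendar(conn, first, last):
    """Create the calendar table if needed and add any missing days in [first, last]."""
    conn.execute("CREATE TABLE IF NOT EXISTS calendar (DateField TEXT PRIMARY KEY)")
    conn.execute(FILL_CALENDAR_SQL, {"first": first, "last": last})
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_calendar(conn, "2000-01-01", "2099-12-31")  # the 100-year table
print(conn.execute("SELECT COUNT(*) FROM calendar").fetchone()[0])  # 36525
```

Calling it later with a new end date only adds the days that are not there yet, because existing dates hit the PRIMARY KEY and are ignored.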
If you're not convinced, I'd recommend trialling the different options and seeing for yourself just how much faster the indexed dates table is compared to the other options.
(Not to mention how much shorter, more readable and more maintainable it is.)
MySQL, assuming a table yourTable with a date field d:
set @day = 0;
select v.y as `year`, v.m as `month`, v.d as `day`
from
(
select y_m.y, y_m.m, dd.d
from
(
select distinct year(d) y, month(d) m, DAY(LAST_DAY(yourTable.d)) max_days from yourTable
) y_m,
(
select @day := @day+1 as `d`
from
information_schema.tables
limit 32
) dd
where y_m.max_days >= dd.d
) v
left join
yourTable on (year(yourTable.d) = v.y and month(yourTable.d) = v.m and day(yourTable.d) = v.d)
where yourTable.d is null
;
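The query enumerates every day of each month that appears in the table and keeps the days with no row. The same logic in plain Python can be handy for checking the query's output. It uses calendar.monthrange() in place of DAY(LAST_DAY(...)), and the function name is hypothetical. Like the query, it reports whole months, including days before a month's first row, and it skips months that have no rows at all.

```python
import calendar
from datetime import date


def missing_days_by_month(existing):
    """For each (year, month) with at least one row, list the days that have none.

    `existing` is an iterable of datetime.date values.
    """
    present = set(existing)
    months = sorted({(d.year, d.month) for d in present})
    missing = []
    for year, month in months:
        # monthrange() returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        for day in range(1, last_day + 1):
            if date(year, month, day) not in present:
                missing.append((year, month, day))
    return missing


print(missing_days_by_month([date(2012, 2, d) for d in range(1, 28)]))
# [(2012, 2, 28), (2012, 2, 29)]
```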