How to write these two queries for a simple data warehouse, using ANSI SQL?
I am writing a simple data warehouse that will allow me to query the table to observe periodic (say weekly) changes in data, as well as changes in the change of the data (e.g. week to week change in the weekly sale amount).
For the purposes of simplicity, I will present very simplified (almost trivialized) versions of the tables I am using here. The sales data table is a view and has the following structure:
CREATE TABLE sales_data (
    sales_time date NOT NULL,
    sales_amt double precision NOT NULL
);
For the purpose of this question, I have left out other fields you would expect to see (product_id, sales_person_id, etc.), as they have no direct relevance here. AFAICT, the only fields that will be used in the queries are sales_time and sales_amt (unless I am mistaken).
I also have a date dimension table with the following structure:
CREATE TABLE date_dimension (
    id integer NOT NULL,
    datestamp date NOT NULL,
    day_part integer NOT NULL,
    week_part integer NOT NULL,
    month_part integer NOT NULL,
    qtr_part integer NOT NULL,
    year_part integer NOT NULL
);
which partitions dates into reporting ranges.
I need to write queries that will allow me to do the following:
1. Return the week-on-week change in sales_amt for a specified period. For example, the change between sales today and sales N days ago, where N is a positive integer (N == 7 in this case).
2. Return the change in the change of sales_amt for a specified period. In (1) we calculated the week-on-week change; now we want to know how that change differs from the week-on-week change calculated last week.
I am stuck at this point, however, as SQL is my weakest skill. I would be grateful if an SQL expert could explain how to write these queries in a DB-agnostic way (i.e. using ANSI SQL).
As noted in the comment above, I probably do not understand your model -- so here is a simple one to get started.
Now, if I want weekly sales for calendar year 2010:
select
      CalendarYearWeek
    , sum(SalesAmount)
from factSale as f
join dimDate as d on d.DateKey = f.DateKey
where Year = 2010
group by CalendarYearWeek ;
CalendarYearWeek is a varchar(8) column in dimDate, for example '2010-w03'; Year is an integer column in dimDate too.
Not sure if this is close to what you were looking for, but may be a start.
EDIT
dimDate also has these columns:
WeekNumberInEpoch, integer -- increases starting from some epoch date in the past. All rows in dimDate within the same week have the same WeekNumberInEpoch.
DayOfWeek, varchar(10) -- 'sunday', 'monday', ...
DayNumberInWeek, integer -- 1-7
The queries below use CTEs, which should work in recent versions of PostgreSQL, SQL Server, Oracle, and DB2. For other databases, you can rewrite the CTE (q_00) as a sub-query.
-- for week to previous week
with
q_00 as (
    select
          WeekNumberInEpoch
        , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch
)
select
      a.WeekNumberInEpoch
    , a.Amount as ThisWeekSales
    , b.Amount as LastWeekSales
    , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
order by a.WeekNumberInEpoch desc ;
-- for day of week to day of previous week
-- monday to monday, tuesday to tuesday, ...
with
q_00 as (
    select
          WeekNumberInEpoch
        , DayOfWeek
        , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch, DayOfWeek
)
select
      a.WeekNumberInEpoch
    , a.DayOfWeek
    , a.Amount as ThisWeekSales
    , b.Amount as LastWeekSales
    , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on (b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
               and b.DayOfWeek = a.DayOfWeek)
order by a.WeekNumberInEpoch desc, a.DayOfWeek ;
-- Sliding by day and day difference (= 7)
with
q_00 as (
    select
          DayNumberInEpoch
        , FullDate
        , DayOfWeek
        , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by DayNumberInEpoch, FullDate, DayOfWeek
)
select
      a.FullDate as ThisDay
    , a.DayOfWeek as ThisDayName
    , a.Amount as ThisDaySales
    , b.FullDate as PreviousPeriodDay
    , b.DayOfWeek as PreviousDayName
    , b.Amount as PreviousPeriodDaySales
    , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on b.DayNumberInEpoch = a.DayNumberInEpoch - 7
order by a.FullDate desc ;
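The queries above cover the week-on-week change, i.e. item (1) of the question. For item (2), the change in the change, one possible approach is to stack a second CTE on top of q_00 and self-join it the same way. Here is a minimal sketch, runnable with Python's built-in sqlite3 module. The tiny dimDate/factSale tables, the sample rows, and the name q_01 are made up for illustration; only the column names follow the queries above.

```python
import sqlite3

# Hypothetical miniature of the schema used above; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dimDate (
        DateKey integer primary key,
        WeekNumberInEpoch integer not null,
        CalendarYear integer not null
    );
    create table factSale (
        DateKey integer not null,
        SalesAmount double precision not null
    );
""")
# One date row per week keeps the example short: weeks 1..4 of 2010.
conn.executemany("insert into dimDate values (?, ?, ?)",
                 [(1, 1, 2010), (2, 2, 2010), (3, 3, 2010), (4, 4, 2010)])
conn.executemany("insert into factSale values (?, ?)",
                 [(1, 100.0), (2, 130.0), (3, 150.0), (4, 155.0)])

CHANGE_OF_CHANGE = """
with
q_00 as (            -- weekly totals, same as in the queries above
    select
          WeekNumberInEpoch
        , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch
),
q_01 as (            -- week-on-week change (first difference)
    select
          a.WeekNumberInEpoch
        , a.Amount - b.Amount as Difference
    from q_00 as a
    join q_00 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
)
select               -- change of the change (second difference)
      a.WeekNumberInEpoch
    , a.Difference as ThisWeekChange
    , b.Difference as LastWeekChange
    , a.Difference - b.Difference as ChangeOfChange
from q_01 as a
join q_01 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
order by a.WeekNumberInEpoch desc
"""

change_rows = conn.execute(CHANGE_OF_CHANGE).fetchall()
for row in change_rows:
    print(row)
# (4, 5.0, 20.0, -15.0)
# (3, 20.0, 30.0, -10.0)
```

Because both steps use inner joins, the first week in the range has no change row and the first two weeks have no change-of-change row. If you need those weeks in the output, a left join would return them with NULLs instead.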
I suggest you build a separate dimension table for 'time' (one row per day) that contains information about repeating time periods (day, week, month, quarter), so you can easily join/select on that type of information.
Your queries for (1.) and (2.) could be built that way.
Yes, most SQL dialects allow inferring that information with date/time functions, but those are slower and more complicated than using a dimension table.
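As a rough illustration of how such a table could be filled, here is a small sketch that generates one row per day and loads it into the date_dimension table from the question, using Python's built-in sqlite3 module. The helper name date_dimension_rows, the 2010 date range, and the choice of the ISO week number for week_part are my assumptions, not something the question specifies.

```python
import sqlite3
from datetime import date, timedelta


def date_dimension_rows(start, end):
    """Yield one (id, datestamp, day, week, month, quarter, year) tuple per day."""
    current = start
    row_id = 1
    while current <= end:
        yield (
            row_id,
            current.isoformat(),           # datestamp stored as 'YYYY-MM-DD'
            current.day,                   # day_part: day of month (assumption)
            current.isocalendar()[1],      # week_part: ISO week number (assumption)
            current.month,                 # month_part
            (current.month - 1) // 3 + 1,  # qtr_part: 1..4
            current.year,                  # year_part
        )
        current += timedelta(days=1)
        row_id += 1


conn = sqlite3.connect(":memory:")
conn.execute("""
    create table date_dimension (
        id integer not null,
        datestamp date not null,
        day_part integer not null,
        week_part integer not null,
        month_part integer not null,
        qtr_part integer not null,
        year_part integer not null
    )
""")

dim_rows = list(date_dimension_rows(date(2010, 1, 1), date(2010, 12, 31)))
conn.executemany("insert into date_dimension values (?, ?, ?, ?, ?, ?, ?)", dim_rows)

# Sanity check: number of days that landed in each quarter.
quarter_counts = conn.execute("""
    select qtr_part, count(*)
    from date_dimension
    group by qtr_part
    order by qtr_part
""").fetchall()
print(quarter_counts)
```

One thing to watch with this choice: 2010-01-01 gets ISO week 53 (it belongs to the last ISO week of 2009), so week_part alone is not safe for "previous week" joins across a year boundary. A running week counter that never resets, like the WeekNumberInEpoch column in the other answer, avoids that problem.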