Calculating from self-join
I have a list of values and dates for stock ticker symbols and want to calculate quarterly returns in SQL.
CREATE TABLE `symbol_details` (
`symbol_header_id` INT(11) DEFAULT NULL,
`DATE` DATETIME DEFAULT NULL,
`NAV` DOUBLE DEFAULT NULL,
`ADJ_NAV` DOUBLE DEFAULT NULL)
For fixed quarter start and end dates that works fine:
set @quarterstart='2008-12-31';
set @quarterend='2009-3-31';
select sha, (100*(aend-abegin)/abegin) as q1_returns from
(select symbol_header_id as sha, ADJ_NAV as abegin from symbol_details
where date=@quarterstart) as a,
(select symbol_header_id as she, ADJ_NAV as aend from symbol_details
where date=@quarterend) as b
where sha=she;
This calculates all quarterly returns for all symbols. Sometimes that end of the quarter is a non-trading day, or the stock ceased operation, so I want to get the closest start and end dates to the quarter start and end dates.
The solution would be to somehow get only the one start and one end of quarter value for each symbol_header_id by some GROUP BY statement, like (1)
SET @quarterstart = '2009-03-01';
SET @quarterend = '2009-4-31';
SELECT symbol_header_id, DATE, ADJ_NAV AS aend FROM symbol_details
WHERE
DATE BETWEEN @quarterstart AND @quarterend
AND symbol_header_id BETWEEN 18540 AND 18550
GROUP BY symbol_header_id asc;
This gives the ADJ_NAV value for the date closest to the start of the quarter for each of the symbol_header_id values.
So finally, (2)
SET @quarterstart = '2008-12-31';
SET @quarterend = '2009-3-31';
SELECT sh1, a.date, b.date, aend, abegin,
(100*(aend-abegin)/abegin) AS quarter_returns FROM
(SELECT symbol_header_id sh1, DATE, ADJ_NAV AS abegin FROM symbol_details
WHERE
DATE BETWEEN @quarterstart AND @quarterend
GROUP BY symbol_header_id DESC) a,
(SELECT symbol_header_id sh2, DATE, ADJ_NAV AS aend FROM symbol_details
WHERE
DATE BETWEEN @quarterstart AND @quarter开发者_开发技巧end
GROUP BY symbol_header_id ASC) b
WHERE sh1 = sh2;
should calculate the quarterly returns for each symbols.
Unfortunately this does not work. For some reason when I restrict the IDs like in (1) the correct start and end dates are used, but when the "AND symbol_header_id BETWEEN 18540 AND 18550" statements are removed, the same start and end dates come up. Why????
The answer with unrolled JOINs is:
SET @quarterstart = '2008-12-31';
SET @quarterend = '2009-3-31';
SELECT tq.sym AS sym,
(100*(alast.adj_nav - afirst.adj_nav)/afirst.adj_nav) AS quarterly_returns
FROM
-- First, determine first traded days ("ftd") and last traded days
-- ("ltd") in this quarter per symbol
(SELECT symbol_header_id AS sym,
MIN(DATE) AS ftd,
MAX(DATE) AS ltd
FROM symbol_details
WHERE DATE BETWEEN @quarterstart AND @quarterend
GROUP BY 1) tq
JOIN symbol_details afirst
-- Second, determine adjusted NAV for "ftd" per symbol (see WHERE)
ON afirst.DATE BETWEEN @quarterstart AND @quarterend
AND afirst.symbol_header_id = tq.sym
JOIN
-- Finally, determine adjusted NAV for "ltd" per symbol (see WHERE)
symbol_details alast
ON alast.DATE BETWEEN @quarterstart AND @quarterend
AND alast.symbol_header_id = tq.sym
WHERE
afirst.date = tq.ftd
AND
alast.date = tq.ltd;
UPDATE:
Incorporating @Hogan's suggestion fully ... and testing it. :) This EXPLAIN
s much simpler, and should be a better performer.
Again, presuming ANSI_QUOTES
behavior:
SELECT tq.sym AS sym,
(100*(alast.adj_nav - afirst.adj_nav)/afirst.adj_nav) AS quarterly_returns
FROM
(SELECT symbol_header_id AS sym, -- find first/last traded day ("ftd", "ltd")
MIN("date") AS ftd,
MAX("date") AS ltd
FROM symbol_details
WHERE "date" BETWEEN @quarterstart AND @quarterend
GROUP BY 1) tq
JOIN symbol_details afirst -- JOIN for ADJ_NAV on first traded day
ON tq.sym = afirst.symbol_header_id
AND
tq.ftd = afirst."date"
JOIN symbol_details alast -- JOIN for ADJ_NAV on last traded day
ON tq.sym = alast.symbol_header_id
AND
tq.ltd = alast."date"
ORIGINAL:
Presuming SET SESSION sql_mode = 'ANSI_QUOTES'
, try this:
SELECT tq.sym AS sym,
(100*(adj_end - adj_begin)/adj_begin) AS quarterly_returns
FROM
-- First, determine first traded days ("ftd") and last traded days
-- ("ltd") in this quarter per symbol
(SELECT symbol_header_id AS sym,
MIN("date") AS ftd,
MAX("date") AS ltd
FROM symbol_details
WHERE "date" BETWEEN @quarterstart AND @quarterend
GROUP BY 1) tq
JOIN
-- Second, determine adjusted NAV for "ftd" per symbol (see WHERE)
(SELECT symbol_header_id AS sym,
"date" AS adate,
adj_nav AS adj_begin
FROM symbol_details) afirst
ON afirst.sym = tq.sym
JOIN
-- Finally, determine adjusted NAV for "ltd" per symbol (see WHERE)
(SELECT symbol_header_id AS sym,
"date" AS adate,
adj_nav AS adj_end
FROM symbol_details) alast
ON alast.sym = tq.sym
WHERE
afirst.adate = tq.ftd
AND
alast.adate = tq.ltd;
Example of unrolling the last sub-query from the below query:
(note -- I did not test)
SELECT tq.sym AS sym,
(100*(alast.adj_nav - adj_begin)/adj_begin) AS quarterly_returns
FROM
-- First, determine first traded days ("ftd") and last traded days
-- ("ltd") in this quarter per symbol
(SELECT symbol_header_id AS sym,
MIN("date") AS ftd,
MAX("date") AS ltd
FROM symbol_details
WHERE "date" BETWEEN @quarterstart AND @quarterend
GROUP BY 1) tq
JOIN
-- Second, determine adjusted NAV for "ftd" per symbol (see WHERE)
(SELECT symbol_header_id AS sym,
"date" AS adate,
adj_nav AS adj_begin
FROM symbol_details) afirst
ON afirst.sym = tq.sym
-- Finally, determine adjusted NAV for "ltd" per symbol (see WHERE)
LEFT JOIN symbol_details alast ON tq.sym = alast.symbol_header_id AND tg.ltd = alast."date"
WHERE
afirst.adate = tq.ftd
Are you sure that you're using GROUP BY ... ASC in your queries? GROUP BY groups those rows which satisfy some criteria and using ASC/DESC makes no sense. By the way, in your case your criterion is equality of symbol_header_id column which, assuming that "id" means that it is a key, effectively makes groups consisting of a single row.
Anyway, if your groups are really groups, you may have problems because you're selecting not only grouping attribute and aggregate functions, but also some other attributes. Their value is unpredictable in this case.
精彩评论