Group by run when there is no run number in data (was Show how changing the length of a production run affects time-to-build)
It would seem that there is a much simpler way to state the problem. Please see Edit 2, following the sample table.
I have a number of different products on a production line. I have the date that each product entered production. Each product has two identifiers: item number and serial number I have the total number of labour hours for each product by item number and by serial number (i.e. I can tell you how many hours went into each object that was manufactured and what the average build time is for each kind of object).
I want to determine how (if) varying the length of production runs affects the average time it takes to build a product (item number). A production run is the sequential production of multiple serial numbers for a single item number. We have historical records going back several years with production runs varying in length from 1 to 30.
I think to achieve this, I need to be able to assign 'run id'. To me, that means building a query that sorts by start date and calculates a new unique value at each change in item number. If I knew how to do that, I could solve the rest of the problem on my own.
So that suggests a series of related questions:
- Am I thinking about this the right way?
- If I am on the right track, how do I generate those run id values? Calculate and store is an option, although I have a (misguided?) preference for direct queries. I know exactly how I would generate the run numbers in Excel, but I have a (misguided?) preference to do this i开发者_如何转开发n the database.
- If I'm not on the right track, where might I find that track? :)
Edit: Table structure (simplified) with sample data:
AutoID Item Serial StartDate Hours RunID (proposed calculation) 1 Legend 1234 2010-06-06 10 1 3 Legend 1235 2010-06-07 9 1 2 Legend 1237 2010-06-08 8 1 4 Apex 1236 2010-06-09 12 2 5 Apex 1240 2010-06-10 11 2 6 Legend 1239 2010-06-11 10 3 7 Legend 1238 2010-06-12 8 3
I have shown that start date, serial, and autoID are mutually unrelated. I have shown the expectation that labour goes down as the run length increases (but this is a 'fact' only via received wisdom, not data analysis). I have shown what I envision as the heart of the solution, that being a RunID that reflects sequential builds of a single item. I know that if I could get that runID, I could group by run to get counts, averages, totals, max, min, etc. In addition, I could do something like hours/ to get percentage change from the start of the run. At that point I could graph the trends associated with different run lengths either globally across all items or on a per item basis. (At least I think I could do all that. I might have to muck about a bit, but I think I could get it done.)
Edit 2: This problem would appear to be: how do I get the 'starting' member (earliest start date) of each run when I don't already have a runID? (The runID shown in the sample table does not exist and I was originally suggesting that being able to calculate runID was a potentially viable solution.)
AutoID Item 1 Legend 4 Apex 6 Legend
I'm assuming that having learned how to find the first member of each run that I would then be able to use what I've learned to find the last member of each run and then use those two results to get all other members of each run.
Edit 3: my version of a query that uses the AutoID of the first item in a run as the RunID for all units in a run. This was built entirely from samples and direction provided by Simon, who has the accepted answer. Using this as the basis for grouping by run, I can produce a variety of run statistics.
SELECT first_product_of_run.AutoID AS runID, run_sibling.AutoID AS itemID, run_sibling.Item, run_sibling.Serial, run_sibling.StartDate, run_sibling.Hours FROM (SELECT first_of_run.AutoID, first_of_run.Item, first_of_run.Serial, first_of_run.StartDate, first_of_run.Hours FROM dbo.production AS first_of_run LEFT OUTER JOIN dbo.production AS earlier_in_run ON first_of_run.AutoID - 1 = earlier_in_run.AutoID AND first_of_run.Item = earlier_in_run.Item WHERE (earlier_in_run.AutoID IS NULL)) AS first_product_of_run LEFT OUTER JOIN dbo.production AS run_sibling ON first_product_of_run.Item = run_sibling.Item AND first_product_of_run.AutoID run_sibling.AutoID AND first_product_of_run.StartDate product_between.Item AND first_product_of_run.StartDateCould you describe your table structure some more? If the "date that each product entered production" is a full time stamp, or if there is a sequential identifier across products, you can write queries to identify the first and last products of a run. From that, you can assign IDs to or calculate the length of the runs.
Edit: Once you've identified 1,4, and 6 as the start of a run, you can use this query to find the other IDs in the run:
select first_product_of_run.AutoID, run_sibling.AutoID
from first_product_of_run
left join production run_sibling on first_product_of_run.Item = run_sibling.Item
and first_product_of_run.AutoID <> run_sibling.AutoID
and first_product_of_run.StartDate < run_sibling.StartDate
left join production product_between on first_product_of_run.Item <> product_between.Item
and first_product_of_run.StartDate < product_between.StartDate
and product_between.StartDate < run_sibling.StartDate
where product_between.AutoID is null
first_product_of_run
can be a temp table, table variable, or sub-query that you used to find the start of a run. The key is the where product_between.AutoID is null
. That restricts the results to only pairs where no different items were produced between them.
Edit 2, here's how to get the first of each run:
select first_of_run.AutoID
from
(
select product.AutoID, product.Item, MAX(previous_product.StartDate) as PreviousDate
from production product
left join production previous_product on product.AutoID <> previous_product.AutoID
and product.StartDate > previous_product.StartDate
group by product.AutoID, product.Item
) first_of_run
left join production earlier_in_run
on first_of_run.PreviousDate = earlier_in_run.StartDate
and first_of_run.Item = earlier_in_run.Item
where earlier_in_run.AutoID is null
It's not pretty, and will break if StartDate
is not unique. The query could be simplified by adding a sequential and unique identifier with no gaps. In fact, that step will probably be necessary if StartDate
is not unique. Here's how it would look:
select first_of_run.AutoID
from production first_of_run
left join production earlier_in_run
on (first_of_run.Sequence - 1) = earlier_in_run.Sequence
and first_of_run.Item = earlier_in_run.Item
where earlier_in_run.AutoID is null
Using outer joins to find where things aren't still twists my brain, but it's a very powerful technique.
精彩评论