Group by run when there is no run number in data (was Show how changing the length of a production run affects time-to-build)

2023-01-28 09:19

It would seem that there is a much simpler way to state the problem. Please see Edit 2, following the sample table.

I have a number of different products on a production line. I have the date that each product entered production. Each product has two identifiers: item number and serial number. I have the total number of labour hours for each product by item number and by serial number (i.e. I can tell you how many hours went into each object that was manufactured and what the average build time is for each kind of object).

I want to determine how (if) varying the length of production runs affects the average time it takes to build a product (item number). A production run is the sequential production of multiple serial numbers for a single item number. We have historical records going back several years with production runs varying in length from 1 to 30.

I think that to achieve this, I need to be able to assign a 'run id'. To me, that means building a query that sorts by start date and calculates a new unique value at each change in item number. If I knew how to do that, I could solve the rest of the problem on my own.

So that suggests a series of related questions:

Am I thinking about this the right way?
If I am on the right track, how do I generate those run id values? Calculate and store is an option, although I have a (misguided?) preference for direct queries. I know exactly how I would generate the run numbers in Excel, but I have a (misguided?) preference to do this in the database.
If I'm not on the right track, where might I find that track? :)

Edit: Table structure (simplified) with sample data:

AutoID    Item       Serial     StartDate    Hours     RunID (proposed calculation)
 1         Legend     1234       2010-06-06     10        1
 3         Legend     1235       2010-06-07      9        1
 2         Legend     1237       2010-06-08      8        1
 4         Apex       1236       2010-06-09     12        2
 5         Apex       1240       2010-06-10     11        2
 6         Legend     1239       2010-06-11     10        3
 7         Legend     1238       2010-06-12      8        3

I have shown that start date, serial, and autoID are mutually unrelated. I have shown the expectation that labour goes down as the run length increases (but this is a 'fact' only via received wisdom, not data analysis). I have shown what I envision as the heart of the solution, that being a RunID that reflects sequential builds of a single item. I know that if I could get that runID, I could group by run to get counts, averages, totals, max, min, etc. In addition, I could do something like hours/ to get percentage change from the start of the run. At that point I could graph the trends associated with different run lengths either globally across all items or on a per item basis. (At least I think I could do all that. I might have to muck about a bit, but I think I could get it done.)
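
For illustration only, here is a minimal Python sketch of the "calculate and store" option: sort by start date and start a new run at each change in item. The function names assign_run_ids and run_stats are hypothetical, the rows are the sample data above, and it assumes StartDate is unique and stored as an ISO date string.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the table above: (AutoID, Item, Serial, StartDate, Hours).
rows = [
    (1, "Legend", 1234, "2010-06-06", 10),
    (3, "Legend", 1235, "2010-06-07", 9),
    (2, "Legend", 1237, "2010-06-08", 8),
    (4, "Apex", 1236, "2010-06-09", 12),
    (5, "Apex", 1240, "2010-06-10", 11),
    (6, "Legend", 1239, "2010-06-11", 10),
    (7, "Legend", 1238, "2010-06-12", 8),
]


def assign_run_ids(rows):
    """Map AutoID -> run id; a new run starts when Item changes in StartDate order."""
    run_ids = {}
    ordered = sorted(rows, key=itemgetter(3))  # ISO dates sort correctly as strings
    # groupby only merges *consecutive* rows with the same Item, which is
    # exactly the definition of a production run.
    for run_id, (_item, run) in enumerate(groupby(ordered, key=itemgetter(1)), start=1):
        for auto_id, *_rest in run:
            run_ids[auto_id] = run_id
    return run_ids


def run_stats(rows, run_ids):
    """Map run id -> (item, run length, average hours per unit)."""
    totals = {}
    for auto_id, item, _serial, _start, hours in rows:
        entry = totals.setdefault(run_ids[auto_id], {"item": item, "count": 0, "hours": 0})
        entry["count"] += 1
        entry["hours"] += hours
    return {rid: (t["item"], t["count"], t["hours"] / t["count"]) for rid, t in totals.items()}


run_ids = assign_run_ids(rows)
stats = run_stats(rows, run_ids)
for run_id, (item, count, avg_hours) in sorted(stats.items()):
    print(run_id, item, count, avg_hours)
```

On the sample data this reproduces the proposed RunID column, and the per-run averages could then be compared against run length.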

Edit 2: This problem would appear to be: how do I get the 'starting' member (earliest start date) of each run when I don't already have a runID? (The runID shown in the sample table does not exist and I was originally suggesting that being able to calculate runID was a potentially viable solution.)

AutoID    Item 
 1         Legend
 4         Apex   
 6         Legend

I'm assuming that, having learned how to find the first member of each run, I would then be able to use what I've learned to find the last member of each run, and then use those two results to get all other members of each run.

Edit 3: my version of a query that uses the AutoID of the first item in a run as the RunID for all units in a run. This was built entirely from samples and direction provided by Simon, who has the accepted answer. Using this as the basis for grouping by run, I can produce a variety of run statistics.

SELECT first_product_of_run.AutoID AS runID, run_sibling.AutoID AS itemID, run_sibling.Item, run_sibling.Serial, run_sibling.StartDate, run_sibling.Hours
FROM (SELECT first_of_run.AutoID, first_of_run.Item, first_of_run.Serial, first_of_run.StartDate, first_of_run.Hours
      FROM dbo.production AS first_of_run
      LEFT OUTER JOIN dbo.production AS earlier_in_run
          ON first_of_run.AutoID - 1 = earlier_in_run.AutoID
          AND first_of_run.Item = earlier_in_run.Item
      WHERE (earlier_in_run.AutoID IS NULL)) AS first_product_of_run
LEFT OUTER JOIN dbo.production AS run_sibling
    ON first_product_of_run.Item = run_sibling.Item
    AND first_product_of_run.AutoID <> run_sibling.AutoID
    AND first_product_of_run.StartDate < run_sibling.StartDate
LEFT OUTER JOIN dbo.production AS product_between
    ON first_product_of_run.Item <> product_between.Item
    AND first_product_of_run.StartDate < product_between.StartDate
    AND product_between.StartDate < run_sibling.StartDate
WHERE (product_between.AutoID IS NULL)

Could you describe your table structure some more? If the "date that each product entered production" is a full time stamp, or if there is a sequential identifier across products, you can write queries to identify the first and last products of a run. From that, you can assign IDs to or calculate the length of the runs.

Edit: Once you've identified 1, 4, and 6 as the start of a run, you can use this query to find the other IDs in the run:

select first_product_of_run.AutoID, run_sibling.AutoID
from first_product_of_run
left join production run_sibling on first_product_of_run.Item = run_sibling.Item
    and first_product_of_run.AutoID <> run_sibling.AutoID
    and first_product_of_run.StartDate < run_sibling.StartDate
left join production product_between on first_product_of_run.Item <> product_between.Item
    and first_product_of_run.StartDate < product_between.StartDate
    and product_between.StartDate < run_sibling.StartDate
where product_between.AutoID is null

first_product_of_run can be a temp table, table variable, or sub-query that you used to find the start of a run. The key is the where product_between.AutoID is null. That restricts the results to only pairs where no different items were produced between them.
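
To check the query above against the sample data, here is a small sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and first_product_of_run is filled by hand with AutoIDs 1, 4 and 6 as a temp table; both are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table production (
    AutoID integer primary key, Item text, Serial integer,
    StartDate text, Hours integer
);
insert into production values
    (1, 'Legend', 1234, '2010-06-06', 10),
    (3, 'Legend', 1235, '2010-06-07', 9),
    (2, 'Legend', 1237, '2010-06-08', 8),
    (4, 'Apex',   1236, '2010-06-09', 12),
    (5, 'Apex',   1240, '2010-06-10', 11),
    (6, 'Legend', 1239, '2010-06-11', 10),
    (7, 'Legend', 1238, '2010-06-12', 8);

-- Run starts supplied by hand for this demo (assumption).
create temp table first_product_of_run as
    select AutoID, Item, StartDate from production where AutoID in (1, 4, 6);
""")

# Same joins as in the answer; only the order by is added so the output is stable.
pairs = conn.execute("""
select first_product_of_run.AutoID, run_sibling.AutoID
from first_product_of_run
left join production run_sibling on first_product_of_run.Item = run_sibling.Item
    and first_product_of_run.AutoID <> run_sibling.AutoID
    and first_product_of_run.StartDate < run_sibling.StartDate
left join production product_between on first_product_of_run.Item <> product_between.Item
    and first_product_of_run.StartDate < product_between.StartDate
    and product_between.StartDate < run_sibling.StartDate
where product_between.AutoID is null
order by first_product_of_run.AutoID, run_sibling.StartDate
""").fetchall()

for first_id, sibling_id in pairs:
    print(first_id, sibling_id)
```

Each row pairs a run start with a later unit of the same run. Later Legend units such as 6 and 7 are not paired with run start 1, because Apex units were produced in between.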

Edit 2: Here's how to get the first product of each run:

select first_of_run.AutoID
from 
(
select product.AutoID, product.Item, MAX(previous_product.StartDate) as PreviousDate
from production product
left join production previous_product on product.AutoID <> previous_product.AutoID
    and product.StartDate > previous_product.StartDate
group by product.AutoID, product.Item
) first_of_run
left join production earlier_in_run
    on first_of_run.PreviousDate = earlier_in_run.StartDate
    and first_of_run.Item = earlier_in_run.Item
where earlier_in_run.AutoID is null

It's not pretty, and will break if StartDate is not unique. The query could be simplified by adding a sequential and unique identifier with no gaps. In fact, that step will probably be necessary if StartDate is not unique. Here's how it would look:

select first_of_run.AutoID
from production first_of_run
left join production earlier_in_run
    on (first_of_run.Sequence - 1) = earlier_in_run.Sequence
    and first_of_run.Item = earlier_in_run.Item
where earlier_in_run.AutoID is null
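
Both first-of-run queries can be tried on the sample data with the sketch below, again using in-memory SQLite via Python's sqlite3 as a stand-in for SQL Server. The Sequence column is hypothetical: it is filled here in Python with each row's position in StartDate order. On SQL Server 2005, something like ROW_NUMBER() OVER (ORDER BY StartDate) could presumably populate it instead.

```python
import sqlite3

# Sample rows: (AutoID, Item, Serial, StartDate, Hours).
rows = [
    (1, "Legend", 1234, "2010-06-06", 10),
    (3, "Legend", 1235, "2010-06-07", 9),
    (2, "Legend", 1237, "2010-06-08", 8),
    (4, "Apex", 1236, "2010-06-09", 12),
    (5, "Apex", 1240, "2010-06-10", 11),
    (6, "Legend", 1239, "2010-06-11", 10),
    (7, "Legend", 1238, "2010-06-12", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table production (AutoID integer primary key, Item text, "
    "Serial integer, StartDate text, Hours integer, Sequence integer)"
)
# Hypothetical Sequence column: gap-free position in StartDate order.
ordered = sorted(rows, key=lambda r: r[3])
conn.executemany(
    "insert into production values (?, ?, ?, ?, ?, ?)",
    [row + (seq,) for seq, row in enumerate(ordered, start=1)],
)

# Version 1: find the previous StartDate, then anti-join on the same Item.
by_start_date = """
select first_of_run.AutoID
from (
    select product.AutoID, product.Item, MAX(previous_product.StartDate) as PreviousDate
    from production product
    left join production previous_product on product.AutoID <> previous_product.AutoID
        and product.StartDate > previous_product.StartDate
    group by product.AutoID, product.Item
) first_of_run
left join production earlier_in_run
    on first_of_run.PreviousDate = earlier_in_run.StartDate
    and first_of_run.Item = earlier_in_run.Item
where earlier_in_run.AutoID is null
order by first_of_run.AutoID
"""

# Version 2: the simpler anti-join on the gap-free Sequence column.
by_sequence = """
select first_of_run.AutoID
from production first_of_run
left join production earlier_in_run
    on (first_of_run.Sequence - 1) = earlier_in_run.Sequence
    and first_of_run.Item = earlier_in_run.Item
where earlier_in_run.AutoID is null
order by first_of_run.AutoID
"""

starts_by_date = [r[0] for r in conn.execute(by_start_date)]
starts_by_sequence = [r[0] for r in conn.execute(by_sequence)]
print(starts_by_date, starts_by_sequence)
```

On this data both versions return the same run starts. The sample has unique start dates, so it does not exercise the duplicate-StartDate case.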

Using outer joins to find where things aren't still twists my brain, but it's a very powerful technique.

Tags: sql, sql-server-2005
