
Obtain running frequency distribution from previous N rows of MySQL database

I have a MySQL database where one column contains status codes. The column is of type int and the values will only ever be 100,200,300,400. It looks like below; other columns removed for clarity.

id   |  status
 1      300
 2      100
 3      100
 4      200
 5      200
 6      300
 7      100
 8      400
 9      300
10      300
11      100
12      400
13      400
14      400
15      300
16      300

The id field is auto-generated and will always be sequential. I want to have a third column displaying a comma-separated string of the frequency distribution of the status codes of the previous 10 rows. It should look like this.

id   |  status  |  freq
 1      300
 2      100
 3      100
 4      200
 5      200
 6      300
 7      100
 8      400
 9      300
10      300
11      100       300,100,200,400    -- from rows 1-10
12      400       100,300,200,400    -- from rows 2-11
13      400       100,300,200,400    -- from rows 3-12
14      400       300,400,100,200    -- from rows 4-13
15      300       400,300,100,200    -- from rows 5-14
16      300       300,400,100        -- from rows 6-15

I want the most frequent code listed first. And where two status codes have the same frequency it doesn't matter to me which is listed first but I did list the smaller code before the larger in the example. Lastly, where a code doesn't appear at all in the previous ten rows, it shouldn't be listed in the freq column either.

And to be very clear the row number that the frequency string appears on does NOT take into account the status code of that row; it's only the previous rows.
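
The rule above can be pinned down with a short Python sketch. It is only a reference for checking results, not a MySQL solution. The function name running_freq and its window parameter are made up for illustration, the data is the 16 statuses from the expected-output table, and ties are broken by the smaller code first, as in the example.

```python
from collections import Counter

def running_freq(statuses, window=10):
    """Return {id: freq string} for every id that has `window` previous rows.

    Assumes ids start at 1 and are sequential, as in the sample table.
    """
    result = {}
    for i in range(window, len(statuses)):
        counts = Counter(statuses[i - window:i])  # previous rows only
        # Most frequent first; ties broken by the smaller code.
        ordered = sorted(counts, key=lambda code: (-counts[code], code))
        result[i + 1] = ",".join(str(code) for code in ordered)
    return result

sample = [300, 100, 100, 200, 200, 300, 100, 400,
          300, 300, 100, 400, 400, 400, 300, 300]
for row_id, freq in running_freq(sample).items():
    print(row_id, freq)
```

Codes that do not occur in the window never enter the Counter, so they are left out of the string automatically.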

So what have I done? I'm pretty green with SQL. I'm a programmer and I find this SQL language a tad odd to get used to. I managed the following self-join select statement.

select *, avg(b.status) freq
from sample a
join sample b
on (b.id < a.id) and (b.id > a.id - 11)
where a.id > 10
group by a.id;

Using the aggregate function avg, I can at least demonstrate the concept. The derived table b provides the correct rows to the avg function but I just can't figure out the multi-step process of counting and grouping rows from b to get a frequency distribution and then collapse the frequency rows into a single string value.

Also I've tried using standard stored functions and procedures in place of the built-in aggregate functions, but it seems the b derived table is out of scope or something. I can't seem to access it. And from what I understand writing a custom aggregate function is not possible for me as it seems to require developing in C, something I'm not trained for.

Here's sql to load up the sample.

create table sample (
    id int NOT NULL AUTO_INCREMENT,
    PRIMARY KEY(id),
    status int
);

insert into sample(status) values(300),(100),(100),(200),(200),(300)

The sample has 30 rows of data to work with. I know it's a long question, but I just wanted to be as detailed as I could be. I've worked on this for a few days now and would really like to get it done.

Thanks for your help.

The only way I know of to do what you're asking is to use a BEFORE INSERT trigger. It has to be BEFORE INSERT because you want to set a value in the row being inserted, which can only be done in a BEFORE trigger. Unfortunately, that also means the row won't have been assigned an ID yet, so hopefully it's safe to assume that, at the time a new record is inserted, the last 10 records in the table are the ones you're interested in. Your trigger will need to get the statuses of the last 10 rows, count how often each one occurs, and use the GROUP_CONCAT function to join them into a single string, ordered by that count. I've been using SQL Server mostly and I don't have access to a MySQL server at the moment to test this, but hopefully my syntax will be close enough to at least get you moving in the right direction:

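-- Assumes sample already has a freq varchar(50) column. In the mysql client,
-- change the DELIMITER first so the inner semicolons don't end the statement.
create trigger sample_trigger BEFORE INSERT ON sample
FOR EACH ROW
BEGIN
    DECLARE _freq varchar(50);

    -- Count each status among the 10 newest rows, then join the codes, most frequent first.
    SELECT GROUP_CONCAT(tbl.status ORDER BY tbl.Occurrences DESC, tbl.status) INTO _freq
    FROM (SELECT last10.status, COUNT(*) AS Occurrences
          FROM (SELECT status FROM sample ORDER BY id DESC LIMIT 10) AS last10
          GROUP BY last10.status) AS tbl;

    SET new.freq = _freq;
END;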

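-- Without a trigger: count per (id, status) in the self-join, then concatenate per id.
SELECT sub.id, GROUP_CONCAT(sub.status ORDER BY sub.freq DESC, sub.status) AS freq
FROM
    (SELECT a.id as id, b.status, COUNT(*) as freq
     FROM sample a
     JOIN sample b ON (b.id < a.id) AND (b.id > a.id - 11)
     WHERE a.id > 10
     GROUP BY a.id, b.status) AS sub
GROUP BY sub.id;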

SQL Fiddle
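
As a rough way to check the counting step of the self-join above without a MySQL server, the same join can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for MySQL, the 16 statuses come from the question's expected-output table, and the final concatenation is done in Python rather than with GROUP_CONCAT so it does not depend on the database's aggregate ordering support.

```python
import sqlite3

statuses = [300, 100, 100, 200, 200, 300, 100, 400,
            300, 300, 100, 400, 400, 400, 300, 300]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (id INTEGER PRIMARY KEY AUTOINCREMENT, status INT)"
)
conn.executemany("INSERT INTO sample(status) VALUES (?)",
                 [(s,) for s in statuses])

# One output row per (id, status) pair found in the previous 10 rows,
# sorted so each id's codes arrive most frequent first (smaller code on ties).
rows = conn.execute("""
    SELECT a.id, b.status, COUNT(*) AS freq
    FROM sample a
    JOIN sample b ON (b.id < a.id) AND (b.id > a.id - 11)
    WHERE a.id > 10
    GROUP BY a.id, b.status
    ORDER BY a.id, freq DESC, b.status
""").fetchall()

# Collapse the sorted rows into one comma-separated string per id.
codes_by_id = {}
for row_id, status, _count in rows:
    codes_by_id.setdefault(row_id, []).append(str(status))
freq_by_id = {row_id: ",".join(codes) for row_id, codes in codes_by_id.items()}

for row_id, freq in freq_by_id.items():
    print(row_id, freq)
```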






