How to calculate moving average of distinct value count in SQL?

2022-12-07 18:23 问答作者：

I have the following table:

date       user_city 
2000-01-01 amsterdam
2000-01-01 copenhagen
2000-01-01 amsterdam
2000-01-01 vienna
2000-01-01 prague
2000-01-02 vienna
2000-01-02 amsterdam
2000-01-02 tokio
2000-01-03 copenhagen
2000-01-03 london
2000-01-03 prague
2000-01-03 amsterdam
...

Now I would like to have a 7-day rolling average of the number of the user_city values without repetition (so in case of 2000-01-01 the daily number is only 4, as amsterdam is represented two times). What is the relevant SQL (MySQL) query to get the result?

Expected outcome (in case of a 3-day rolling average fitted to the example above):

window_start_date avg
2001-01-01        3.6666
2001-01-04        ?
2001-01-07        ?

I was able to produce the code in Python, it may helps understanding my aim:

df = original_table.groupby("date")["user_city"].nunique()
df = df.reset_index()
df["开发者_开发知识库rolling_avg"] = df["user_city"].rolling(7).mean()

After you edited your question, this will work for your three day rolling average.

First you can create a helper table, which will hold the start and end date of the windows you want to calculate the average over. You can do this by using a recursive cte. It is called date_range in the query below and will look like this:

window_start_date	window_end_date
2000-01-01	2000-01-03
2000-01-04	2000-01-06
2000-01-07	2000-01-09
...	...

To get the correct amount of visits per day, you can use another cte counting the DISTINCT appearances of user_city and grouping by date. This is called daily_counts in the query below.

You can now LEFT JOIN the visits to the date range, using a conditional statement meaning the date of the visit is BETWEEN the window_start_date and window_end_date.

Lastly, GROUP BY the window_start_date and calculate the average number of visited cities.

WITH RECURSIVE date_range AS (
    SELECT '2000-01-01' as window_start_date, 
          '2000-01-01' + INTERVAL 2 day as window_end_date
    UNION ALL
    SELECT window_start_date + INTERVAL 3 day, 
           window_end_date + INTERVAL 3 day
    from date_range
    where window_start_date <= '2000-12-31'
), 

  daily_counts as (
    SELECT date, COUNT(DISTINCT user_city) AS daily_count
    FROM visits
    GROUP BY date
)

SELECT window_start_date, AVG(daily_count) AS avg
FROM date_range
LEFT JOIN daily_counts
  ON daily_counts.date BETWEEN window_start_date AND window_end_date
GROUP BY window_start_date

Added a dbfiddle which includes some more dates in the future. You can change the interval values in the date_range cte to get a 7 day rolling average.

Here is a possible SQL query that you can use to calculate the 7-day rolling average of the number of unique user_city values:

SELECT date,
       AVG(num_cities) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_num_cities
FROM (
    SELECT date, COUNT(DISTINCT user_city) AS num_cities
    FROM mytable
    GROUP BY date
) t

This query first uses a subquery to calculate the number of unique user_city values for each date. Then, the outer query uses the AVG function with a window function to calculate the 7-day rolling average of this number. The window function uses the ROWS BETWEEN clause to specify a window of 7 days, starting from 6 days before the current row up to the current row.

The result of this query will be a table containing the date and the 7-day rolling average of the number of unique user_city values for each date. For example, for the data you provided, the result might look like this:

date        avg_num_cities
2000-01-01  4
2000-01-02  4
2000-01-03  3.5

Note that the first six rows in the result will not have a complete 7-day window, so the average will be calculated based on a smaller number of days.

I hope this helps!

继续阅读：moving-average sql

How to calculate moving average of distinct value count in SQL?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？