Basic Velocity Algorithm?

Given the following dataset of daily pageviews for two articles on my site:

Article 1
2/1/2010 100
2/2/2010 80
2/3/2010 60

Article 2
2/1/2010 20000
2/2/2010 25000
2/3/2010 23000

where column 1 is the date and column 2 is the number of pageviews for the article on that date. What basic velocity calculation can determine whether an article has been trending upwards or downwards over the most recent 3 days?

Caveats: each article knows only its own pageview totals, not the site-wide total. Ideally the result would be a number between 0 and 1. Any pointers to what this class of algorithms is called?

thanks!


update: Your data actually already is a list of velocities (pageviews/day). The following answer simply shows how to find the average velocity over the past three days. See my other answer for how to calculate pageview acceleration, which is the real statistic you are probably looking for.

Velocity is simply the change in a value (delta pageviews) over time:

For article 1 on 2/3/2010:

delta pageviews = 100 + 80 + 60 
                = 240 pageviews
delta time = 3 days

pageview velocity (over last three days) = [delta pageviews] / [delta time]
                                         = 240               / 3
                                         = 80 pageviews/day

For article 2 on 2/3/2010:

delta pageviews = 20000 + 25000 + 23000 
                = 68000 pageviews
delta time = 3 days

pageview velocity (over last three days) = [delta pageviews] / [delta time] 
                                         = 68,000             / 3
                                         = 22,666 + 2/3 pageviews/day

Now that we know the maximum velocity, we can scale all the velocities to get relative velocities between 0 and 1 (or between 0% and 100%):

relative pageview velocity of article 1 = velocity / MAX_VELOCITY
                                        = 80       / (22,666 + 2/3)
                                        ~ 0.0035294118
                                        ~ 0.35294118%

relative pageview velocity of article 2 = velocity      / MAX_VELOCITY
                                        = (22,666 + 2/3)/(22,666 + 2/3)
                                        = 1
                                        = 100%
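
For anyone who wants to try the arithmetic above, here is a minimal Python sketch of the same two steps: average velocity per article, then scaling by the largest velocity. The names (pageviews, average_velocity, relative) are made up for illustration, and it assumes something with visibility across all articles computes the maximum.

```python
# Daily pageview counts from the question, oldest first.
pageviews = {
    "Article 1": [100, 80, 60],
    "Article 2": [20000, 25000, 23000],
}


def average_velocity(daily_counts):
    """Mean pageviews per day over the given window."""
    return sum(daily_counts) / len(daily_counts)


# Average velocity of each article over the three days.
velocities = {name: average_velocity(counts) for name, counts in pageviews.items()}

# The largest velocity is the scaling reference (needs all articles' values).
max_velocity = max(velocities.values())

# Dividing by the maximum puts every relative velocity in the range [0, 1].
relative = {name: v / max_velocity for name, v in velocities.items()}

for name in pageviews:
    print(f"{name}: {velocities[name]:.2f} pageviews/day, relative {relative[name]:.6f}")
```

Note that the scaled number only says how busy an article is compared to the busiest one; it does not say whether the article is going up or down.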


"Pageview trend" likely refers to pageview acceleration, not velocity. Your dataset is already a list of velocities (pageviews/day). Cumulative pageview counts never decrease, so pageview velocity can never be negative. The following describes how to calculate pageview acceleration, which may be negative.

PV_acceleration(t1,t2) = (PV_velocity{t2} - PV_velocity{t1}) / (t2 - t1)
("PV" == "Pageview")

Explanation: Acceleration is simply change in velocity divided by change in time. Since your dataset is a list of page view velocities, you can plug them directly into the formula:

PV_acceleration("2/1/2010", "2/3/2010") = (60 - 100) / ("2/3/2010" - "2/1/2010")
                                        = -40        / 2
                                        = -20 pageviews per day per day
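
A rough Python version of the formula, in case it helps. The function name and argument order are my own choice for illustration; it assumes the velocities are daily pageview counts and the dates are datetime.date objects.

```python
from datetime import date


def pv_acceleration(t1, v1, t2, v2):
    """Change in pageview velocity divided by the days elapsed from t1 to t2."""
    days = (t2 - t1).days
    if days == 0:
        raise ValueError("t1 and t2 must be different dates")
    return (v2 - v1) / days


# Article 1, from 2/1/2010 (100 pageviews/day) to 2/3/2010 (60 pageviews/day).
acc = pv_acceleration(date(2010, 2, 1), 100, date(2010, 2, 3), 60)
print(acc)  # -20.0 pageviews per day per day
```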

Note that the calculation from "2/1/2010" to "2/3/2010" did not use the data for "2/2/2010". An alternate method is to calculate several one-day PV_accelerations (each using a date range that goes back only a single day) and average them. Three one-day accelerations would need four days of data, which your example does not have, but here is how to do it for the last two days:

PV_acceleration("2/2/2010", "2/3/2010") = (60 - 80) / ("2/3/2010" - "2/2/2010")
                                        = -20        / 1
                                        = -20 pageviews per day per day

PV_acceleration("2/1/2010", "2/2/2010") = (80 - 100) / ("2/2/2010" - "2/1/2010")
                                        = -20        / 1
                                        = -20 pageviews per day per day

PV_acceleration_average("2/1/2010", "2/3/2010") = (-20 + -20) / 2
                                                = -20 pageviews per day per day

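This alternate method did not make a difference for the article 1 data because the page view acceleration did not change between the two days. With evenly spaced daily data the average always equals the endpoint calculation, because the middle value cancels out: for article 2 the one-day accelerations are +5000 and -2000, which average to +1500, the same as (23000 - 20000) / 2.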
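
Here is a small Python sketch of the one-day method, assuming the velocities are evenly spaced daily values listed oldest first. The helper names are hypothetical. It prints the individual one-day accelerations alongside their average for both articles.

```python
def daily_accelerations(velocities):
    """Differences between consecutive daily velocities (one-day steps)."""
    return [later - earlier for earlier, later in zip(velocities, velocities[1:])]


def average_acceleration(velocities):
    """Mean of the one-day accelerations; needs at least two days of data."""
    steps = daily_accelerations(velocities)
    if not steps:
        raise ValueError("need at least two daily velocities")
    return sum(steps) / len(steps)


article_1 = [100, 80, 60]
article_2 = [20000, 25000, 23000]

print(daily_accelerations(article_1), average_acceleration(article_1))
print(daily_accelerations(article_2), average_acceleration(article_2))
```

If you mostly care about where an article is heading right now, the most recent one-day step may be the more useful number to watch than the plain average.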


Just a link to an article about the 'trending' algorithms that reddit, StumbleUpon, and HN use, among others.

  • http://www.seomoz.org/blog/reddit-stumbleupon-delicious-and-hacker-news-algorithms-exposed