How do I add values of certain column variables if they appear in repeated rows with the same key values?
I am reading a CSV file with customer buys for each week of a movie's viewing window on video on demand. The file consists of several columns for a given movie, including title, price, week number, and the number of buys that occur in that week. Sometimes the buys for a given movie and week are split across multiple rows (for the same week).
I am trying to manipulate the data so that if there are multiple rows for the same week for a given movie, they are replaced with one row for that week with the buys added up.
I have only started looking at R in the last few weeks and have failed miserably in this attempt. Please help!
I think I followed what you wanted. Let me know if this is what you had in mind.
First, make up some data:
movies <- data.frame(mov = c("Foo", rep(c("Foo", "Bar", "Baz"),3))
, price = c(7, rep(c(7, 5, 2),3))
, wknumb = c(1, rep(1:3, each = 3))
, buys = sample(10:20, 10, TRUE))
mov price wknumb buys
1 Foo 7 1 12
2 Foo 7 1 11
3 Bar 5 1 10
4 Baz 2 1 19
5 Foo 7 2 10
6 Bar 5 2 15
7 Baz 2 2 15
8 Foo 7 3 14
9 Bar 5 3 15
10 Baz 2 3 20
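Because buys is drawn with sample(), the numbers will differ from run to run. As a minimal sketch, the values printed above can be typed in directly so that the results are reproducible, and duplicated() can then show which movie/week pairs occur more than once. The names keys and dups are only illustrative.

```r
# Values copied from the printed table, so the result is deterministic
movies <- data.frame(
  mov    = c("Foo", "Foo", "Bar", "Baz", "Foo", "Bar", "Baz", "Foo", "Bar", "Baz"),
  price  = c(7, 7, 5, 2, 7, 5, 2, 7, 5, 2),
  wknumb = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  buys   = c(12, 11, 10, 19, 10, 15, 15, 14, 15, 20)
)

# Key columns that should identify a row uniquely
keys <- movies[, c("mov", "wknumb")]

# Keep every row whose key appears more than once (first and later copies)
dups <- movies[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]
dups
```

With this data, only the two "Foo" rows for week 1 are flagged.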
We have two entries for movie "Foo" in week 1, which we need to collapse into a single entry. We can use the plyr package for this:
library(plyr)
ddply(movies, c("mov", "wknumb"), summarize, totbuys = sum(buys))
Which returns a data.frame with 9 rows, one for each movie and week:
mov wknumb totbuys
1 Bar 1 10
2 Bar 2 15
3 Bar 3 15
4 Baz 1 19
5 Baz 2 15
6 Baz 3 20
7 Foo 1 23
8 Foo 2 10
9 Foo 3 14
A variation using aggregate from base R (and Chase's dataset):
with(movies, aggregate(buys, list(mov = mov, wknumb = wknumb), sum))
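The call above names the summed column x. A possible alternative is the formula interface of aggregate, which keeps the name buys and makes it easy to carry price along as an extra grouping column. This sketch assumes price is constant for a given movie and week (as in the example data); the toy data frame wk1 and the name totals are made up for illustration.

```r
# Toy data: the four week-1 rows from the example table
wk1 <- data.frame(
  mov    = c("Foo", "Foo", "Bar", "Baz"),
  price  = c(7, 7, 5, 2),
  wknumb = c(1, 1, 1, 1),
  buys   = c(12, 11, 10, 19)
)

# Formula interface: sum buys within each mov/price/wknumb combination.
# Assumption: price does not vary within a movie and week; if it did,
# each distinct price would get its own row.
totals <- aggregate(buys ~ mov + price + wknumb, data = wk1, FUN = sum)
totals
```

Here the two "Foo" rows collapse into one row with 23 buys, and the price column is retained.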
Many aggregations like this can be handled by sqldf if you are familiar with SQL.
library(sqldf)
sqldf("select mov, wknumb, sum(buys) as sumBuys from movies group by wknumb, mov")