Avoiding Loop with R using Apply (?)

2023-01-07 17:07 问答作者：

I'm trying to run apply a function to each row of a dataset. The function looks up matching rows in a second dataset and computes a similarity score for the product details passed to it.

The function works if I just call it with test numbers but I can't figure out how to run it on all rows of my dataset. I've tried using apply but can't g开发者_Go百科et it working.

I'm going to be iterating different parameter settings to find those that best fit historical data so speed is important... meaning that a loop is out. Any help you can provide would be hugely appreciated.

Thanks! Alan

GetDistanceTest <- function(SnapshotDate, Cand_Type, Cand_Height, Cand_Age) {
    HeightParam <- 1/5000
        AgeParam <- 1
    Stock_SameType <- HistoricalStock[!is.na(HistoricalStock$date) & !is.na(HistoricalStock$Type) & as.character(HistoricalStock$date)==as.character(SnapshotDate) & HistoricalStock$Type==Cand_Type,]

    Stock_SameType$ED <- (HeightParam*(Stock_SameType$Height - Cand_Height))^2 + (AgeParam*(Stock_SameType$Age - Cand_Age))^2

    return(sqrt(sum(Stock_SameType$ED)))

}

HistoricalStock <- HistoricalAQStock[,c(1, 3, 4, 5)]
colnames(HistoricalStock) <- c("date", "Age", "Height", "Type")
Sales <- AllSales[,c(2,10,11,25)]
colnames(Sales) <- c("date", "Age", "Height", "Type")

GetDistanceTest("2010-04-01", 5261, 12, 7523) #works and returns a single number

res1 <- transform(Sales, ClusterScore=GetDistanceTest(date, Type, Height, Age))
        # returns Error in `$<-.data.frame`(`*tmp*`, "ED", value = c(419776714.528591, 22321257.0276852,  :  replacement has 4060 rows, data has 54
    # also 4 warnings, one for each variable. e.g. 1: In as.character(HistoricalStock$date) == as.character(SnapshotDate) :  longer object length is not a multiple of shorter object length

res2 <- apply(Sales, 1, GetDistanceTest, Sales$Type, Sales$Height, Sales$Age)
    # `$<-.data.frame`(`*tmp*`, "ED", value = c(419648071.041523, 22325941.2704261,  : replacement has 4060 rows, data has 13
    # also same 4 warnings as res1

I took some liberties with your code b/c I try to vectorize vice use loops whenever I can... With the merge function, you merge the two data frames, and operate on the "columns", which allows you to use the vectorization built into R. I think this will do what you want (in the second line I'm just making sure that A and B don't have the same values for height and age so that your distance isn't always zero):

A <- B <- data.frame(date=Sys.Date()-9:0, stock=letters[1:10], type=1:10, height=1:10, age=1:10)
B$height <- B$age <- 10:1
AB <- merge(x=A, y=B, by=c("date", "type"), suffixes=c(".A", ".B"))
height.param <- 1/5000
age.param <- 1
temp <- sqrt( height.param * (AB$height.A - AB$height.B)^2 + age.param * (AB$age.A - AB$age.B)^2 )

Use mapply, the multivariate form of apply:

res1 <- mapply(GetDistanceTest, Sales$date, Sales$Type, Sales$Height, Sales$Age)

Code as per above comment:

A <- data.frame(date=rep(Sys.Date()-9:0,100), id=letters[1:10], type=floor(runif(1000, 1, 10)), height=runif(1000, 1, 100), age=runif(1000, 1, 100))
B <- data.frame(date=rep(Sys.Date()-9:0,1000), type=floor(runif(10000, 1, 10)), height=runif(10000, 1, 10), age=runif(10000, 1, 10))



AB <- merge(x=A, y=B, by=c("date", "type"), suffixes=c(".A", ".B"))
height.param <- 1
age.param <- 1
AB$ClusterScore <- sqrt( height.param * (AB$height.A - AB$height.B)^2 + age.param * (AB$age.A - AB$age.B)^2 )
Scores <- ddply(AB, c("id"), function(df)sum(df$ClusterScore))

继续阅读：loops r

Avoiding Loop with R using Apply (?)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？