for loop specific elements of a vector in R

I want to run a for loop that performs a calculation only on specific elements of a column in a data frame. The elements are identified by the values in an adjacent column of the same data frame. I can do this by checking by eye which row numbers correspond to those values, e.g. for(i in 1:5){ # in a column of 301 elements. However, I would like to be able to specify this without a priori knowledge of the element numbers.

e.g. in the following data frame I want to run a for loop on the elements of the column data.LICOR$flux when data.LICOR$day.night=='d'

   data.LICOR.day.night data.LICOR.flux
1                   d       26.89
2                   d       27.89
3                   d       28.77
4                   d       28.92
5                   d       29.30
6                   n       28.51
7                   n       28.98
8                   n       28.41
9                   n       27.87
10                  n       28.18

This is what my previous code did by specifying elements 1:5 and 6:10, which correspond to day.night = 'd' and day.night = 'n' respectively:

# replace day fluxes
for(i in 1:5){
    if(data.LICOR$flux[i] > av.day.flux+2*sd.day.flux)
      data.LICOR$flux[i] <- av.day.flux
    else if(data.LICOR$flux[i] < av.day.flux-2*sd.day.flux)
      data.LICOR$flux[i] <- av.day.flux 
}

# replace night fluxes
for(i in 6:10){
    if(data.LICOR$flux[i] > av.night.flux+2*sd.night.flux)
      data.LICOR$flux[i] <- av.night.flux
    else if(data.LICOR$flux[i] < av.night.flux-2*sd.night.flux) 
      data.LICOR$flux[i] <- av.night.flux 
}

This replaces values that are more than 2 standard deviations from the mean with the mean value.

Thanks for any suggestions.


Given the loops in your comment to Gavin's answer, I think you want something like this (assuming your example data are in an object named data.LICOR).

# within() allows us to evaluate all the expressions (the 2nd argument)
# using the data in 'data.LICOR'.
data.LICOR <- within(data.LICOR, {
  # ave() applies 'FUN' to the subsets of 'flux' specified by 'day.night'
  # and returns an object the same length as 'flux'.
  av.flux <- ave(flux, day.night, FUN=mean)
  sd.flux <- ave(flux, day.night, FUN=sd)
  # ifelse() returns 'av.flux' when the first argument is TRUE
  # and 'flux' when it's FALSE.
  flux <- ifelse(flux > av.flux+2*sd.flux |
                 flux < av.flux-2*sd.flux, av.flux, flux) })
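To see what ave() is doing in the code above, here is a minimal sketch on the sample data from the question. It assumes the columns are named day.night and flux; the name outlier is only for illustration.

```r
# Sample data from the question; column names 'day.night' and 'flux' are assumed.
data.LICOR <- data.frame(
  day.night = rep(c("d", "n"), each = 5),
  flux = c(26.89, 27.89, 28.77, 28.92, 29.30,
           28.51, 28.98, 28.41, 27.87, 28.18)
)

# ave() returns one value per row: the mean (or sd) of that row's group.
av.flux <- ave(data.LICOR$flux, data.LICOR$day.night, FUN = mean)
sd.flux <- ave(data.LICOR$flux, data.LICOR$day.night, FUN = sd)

# TRUE for rows more than 2 group standard deviations from the group mean.
outlier <- abs(data.LICOR$flux - av.flux) > 2 * sd.flux
```

With only five values per group, no value can lie more than about 1.79 sample standard deviations from its group mean, so nothing is flagged in this sample. On the full 301-row column the same code would pick out the rows to replace.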


You can get the subset using:

subset(data.LICOR, subset = day.night == "d")

But now tell us what you want to do with it?


Use the subset function to get the subset of the data-frame you want. E.g. if df is your original data-frame, you can do:

df.d <- subset(df, day.night == 'd')

And then you can do whatever calc you want on df.d. To replace the "large" and "small" values by their mean, you could do it in base R using Joshua's approach (using ave and within) or using the ddply function from the plyr package:

library(plyr)
## ddply breaks up the data-frame according to the value of "DayNight" ('d' or 'n') and 
## WITHIN each category, "transforms" the flux column as desired
ddply(df, .(DayNight), transform,
      flux = ifelse( flux > mean(flux) + 2*sd(flux) | flux < mean(flux) - 2*sd(flux),
                     mean(flux),
                     flux ))

I'm using shortened versions of your column names, but I'm sure you get the idea. If you know the "right" level of the average and std deviation a priori, you can replace the mean(flux) and sd(flux) in the above code, with those values.
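For the case where the mean and standard deviation are known in advance, something like the following might work. It is a sketch in base R rather than plyr, so it runs without extra packages. The values in known.mean and known.sd are hypothetical placeholders, and the column names day.night and flux are assumed from the question.

```r
# Sample data from the question (column names 'day.night' and 'flux' assumed).
df <- data.frame(
  day.night = rep(c("d", "n"), each = 5),
  flux = c(26.89, 27.89, 28.77, 28.92, 29.30,
           28.51, 28.98, 28.41, 27.87, 28.18)
)

# Hypothetical a priori values, one per group; substitute your own.
known.mean <- c(d = 28.4, n = 28.4)
known.sd   <- c(d = 0.5,  n = 0.3)

# Look up each row's group value by name. as.character() guards against
# 'day.night' being a factor, which would otherwise index by integer code.
grp <- as.character(df$day.night)
m <- unname(known.mean[grp])
s <- unname(known.sd[grp])

# Replace values more than 2 known sds from the known mean with that mean.
df$flux <- ifelse(abs(df$flux - m) > 2 * s, m, df$flux)
```

With these placeholder values only the first day reading (26.89) falls outside its limits and is replaced by 28.4.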


Another possibility for the last two lines of the answer provided by Ulrich is:

 out <- which(flux > av.flux+2*sd.flux | flux < av.flux-2*sd.flux)
 flux[out] <- av.flux[out]

This is arguably faster, because it assigns only to the values that fall outside the limits. However, I'm not sure it is truly faster in R, and for small datasets I guess it won't matter. Inside within(), out is kept as an extra column unless you add rm(out).

And keeping the loop (not good practice in R when the code can be vectorized, though it may be fine in other languages), you just need to do this:

for (i in unique(data.LICOR$day.night)) {
      if (data.LICOR$flux[which(data.LICOR$day.night==i)] > ...

The modifications here are the use of which() and looping over the unique values of day.night in the for statement.
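For completeness, here is one way the loop could be written out in full. It is only a sketch: the flux values are made up so that one day value is a clear outlier, and idx, m and s are illustrative names.

```r
# Made-up data: 10 day and 10 night readings; the last day value is an outlier.
data.LICOR <- data.frame(
  day.night = rep(c("d", "n"), each = 10),
  flux = c(rep(10, 9), 100, rep(c(5, 6), 5)),
  stringsAsFactors = FALSE
)

for (g in unique(data.LICOR$day.night)) {
  # Row numbers belonging to this group, found without knowing them in advance.
  idx <- which(data.LICOR$day.night == g)

  # Group mean and sd, computed once before any value is replaced.
  m <- mean(data.LICOR$flux[idx])
  s <- sd(data.LICOR$flux[idx])

  # Replace values more than 2 sds from the group mean with the group mean.
  for (i in idx) {
    if (abs(data.LICOR$flux[i] - m) > 2 * s) {
      data.LICOR$flux[i] <- m
    }
  }
}
```

Here the day reading of 100 is replaced by the day mean of 19, and the night readings are left unchanged.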
