for loop specific elements of a vector in R
I want to run a for loop that makes a calculation only for specific elements of a column in a data frame. The elements are identified by the values in an adjacent column of the same data frame. I can do this by visually inspecting which elements correspond to which values, e.g. by writing for(i in 1:5) for a column of 301 elements. However, I would like to be able to specify this without a priori knowledge of the element numbers.
For example, in the following data frame I want to run a for loop on the elements of the column data.LICOR$flux for which data.LICOR$day.night == 'd':
   data.LICOR.day.night data.LICOR.flux
1                     d           26.89
2                     d           27.89
3                     d           28.77
4                     d           28.92
5                     d           29.30
6                     n           28.51
7                     n           28.98
8                     n           28.41
9                     n           27.87
10                    n           28.18
This is what my previous code did, by specifying elements 1:5 and 6:10, which correspond to day.night = 'd' and day.night = 'n' respectively:
# replace day fluxes
for(i in 1:5){
  if(data.LICOR$flux[i] > av.day.flux+2*sd.day.flux)
    data.LICOR$flux[i] <- av.day.flux
  else if(data.LICOR$flux[i] < av.day.flux-2*sd.day.flux)
    data.LICOR$flux[i] <- av.day.flux
}
# replace night fluxes
for(i in 6:10){
  if(data.LICOR$flux[i] > av.night.flux+2*sd.night.flux)
    data.LICOR$flux[i] <- av.night.flux
  else if(data.LICOR$flux[i] < av.night.flux-2*sd.night.flux)
    data.LICOR$flux[i] <- av.night.flux
}
This replaces values that are more than 2 standard deviations from the mean with the mean value.
Thanks for any suggestions.
Given the loops in your comment to Gavin's answer, I think you want something like this (assuming your example data are in an object named data.LICOR).
# within() allows us to evaluate all the expressions (the 2nd argument)
# using the data in 'data.LICOR'.
data.LICOR <- within(data.LICOR, {
  # ave() applies 'FUN' to the subsets of 'flux' specified by 'day.night'
  # and returns an object the same length as 'flux'.
  av.flux <- ave(flux, day.night, FUN=mean)
  sd.flux <- ave(flux, day.night, FUN=sd)
  # ifelse() returns 'av.flux' when the first argument is TRUE
  # and 'flux' when it's FALSE.
  flux <- ifelse(flux > av.flux+2*sd.flux |
                 flux < av.flux-2*sd.flux, av.flux, flux)
})
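To see the within()/ave() approach above actually replace something, here is a small self-contained sketch. The data frame demo and its values are made up for illustration (the ten-row sample in the question has no value more than 2 standard deviations from its own group mean, so nothing would change there). The rm() call is an optional addition that keeps the helper columns out of the result.

```r
# Hypothetical data: 10 day and 10 night readings, one outlier in each group.
demo <- data.frame(
  day.night = rep(c("d", "n"), each = 10),
  flux      = c(rep(10, 9), 100, rep(20, 9), -70)
)

cleaned <- within(demo, {
  # Per-group mean and sd, each the same length as 'flux'.
  av.flux <- ave(flux, day.night, FUN = mean)
  sd.flux <- ave(flux, day.night, FUN = sd)
  # Same test as the two-sided comparison above, written with abs().
  flux <- ifelse(abs(flux - av.flux) > 2 * sd.flux, av.flux, flux)
  # Without this, av.flux and sd.flux would be added as new columns.
  rm(av.flux, sd.flux)
})

print(cleaned)
```

Note that the group mean used as the replacement is computed with the outlier still included, which is also what the original loops do.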
You can get the subset using:
subset(data.LICOR, subset = day.night == "d")
But now tell us what you want to do with it?
Use the subset function to get the subset of the data-frame you want. E.g. if df is your original data-frame, you can do:
df.d <- subset(df, day.night == 'd')
And then you can do whatever calculation you want on df.d.
To replace the "large" and "small" values by their mean, you could do it in base R using Joshua's approach (using ave and within) or using the ddply function from the plyr package:
require(plyr)
## ddply breaks up the data-frame according to the value of "DayNight" ('d' or 'n') and
## WITHIN each category, "transforms" the flux column as desired
ddply(df, .(DayNight), transform,
      flux = ifelse(flux > mean(flux) + 2*sd(flux) | flux < mean(flux) - 2*sd(flux),
                    mean(flux),
                    flux))
I'm using shortened versions of your column names, but I'm sure you get the idea. If you know the "right" level of the average and standard deviation a priori, you can replace the mean(flux) and sd(flux) in the above code with those values.
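Another possibility for the last two lines of the answer provided by Ulrich is:
idx <- which(flux > av.flux+2*sd.flux | flux < av.flux-2*sd.flux)
flux[idx] <- av.flux[idx]
The condition needs | rather than &, since no value can be above the upper limit and below the lower limit at once, and av.flux has to be indexed too so that each replaced value gets its own group mean. This is arguably faster, because only the values that fall outside the limits are replaced. However, I'm not sure whether it is truly faster in R, and for small datasets I guess it won't matter.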
And keeping the loop (not good practice in R when the code can be vectorized, though it may be in other languages), you just need to do this:
for (i in unique(data.LICOR$day.night)) {
  if (data.LICOR$flux[which(data.LICOR$day.night==i)] > ...
Note that the modifications here are the use of which, and looping over the distinct values of the day.night factor in the for statement.
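One way the loop idea could be completed is sketched below. It is only an illustration under my own assumptions: the data frame demo2 and its values are made up, and the comparison is done on the whole group at once with vector indexing rather than inside if(), because if() expects a single TRUE/FALSE value.

```r
# Hypothetical data: 10 day and 10 night readings, one outlier in each group.
demo2 <- data.frame(
  day.night = rep(c("d", "n"), each = 10),
  flux      = c(rep(10, 9), 100, rep(20, 9), -70)
)

for (g in unique(demo2$day.night)) {
  # Row numbers belonging to this group; no a priori knowledge needed.
  rows <- which(demo2$day.night == g)
  # Group statistics, computed once before anything is replaced.
  m <- mean(demo2$flux[rows])
  s <- sd(demo2$flux[rows])
  # Rows in this group lying more than 2 sd from the group mean.
  out <- rows[abs(demo2$flux[rows] - m) > 2 * s]
  demo2$flux[out] <- m
}

print(demo2)
```

The loop runs once per group rather than once per row, so the hard-coded ranges 1:5 and 6:10 are no longer needed.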