Sorting and finding values in other data frames
I have a dataframe named commodities_3
. It contains 28 columns with different commodities and 403 rows representing end-of-month data. What I need is to find the position for each row separately:
- max value,
- min value,
- all other positives
- all other negatives
Those index should then be used to locate the corresponding data in another dataframe with the same column and row characteristics called commodities_3_returns
. These data should then be copied into 4 new dataframes (one dataframe for each sorting).
I know how to find the positions of the values for each row using which and which.min and which.max. But I don't know how to put this in a loop in order to do it for all 403 rows. And subsequently how to use this data to locate the corresponding data in the other dataframe commodities_3_returns
.
Unfortunaltey I have to use a dataframe because I have dates as rownames in there, which I have to keep as I need them later for indexing, as well as NA's. It looks about like this:
commodities_3 &开发者_如何学运维lt;- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3) <- mydates
commodities_3[3,2] <- NA
commodities_3_returns <- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3_returns) <- mydates
commodities_3_returns[3,3] <- NA
As I said, I have in total 403 rows and 27 columns. In every row, there are some NA's which I have to keep as well. max.col doesn't seem to be able to handle NA's.
My desired output for the above mentioned example would be sth like this:
max_values <- as.data.frame(matrix(data=c(1:5,3,2,1,3,1), nrow=5, ncol=2, byrow=F))
If all the columns in commodities_3
are numeric, then you want a matrix, not a data frame. Then use the apply
function. Some sample data, for reprodcubililty.
commodities_3 <- matrix(rnorm(12), nrow = 4)
commodities_3_returns <- matrix(1:12, nrow = 4)
The stats.
mins <- apply(commodities_3, 1, which.min)
maxs <- apply(commodities_3, 1, which.min)
pos <- apply(commodities_3, 1, function(x) which(x > 0)) #which is optional
neg <- apply(commodities_3, 1, function(x) which(x < 0))
Now use these in the index for commodities_3_returns
. In the absence of coffee, my brain has only a clunky solution with a for loop
n_months <- nrow(commodities_3_returns)
min_returns <- numeric(n_months)
for(i in seq_len(n_months))
{
min_returns[i] <- commodities_3_returns[i, mins[i]]
}
Here is an alternate approach to get the min
and max
using max.col
which is a C
function internally. If you have a large data set, max.col
works extremely fast compared to apply
based solutions
mins = max.col(-commodities_3)
maxs = max.col(commodities_3)
N = NROW(commodities_3)
commodities_3_returns[cbind(1:N, mins)] # returns min
commodities_3_returns[cbind(1:N, maxs)] # returns max
精彩评论