开发者

Sorting and finding values in other data frames

I have a dataframe named commodities_3. It contains 28 columns with different commodities and 403 rows representing end-of-month data. What I need is to find the position for each row separately:

  • max value,
  • min value,
  • all other positives
  • all other negatives

Those index should then be used to locate the corresponding data in another dataframe with the same column and row characteristics called commodities_3_returns. These data should then be copied into 4 new dataframes (one dataframe for each sorting).

I know how to find the positions of the values for each row using which and which.min and which.max. But I don't know how to put this in a loop in order to do it for all 403 rows. And subsequently how to use this data to locate the corresponding data in the other dataframe commodities_3_returns.


Unfortunaltey I have to use a dataframe because I have dates as rownames in there, which I have to keep as I need them later for indexing, as well as NA's. It looks about like this:

commodities_3 &开发者_如何学运维lt;- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3) <- mydates
commodities_3[3,2] <- NA


commodities_3_returns <- as.data.frame(matrix(rnorm(15), nrow=5, ncol=3))
mydates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05"))
rownames(commodities_3_returns) <- mydates
commodities_3_returns[3,3] <- NA

As I said, I have in total 403 rows and 27 columns. In every row, there are some NA's which I have to keep as well. max.col doesn't seem to be able to handle NA's.

My desired output for the above mentioned example would be sth like this:

max_values <- as.data.frame(matrix(data=c(1:5,3,2,1,3,1), nrow=5, ncol=2, byrow=F))


If all the columns in commodities_3 are numeric, then you want a matrix, not a data frame. Then use the apply function. Some sample data, for reprodcubililty.

commodities_3 <- matrix(rnorm(12), nrow = 4)
commodities_3_returns <- matrix(1:12, nrow = 4)

The stats.

mins <- apply(commodities_3, 1, which.min)
maxs <- apply(commodities_3, 1, which.min)
pos <- apply(commodities_3, 1, function(x) which(x > 0))  #which is optional
neg <- apply(commodities_3, 1, function(x) which(x < 0))

Now use these in the index for commodities_3_returns. In the absence of coffee, my brain has only a clunky solution with a for loop

n_months <- nrow(commodities_3_returns)
min_returns <- numeric(n_months)
for(i in seq_len(n_months))
{
  min_returns[i] <- commodities_3_returns[i, mins[i]]
}


Here is an alternate approach to get the min and max using max.col which is a C function internally. If you have a large data set, max.col works extremely fast compared to apply based solutions

mins = max.col(-commodities_3)
maxs = max.col(commodities_3)
N    = NROW(commodities_3)

commodities_3_returns[cbind(1:N, mins)] # returns min
commodities_3_returns[cbind(1:N, maxs)] # returns max
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜