Find the largest 5 values less than 1 and the lowest 5 values

I have a large correlation matrix result in R - for now about 30 items correlated against each other - so the array has about 10,000 cells. I want to find the largest 5 and smallest 5 results. How can I do this?

Here's what a very small portion - the upper left corner - looks like:

               PL1         V3          V4         V5
PL1     1.00000000 0.19905701 -0.02994034 -0.1533846
V3      0.19905701 1.00000000  0.09036472  0.1306054
V4     -0.02994034 0.09036472  1.00000000  0.1848030
V5     -0.15338465 0.13060539  0.18480296  1.0000000

The values in the table are always between -1 and 1 and, if it helps, because this is a correlation matrix the half above the diagonal duplicates the half below it.

I need the most positive 5 less than 1 and the most negative 5 including -1 if it exists.

Thanks in advance.


Here is another crude way to do this (no doubt there is a much easier way), but it's not too hard to wrap this in a function:

EDIT: Shortened the code.

 # Simulate correlation matrix (taken from Patrick's answer)
set.seed(1)
n<-100
x<-matrix(runif(n^2),n,n)
cor<-cor(x)

# Set diagonal and one triangle to 0:
diag(cor) <- 0
cor[upper.tri(cor)] <- 0

# Get sorted values:
sort <- sort(cor)

# Create a dummy matrix and get lowest 5:
min <- matrix(cor %in% sort[1:5] ,n,n)
which(min,arr.ind=T)

# Same for highest 5 (positions n^2-4 to n^2; starting at n^2-5 would select six values):
max <- matrix(cor %in% sort[(n^2-4):(n^2)] ,n,n)
which(max,arr.ind=T)
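
The steps above can be wrapped in a function roughly like this. It is only a sketch: `extreme_cors` and its argument names are made up for illustration, and it masks the diagonal and upper triangle with `lower.tri()` instead of setting them to 0, so that zeros cannot be mistaken for real correlations when fewer than five values are negative.

```r
# Hypothetical helper: the k lowest and k highest off-diagonal correlations,
# each pair counted once, returned with the row/column they came from.
extreme_cors <- function(cor_mat, k = 5) {
  keep <- lower.tri(cor_mat)            # TRUE below the diagonal only
  pos  <- which(keep, arr.ind = TRUE)   # (row, col) of every kept cell
  vals <- cor_mat[keep]                 # values in the same column-major order
  ord  <- order(vals)                   # ascending

  # Fall back to plain indices when the matrix has no dimnames
  rn <- rownames(cor_mat); if (is.null(rn)) rn <- seq_len(nrow(cor_mat))
  cn <- colnames(cor_mat); if (is.null(cn)) cn <- seq_len(ncol(cor_mat))

  pick <- function(i) data.frame(
    row = rn[pos[i, "row"]],
    col = cn[pos[i, "col"]],
    cor = vals[i]
  )
  list(lowest  = pick(head(ord, k)),        # most negative first
       highest = pick(rev(tail(ord, k))))   # most positive first
}

set.seed(1)
x   <- matrix(runif(100^2), 100, 100)
res <- extreme_cors(cor(x))
res$lowest
res$highest
```

Because only the lower triangle is kept, the diagonal 1s never show up in `res$highest`.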

Another option, as ulidtko said, is to make a graph. You could try my package, called qgraph, which can be used to visualize a correlation matrix as a network:

library(qgraph)
qgraph(cor(x),vsize=2,minimum=0.2,filetype="png")


You want to find the largest and smallest correlations, and you probably want to know not only what they are but also where they came from. It's easy.

x<-matrix(runif(25),5,5)
cor<-cor(x)
l <- length(cor)
l1 <- length(cor[cor<1])

#the actual high and low correlation indexes 
corHigh <- order(cor)[(l1-4):l1]
corLow <- order(cor)[1:5]
#(if you just want to view the correlations cor[corLow] or cor[corHigh] works fine)

#isolate them in the matrix so you can see where they came from easily
corHighView <- cor
corHighView[!1:l %in% corHigh] <- NA
corLowView <- cor
corLowView[!1:l %in% corLow] <- NA

#look at your matrix with your target correlations sticking out like a sore thumb
corLowView
corHighView
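
One thing to keep in mind with the approach above: the matrix is symmetric, so every off-diagonal value appears twice and the five cells picked by `order()` cover only two or three distinct pairs. A possible variation, sketched here with the made-up name `cm` (to avoid masking `cor()`), blanks out the diagonal and upper triangle first and then uses `arrayInd()` to recover the positions:

```r
set.seed(1)
x  <- matrix(runif(25), 5, 5)
cm <- cor(x)

# Hide the diagonal and the mirrored upper triangle so each pair counts once
cm[upper.tri(cm, diag = TRUE)] <- NA

# order() with na.last = NA drops the NA cells and returns linear indices
idx      <- order(cm, na.last = NA)
low_idx  <- head(idx, 5)   # five most negative pairs
high_idx <- tail(idx, 5)   # five most positive pairs

# arrayInd() converts linear indices into (row, col) positions
low_pos  <- arrayInd(low_idx,  dim(cm))
high_pos <- arrayInd(high_idx, dim(cm))

cbind(low_pos,  value = cm[low_idx])
cbind(high_pos, value = cm[high_idx])
```

With a 5x5 matrix there are only 10 distinct pairs, so the low and high sets together cover all of them; on a larger matrix they would be a small subset.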


Interesting network graph Sacha. Here it is with real data. Seems I have much stronger positive than negative correlations.


kind of dirty:

x<-matrix(runif(25),5,5)
cor<-cor(x)
max1<-max(cor)
max2<-max(cor[cor!=max1])
max3<-max(cor[cor!=max1 & cor!=max2])
max4<-max(cor[cor!=max1& cor!=max2& cor!=max3])
max5<-max(cor[cor!=max1& cor!=max2& cor!=max3& cor!=max4])
max6<-max(cor[cor!=max1& cor!=max2& cor!=max3& cor!=max4& cor!=max5])
maxes<-c(max2,max3,max4,max5,max6)
maxes
matrix(cor %in% maxes,5,5)
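
The same five values can probably be had in fewer lines by sorting the distinct off-diagonal values. This is just a sketch of that idea, not the code above rewritten line by line; `cm`, `vals` and the use of `lower.tri()` to skip the diagonal are my own choices:

```r
set.seed(1)
x  <- matrix(runif(25), 5, 5)
cm <- cor(x)

# Distinct values below the diagonal (the diagonal 1s are never included)
vals  <- unique(cm[lower.tri(cm)])

# Five largest of those, in decreasing order
maxes <- sort(vals, decreasing = TRUE)[1:5]
maxes

# Logical matrix marking where those values sit in the full matrix
matrix(cm %in% maxes, 5, 5)
```

Using `sort(vals)[1:5]` instead gives the five lowest values in the same way.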


How about a nice creamy plot? :)

> m <- matrix(runif(100)*2-1, ncol=10)
> colnames(m) <- rownames(m) <- paste("V", 1:10, sep="")
> m
             V1          V2         V3         V4         V5         V6           V7           V8         V9         V10
V1  -0.40101571 -0.27049070  0.2414295 -0.1889384  0.6459941 -0.8851884  0.332284597 -0.431312791  0.3828374  0.46398193
V2   0.38557771  0.37083911 -0.3004923  0.1253908 -0.4405188 -0.5424613  0.869493425  0.023291914  0.9625392 -0.83196773
V3   0.61923503 -0.27615909  0.1759168 -0.7333568 -0.4256801 -0.6170807  0.438613391 -0.003632086  0.4113488 -0.40590330
V4   0.72093123  0.68479573  0.5032486  0.3720876 -0.6775834  0.2445693  0.353658359 -0.839104640 -0.8122970 -0.42322187
V5  -0.08640529  0.04432795 -0.5120129 -0.9327905 -0.5821378  0.4671473 -0.367677007  0.483375219 -0.7849003  0.57686729
V6  -0.72451704  0.75814550  0.7838393 -0.7650238  0.6742669  0.2260757  0.001645839  0.570753074  0.1944579  0.07917656
V7   0.64516271  0.51994540  0.9057388 -0.3976167 -0.7403159 -0.2873382 -0.809354444  0.319095368 -0.9766422 -0.71981321
V8  -0.51509049  0.18727837 -0.1971454 -0.4290346  0.5657622  0.5324266  0.451608266 -0.715594335 -0.2749510  0.38234855
V9   0.49035803  0.50252397  0.7736783  0.3342899 -0.2732427  0.1128947  0.870315070 -0.291482237  0.5171181 -0.59784449
V10 -0.51811224 -0.67159723  0.8903813 -0.7562222 -0.9790557 -0.5830560 -0.715136643  0.167987391 -0.0529399  0.44570592

> library(ggplot2)
> library(reshape)  # provides melt(); for a matrix it yields columns X1, X2, value
> p <- ggplot(data=melt(m), aes(x=X1, y=X2, color=value))
> p + geom_point(size=5, alpha=0.7) + scale_color_gradient2()

I don't think it would be hard to look at a 100x100 plot and find the extreme values by eye. :)


I take no credit for this, just posting the code in case the link dies... Credit to Dimitris on the r-help list. It returns a list of p top correlations involving each variable, sorted.

cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100)) 
p <- 30 # how many top items
n <- ncol(cor.mat)
cmat <- col(cor.mat)
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind)) 
as.data.frame(cbind(out,cor = cor.mat[out]))
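
If the index arithmetic above is hard to follow, here is what I believe is an equivalent but slower formulation using `apply()`, on a small matrix so the output is easy to inspect. Treat it as a reading aid rather than a replacement; the sizes (20 rows, 6 variables, `p <- 3`) and the name `top` are arbitrary choices of mine.

```r
set.seed(1)
cor.mat <- cor(matrix(rnorm(20 * 6), 20, 6))
p <- 3  # how many top items per variable

# For each column: row indices ordered by decreasing correlation.
# Position 1 is the variable itself (r = 1), so keep positions 2..p+1.
top <- apply(cor.mat, 2, function(v) order(v, decreasing = TRUE)[seq(2, p + 1)])

# One row per (variable, partner) pair, p rows for each variable
out <- data.frame(ID = c(col(top)), ID2 = c(top))
out$cor <- cor.mat[cbind(out$ID2, out$ID)]
out
```

`top` is a p x n matrix whose column j lists the p variables most positively correlated with variable j.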