Find the 5 largest values less than 1 and the 5 lowest values
I have a large correlation matrix result in R - for now about 30 items correlated against each other - so the array has about 10,000 cells. I want to find the largest 5 and smallest 5 results. How can I do this?
Here's what a very small portion - the upper left corner - looks like:
PL1 V3 V4 V5
PL1 1.00000000 0.19905701 -0.02994034 -0.1533846
V3 0.19905701 1.00000000 0.09036472 0.1306054
V4 -0.02994034 0.09036472 1.00000000 0.1848030
V5 -0.15338465 0.13060539 0.18480296 1.0000000
The values in the table are always between -1 and 1. If it helps, since this is a correlation matrix, the upper triangle above the diagonal duplicates the lower triangle below it.
I need the 5 most positive values less than 1 and the 5 most negative values, including -1 if it occurs.
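Because of that symmetry, only the values below the diagonal are needed: they contain every pair exactly once and leave out the diagonal 1s. A minimal sketch of the idea, using simulated data as a stand-in for the real matrix (the sizes and the names `x`, `cm`, `vals` are assumptions for illustration):

```r
# Hypothetical data: 200 observations of 30 items (stand-in for the real matrix)
set.seed(1)
x <- matrix(rnorm(200 * 30), 200, 30)
cm <- cor(x)
isSymmetric(cm)                         # TRUE: the upper triangle mirrors the lower one
vals <- cm[lower.tri(cm)]               # every pair once, diagonal excluded
length(vals)                            # 30 * 29 / 2 = 435
sort(vals)[1:5]                         # 5 most negative correlations
sort(vals, decreasing = TRUE)[1:5]      # 5 most positive off-diagonal correlations
```

This gives the values only; the answers below also recover which pair of items each value belongs to.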
Thanks in advance.
Here is another crude way to do this (no doubt there is a much easier way), but it's not too hard to wrap this in a function:
EDIT: Shortened the code.
# Simulate a correlation matrix (taken from Patrick's answer)
set.seed(1)
n <- 100
x <- matrix(runif(n^2), n, n)
cm <- cor(x)
# Set the diagonal and the upper triangle to 0 so each pair is counted once:
diag(cm) <- 0
cm[upper.tri(cm)] <- 0
# Get the sorted values:
sorted <- sort(cm)
# Create a logical dummy matrix and get the positions of the lowest 5:
is_low <- matrix(cm %in% head(sorted, 5), n, n)
which(is_low, arr.ind = TRUE)
# Same for the highest 5:
is_high <- matrix(cm %in% tail(sorted, 5), n, n)
which(is_high, arr.ind = TRUE)
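Wrapping this in a function, one possibly simpler variant skips the dummy matrices: `which(lower.tri(cm), arr.ind = TRUE)` lists the row/column of every pair below the diagonal, and `order()` ranks them. This is a sketch, not the code above; the function name `extreme_cors` and the `V1, V2, ...` fallback names are made up for illustration.

```r
# Sketch: the k most negative and k most positive correlations,
# each pair reported once, with the variables involved.
extreme_cors <- function(cm, k = 5) {
  nm <- colnames(cm)
  if (is.null(nm)) nm <- paste0("V", seq_len(ncol(cm)))  # hypothetical fallback names
  idx <- which(lower.tri(cm), arr.ind = TRUE)            # row/col of every pair below the diagonal
  df <- data.frame(var1 = nm[idx[, "row"]],
                   var2 = nm[idx[, "col"]],
                   cor  = cm[idx],                       # matrix indexing by (row, col) pairs
                   stringsAsFactors = FALSE)
  df <- df[order(df$cor), ]                              # ascending by correlation
  list(lowest = head(df, k), highest = tail(df, k))
}

# Usage with simulated data
set.seed(1)
x <- matrix(runif(100^2), 100, 100)
extreme_cors(cor(x), k = 5)
```

Returning a data frame keeps the value and the pair together, so there is no need to match floating-point values back into the matrix with `%in%`.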
Another option, as ulidtko said, is to make a graph. You could try my package, qgraph, which can be used to visualize a correlation matrix as a network:
library(qgraph)
qgraph(cor(x),vsize=2,minimum=0.2,filetype="png")
You want to find the largest and smallest correlations, and probably to know not only what they are but also where they came from. That's easy:
x <- matrix(runif(25), 5, 5)
cm <- cor(x)
# keep one copy of each pair: blank out the diagonal and the upper triangle
cm[upper.tri(cm, diag = TRUE)] <- NA
l <- length(cm)
nv <- sum(!is.na(cm))   # number of distinct off-diagonal correlations
#the actual high and low correlation indexes (order() puts the NAs last)
corHigh <- order(cm)[(nv-4):nv]
corLow <- order(cm)[1:5]
#(if you just want to view the correlations, cm[corLow] or cm[corHigh] works fine)
#isolate them in the matrix so you can see where they came from easily
corHighView <- cm
corHighView[!(1:l %in% corHigh)] <- NA
corLowView <- cm
corLowView[!(1:l %in% corLow)] <- NA
#look at your matrix with your target correlations sticking out like a sore thumb
corLowView
corHighView
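If you'd rather get the positions as a table than scan the NA-masked matrices, base R's `arrayInd()` converts the linear (column-major) indices that `order()` returns into row/column pairs. A small sketch with random data; taking the top 3 and the names `top`, `pos`, `res` are arbitrary choices for illustration:

```r
set.seed(42)
x <- matrix(runif(25), 5, 5, dimnames = list(NULL, paste0("V", 1:5)))
cm <- cor(x)
cm[upper.tri(cm, diag = TRUE)] <- NA                      # one copy of each pair
top <- order(cm, decreasing = TRUE, na.last = NA)[1:3]   # linear indices of the 3 largest
pos <- arrayInd(top, dim(cm))                            # column 1 = row, column 2 = column
res <- data.frame(var1 = rownames(cm)[pos[, 1]],
                  var2 = colnames(cm)[pos[, 2]],
                  cor  = cm[top])
res
```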
Interesting network graph, Sacha. Here it is with real data. It seems I have much stronger positive than negative correlations.
Kind of dirty:
x<-matrix(runif(25),5,5)
cor<-cor(x)
max1<-max(cor)
max2<-max(cor[cor!=max1])
max3<-max(cor[cor!=max1 & cor!=max2])
max4<-max(cor[cor!=max1& cor!=max2& cor!=max3])
max5<-max(cor[cor!=max1& cor!=max2& cor!=max3& cor!=max4])
max6<-max(cor[cor!=max1& cor!=max2& cor!=max3& cor!=max4& cor!=max5])
maxes<-c(max2,max3,max4,max5,max6)
maxes
matrix(cor %in% maxes,5,5)
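The chain of `max()` calls above can be collapsed into one `sort()` over the lower triangle, which also avoids assuming that the largest value is the diagonal. A sketch under that substitution; one difference is that the chain compares distinct values, so tied correlations collapse into one entry there, while `sort()` keeps both:

```r
set.seed(1)
x <- matrix(runif(25), 5, 5)
cm <- cor(x)
vals <- cm[lower.tri(cm)]                     # the 10 distinct pairs, no diagonal
maxes <- sort(vals, decreasing = TRUE)[1:5]   # 5 largest off-diagonal correlations
maxes
matrix(cm %in% maxes, 5, 5) & lower.tri(cm)   # where they sit, one copy per pair
```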
How about a nice creamy plot? :)
> m <- matrix(runif(100)*2-1, ncol=10)
> colnames(m) <- rownames(m) <- paste("V", 1:10, sep="")
> m
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
V1 -0.40101571 -0.27049070 0.2414295 -0.1889384 0.6459941 -0.8851884 0.332284597 -0.431312791 0.3828374 0.46398193
V2 0.38557771 0.37083911 -0.3004923 0.1253908 -0.4405188 -0.5424613 0.869493425 0.023291914 0.9625392 -0.83196773
V3 0.61923503 -0.27615909 0.1759168 -0.7333568 -0.4256801 -0.6170807 0.438613391 -0.003632086 0.4113488 -0.40590330
V4 0.72093123 0.68479573 0.5032486 0.3720876 -0.6775834 0.2445693 0.353658359 -0.839104640 -0.8122970 -0.42322187
V5 -0.08640529 0.04432795 -0.5120129 -0.9327905 -0.5821378 0.4671473 -0.367677007 0.483375219 -0.7849003 0.57686729
V6 -0.72451704 0.75814550 0.7838393 -0.7650238 0.6742669 0.2260757 0.001645839 0.570753074 0.1944579 0.07917656
V7 0.64516271 0.51994540 0.9057388 -0.3976167 -0.7403159 -0.2873382 -0.809354444 0.319095368 -0.9766422 -0.71981321
V8 -0.51509049 0.18727837 -0.1971454 -0.4290346 0.5657622 0.5324266 0.451608266 -0.715594335 -0.2749510 0.38234855
V9 0.49035803 0.50252397 0.7736783 0.3342899 -0.2732427 0.1128947 0.870315070 -0.291482237 0.5171181 -0.59784449
V10 -0.51811224 -0.67159723 0.8903813 -0.7562222 -0.9790557 -0.5830560 -0.715136643 0.167987391 -0.0529399 0.44570592
> library(ggplot2)
> library(reshape2)  # melt() comes from reshape2, not ggplot2
> p <- ggplot(data=melt(m), aes(x=Var1, y=Var2, color=value))
> p + geom_point(size=5, alpha=0.7) + scale_color_gradient2()
I don't think it would be hard to look at a 100x100 plot and spot the extreme values by eye. :)
I take no credit for this; I'm just posting the code in case the link dies. Credit goes to Dimitris on the r-help list. For each variable, it returns its p highest correlations with the other variables, sorted in decreasing order.
cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
p <- 30 # how many top items
n <- ncol(cor.mat)
cmat <- col(cor.mat)
ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
dim(ind) <- dim(cor.mat)
ind <- ind[seq(2, p + 1), ]
out <- cbind(ID = c(col(ind)), ID2 = c(ind))
as.data.frame(cbind(out,cor = cor.mat[out]))
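The trick above sorts the whole matrix column by column and then turns linear indices back into row numbers with `- (n * cmat - n)`. A more explicit, probably slower, sketch of the same output loops over the columns instead; the name `top_per_variable` is hypothetical, and it is meant to produce the same `ID`, `ID2`, `cor` columns as the code above:

```r
# Sketch: for each variable j, its p highest correlations with the other variables.
top_per_variable <- function(cm, p) {
  n <- ncol(cm)
  rows <- lapply(seq_len(n), function(j) {
    others <- setdiff(seq_len(n), j)                              # skip the self-correlation
    best <- others[order(cm[others, j], decreasing = TRUE)][seq_len(p)]
    data.frame(ID = j, ID2 = best, cor = cm[cbind(best, j)])      # unnamed values
  })
  do.call(rbind, rows)
}

# Usage with simulated data
set.seed(1)
cm <- cor(matrix(rnorm(100 * 1000), 1000, 100))
head(top_per_variable(cm, 30))
```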