Kruskal-Wallis test with details on pairwise comparisons
The standard stats::kruskal.test module allows to calculate the kruskal-wallis test on a dataset:
>>> data(diamonds)
>>> kruskal.test(price~carat, data=开发者_运维知识库diamonds)
Kruskal-Wallis rank sum test
data: price by carat by color
Kruskal-Wallis chi-squared = 50570.15, df = 272, p-value < 2.2e-16
This is correct, it is giving me a probability that all the groups in the data have the same mean.
However, I would like to have the details for each pair comparison, like if diamonds of colors D and E have the same mean price, as some other softwares do (SPSS) when you ask for a Kruskal test.
I have found kruskalmc from the package pgirmess which allows me to do what I want to do:
> kruskalmc(diamonds$price, diamonds$color)
Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
obs.dif critical.dif difference
D-E 571.7459 747.4962 FALSE
D-F 2237.4309 751.5684 TRUE
D-G 2643.1778 726.9854 TRUE
D-H 4539.4392 774.4809 TRUE
D-I 6002.6286 862.0150 TRUE
D-J 8077.2871 1061.7451 TRUE
E-F 2809.1767 680.4144 TRUE
E-G 3214.9237 653.1587 TRUE
E-H 5111.1851 705.6410 TRUE
E-I 6574.3744 800.7362 TRUE
E-J 8649.0330 1012.6260 TRUE
F-G 405.7470 657.8152 FALSE
F-H 2302.0083 709.9533 TRUE
F-I 3765.1977 804.5390 TRUE
F-J 5839.8562 1015.6357 TRUE
G-H 1896.2614 683.8760 TRUE
G-I 3359.4507 781.6237 TRUE
G-J 5434.1093 997.5813 TRUE
H-I 1463.1894 825.9834 TRUE
H-J 3537.8479 1032.7058 TRUE
I-J 2074.6585 1099.8776 TRUE
However, this package only allows for one categoric variable (e.g. I can't study the prices clustered by color and by carat, as I can do with kruskal.test), and I don't know anything about the pgirmess package, whether it is maintained or not, or if it is tested.
Can you recommend me a package to execute the Kruskal-Wallis test which returns details for every comparison? How would you handle the problem?
One other approach besides kruskal::agricolae mentioned by Marek, is the Nemenyi-Damico-Wolfe-Dunn test implemented in the help page for oneway_test in the coin package that uses multcomp. Using hadley's setup and reducing the B= value for the approximate() function so it finishes in finite time:
#updated translation of help page implementation of NDWD
NDWD <-
independence_test(dv ~ iv, data = sum_codings1, distribution = approximate(B = 10000),
ytrafo = function(data) trafo(data, numeric_trafo = rank_trafo),
xtrafo = mcp_trafo(iv = "Tukey"))
### global p-value
print(pvalue(NDWD))
### sites (I = II) != (III = IV) at alpha = 0.01 (page 244)
print(pvalue(NDWD, method = "single-step"))
More stable results on that larger dataset may require increasing the B value and increasing the user's patience.
Jan: 2012: There was recently a posting on R-help claiming unexpected results from this method so I forwarded that email to the maintainer. Mark Difford said he had confirmed the problems and offered an alternate tests with the nparcomp package: https://stat.ethz.ch/pipermail/r-help/2012-January/300100.html
There were also in the same week a couple of other suggestions on rhelp for post-hoc contrasts to KW tests:
kruskalmc suggested by Mario Garrido Escudero and
rms::polr
followed by rms::contrasts
suggested by Frank Harrell https://stat.ethz.ch/pipermail/r-help/2012-January/300329.html
Nov 2015: Agree with toto_tico that help page code of coin package has been changed in the intervening years. The ?independence_test
help page now offers a multivariate-KW test and the ?oneway_test
help page has replace its earlier implementation with the code above usng the independence_test
function.
You can use PMCMR package. There is more information about it.
Spelling_Grades <- c(90,87,89,90,75,88,97,99,78,85,72,76,77,79,70)
Methods <- c("A","A","A","A","B","B","B","B","B","B","C","C","C","C","C")
kruskalmc(Spelling_Grades~Methods)
#This method doesn't accept characters that's why I've changed the methods to integer
Methods <- c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3)
posthoc.kruskal.nemenyi.test(Spelling_Grades~Methods)
The two methods above give same results.
I would have thought you'd be able to do the following:
data(diamonds, package = "ggplot2")
library(coin)
library(multcomp)
kt <- kruskal_test(price ~ clarity, data = diamonds)
glht(kt, mcp(clarity = "Tukey"))
But it seems like multcomp
doesn't support coin
objects (yet?).
Unfortunately I don't know of a function like this. If there isn't one already, it would be an interesting task to construct a function that returns a matrix with all pairwise treatment comparisons. A contrast is considered significant if the following inequality is satisfied
(source: statsdirect.com)
where T is the Kruskal-Wallis test statistic for k samples, S^2 is the denominator of the T statistic, N is the total number (all ni) and Ri is the sum of the ranks (from all samples pooled) for the ith sample, and t is a quantile from the Student t distribution on N-k degrees of freedom.
I know I didn't help much :)
I am also waiting for a better answer
精彩评论