开发者

Test for equality among all elements of a single numeric vector

I'm trying to test whether all elements of a vector are equal to one another. The solutions I have come up with seem somewhat roundabout, both involving checking length().

x <- c(1, 2, 3, 4, 5, 6, 1)  # FALSE
y <- rep(2, times = 7)       # TRUE

With unique():

length(unique(x)) == 1
length(unique(y)) == 1

With rle():

length(rle(x)$values) == 1
length(rle(y)$values) == 1

A solution that would let me include a tolerance value for assessing 'equality' among elements would be ideal to avoid FAQ 7.31 issues.

Is there a built-in function for type of test that I have completely overlooked? identical() and all.equal() compare two R objects, so they won't work here.

Edit 1

Here are some benchmarking results. Using the code:

library(rbenchmark)

John <- function() all( abs(x - mean(x)) < .Machine$doub开发者_如何学Pythonle.eps ^ 0.5 )
DWin <- function() {diff(range(x)) < .Machine$double.eps ^ 0.5}
zero_range <- function() {
  if (length(x) == 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = .Machine$double.eps ^ 0.5))
}

x <- runif(500000);

benchmark(John(), DWin(), zero_range(),
  columns=c("test", "replications", "elapsed", "relative"),
  order="relative", replications = 10000)

With the results:

          test replications elapsed relative
2       DWin()        10000 109.415 1.000000
3 zero_range()        10000 126.912 1.159914
1       John()        10000 208.463 1.905251

So it looks like diff(range(x)) < .Machine$double.eps ^ 0.5 is fastest.


Why not simply using the variance:

var(x) == 0

If all the elements of x are equal, you will get a variance of 0. This works only for double and integers though.

Edit based on the comments below:
A more generic option would be to check for the length of unique elements in the vector which must be 1 in this case. This has the advantage that it works with all classes beyond just double and integer from which variance can be calculated from.

length(unique(x)) == 1


If they're all numeric values then if tol is your tolerance then...

all( abs(y - mean(y)) < tol ) 

is the solution to your problem.

EDIT:

After looking at this, and other answers, and benchmarking a few things the following comes out over twice as fast as the DWin answer.

abs(max(x) - min(x)) < tol

This is a bit surprisingly faster than diff(range(x)) since diff shouldn't be much different than - and abs with two numbers. Requesting the range should optimize getting the minimum and maximum. Both diff and range are primitive functions. But the timing doesn't lie.

And, in addition, as @Waldi pointed out, abs is superfluous here.


I use this method, which compares the min and the max, after dividing by the mean:

# Determine if range of vector is FP 0.
zero_range <- function(x, tol = .Machine$double.eps ^ 0.5) {
  if (length(x) == 1) return(TRUE)
  x <- range(x) / mean(x)
  isTRUE(all.equal(x[1], x[2], tolerance = tol))
}

If you were using this more seriously, you'd probably want to remove missing values before computing the range and mean.


You can just check all(v==v[1])


> isTRUE(all.equal( max(y) ,min(y)) )
[1] TRUE
> isTRUE(all.equal( max(x) ,min(x)) )
[1] FALSE

Another along the same lines:

> diff(range(x)) < .Machine$double.eps ^ 0.5
[1] FALSE
> diff(range(y)) < .Machine$double.eps ^ 0.5
[1] TRUE


You can use identical() and all.equal() by comparing the first element to all others, effectively sweeping the comparison across:

R> compare <- function(v) all(sapply( as.list(v[-1]), 
+                         FUN=function(z) {identical(z, v[1])}))
R> compare(x)
[1] FALSE
R> compare(y)
[1] TRUE
R> 

That way you can add any epsilon to identical() as needed.


Since I keep coming back to this question over and over, here's an Rcpp solution that will generally be much much faster than any of the R solutions if the answer is actually FALSE (because it will stop the moment it encounters a mismatch) and will have the same speed as the fastest R solution if the answer is TRUE. For example for the OP benchmark, system.time clocks in at exactly 0 using this function.

library(inline)
library(Rcpp)

fast_equal = cxxfunction(signature(x = 'numeric', y = 'numeric'), '
  NumericVector var(x);
  double precision = as<double>(y);

  for (int i = 0, size = var.size(); i < size; ++i) {
    if (var[i] - var[0] > precision || var[0] - var[i] > precision)
      return Rcpp::wrap(false);
  }

  return Rcpp::wrap(true);
', plugin = 'Rcpp')

fast_equal(c(1,2,3), 0.1)
#[1] FALSE
fast_equal(c(1,2,3), 2)
#[2] TRUE


I wrote a function specifically for this, which can check not only elements in a vector, but also capable of checking if all elements in a list are identical. Of course it as well handle character vectors and all other types of vector well. It also has appropriate error handling.

all_identical <- function(x) {
  if (length(x) == 1L) {
    warning("'x' has a length of only 1")
    return(TRUE)
  } else if (length(x) == 0L) {
    warning("'x' has a length of 0")
    return(logical(0))
  } else {
    TF <- vapply(1:(length(x)-1),
                 function(n) identical(x[[n]], x[[n+1]]),
                 logical(1))
    if (all(TF)) TRUE else FALSE
  }
}

Now try some examples.

x <- c(1, 1, 1, NA, 1, 1, 1)
all_identical(x)       ## Return FALSE
all_identical(x[-4])   ## Return TRUE
y <- list(fac1 = factor(c("A", "B")),
          fac2 = factor(c("A", "B"), levels = c("B", "A"))
          )
all_identical(y)     ## Return FALSE as fac1 and fac2 have different level order


You do not actually need to use min, mean, or max. Based on John's answer:

all(abs(x - x[[1]]) < tolerance)


Here an alternative using the min, max trick but for a data frame. In the example I am comparing columns but the margin parameter from apply can be changed to 1 for rows.

valid = sum(!apply(your_dataframe, 2, function(x) diff(c(min(x), max(x)))) == 0)

If valid == 0 then all the elements are the same


Another solution which uses the data.table package, compatible with strings and NA is uniqueN(x) == 1

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜