
R - Repetitions of an array in other array

I have an array sliced from a data frame, and I want to count how many times a given sequence appears in it.

For example

main <- c(A,B,C,A,B,V,A,B,C,D,E)
p <- c(A,B,C)
q <- c(A,B)

someFunction(main,p)
2

someFunction(main,q)
3

I've been experimenting with rle, but it also counts every sub-repetition, which is undesirable.

Is there a quick solution I'm missing?


Since this is really a pattern matching exercise, you can use one of the regular expression tools in R, specifically gregexpr. The p and q vectors represent the search patterns, and main is where we want to search for them. From the help page for gregexpr:

gregexpr returns a list of the same length as text each element of which is of 
the same form as the return value for regexpr, except that the starting positions 
of every (disjoint) match are given. 

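So we can count the starting positions in the first element of the list returned by gregexpr. Note that gregexpr returns a single -1 when there is no match, so we count only the positive positions. We'll first collapse each vector into a single string and then do the searching:

someFunction <- function(haystack, needle) {
    # collapse each vector into a single string
    haystack <- paste(haystack, collapse = "")
    needle <- paste(needle, collapse = "")
    # fixed = TRUE treats the needle literally rather than as a regex
    out <- gregexpr(needle, haystack, fixed = TRUE)
    # gregexpr returns -1 when there is no match, so count positive starts
    out.length <- sum(out[[1]] > 0)
    return(out.length)
}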

> someFunction(main, p)
[1] 2
> someFunction(main, q)
[1] 3

Note - you also need to put quotes around the elements of main, p, and q unless you have variables A, B, C, etc. defined:

main <- c("A","B","C","A","B","V","A","B","C","D","E")
p <- c("A","B","C")
q <- c("A","B")
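One caveat that follows from the help text quoted above: gregexpr reports only disjoint matches, so a pattern that can overlap itself is counted fewer times than a sliding-window count would give. Here is a minimal sketch of one way around that, using a zero-width lookahead with perl = TRUE. The helper name count_overlapping is hypothetical, and the sketch assumes every element is a single character that is not a regex metacharacter.

```r
# Hypothetical helper (not part of the answer above): counts occurrences
# that may overlap by matching a zero-width lookahead at each start position.
# Assumes single-character elements with no regex metacharacters.
count_overlapping <- function(haystack, needle) {
  hay <- paste(haystack, collapse = "")
  pat <- paste0("(?=", paste(needle, collapse = ""), ")")
  hits <- gregexpr(pat, hay, perl = TRUE)[[1]]
  # gregexpr returns -1 when nothing matches, so count positive starts only
  sum(hits > 0)
}

main <- c("A","B","C","A","B","V","A","B","C","D","E")
count_overlapping(main, c("A", "B"))

# "AA" starts at positions 1 and 2 of "AAA"; plain disjoint matching
# would report only one of them
count_overlapping(c("A", "A", "A"), c("A", "A"))
```

For the vectors in the question the two approaches agree, because neither p nor q can overlap itself.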


I'm not sure if this is the best way, but you can do it recursively:

f <- function(a, b) {
  # a is the pattern, b is the vector to search
  if (length(a) > length(b)) 0
  else all(head(b, length(a)) == a) + Recall(a, tail(b, -1))
}

Someone may or may not find a built-in function.


Using sapply:

find_x_in_y <- function(x, y){
  # number of windows of length(x) that fit in y
  n <- length(y) - length(x) + 1
  if (n < 1) return(0)
  sum(sapply(
      seq_len(n),
      function(i)as.numeric(all(y[i:(i+length(x)-1)]==x))
  ))
}


find_x_in_y(c("A", "B", "C"), main)
[1] 2

find_x_in_y(c("A", "B"), main)
[1] 3


Here's a way to do it using embed(v,n), which returns a matrix whose rows are all the n-length sub-sequences of vector v, each stored in reverse order (hence the rev(x) below):

find_x_in_y <- function(x, y)
  sum(apply(embed(y, length(x)), 1, identical, rev(x)))

> find_x_in_y(p, main)
[1] 2
> find_x_in_y(q, main)
[1] 3
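To see why rev(x) is needed, it may help to look at what embed returns for a small input. The following is only an illustrative sketch; the vector v and the variable names are made up for this example.

```r
# Illustrative input, not from the question
v <- c("A", "B", "C", "D")

# embed() returns one row per window of length 2,
# with the later element of each window in the first column
m <- embed(v, 2)
print(m)

# The first row holds the window (A, B) stored as (B, A)
first_row <- m[1, ]

# Reversing the column order puts each window back in its original order
windows <- m[, ncol(m):1]
print(windows)
```

Comparing each row against rev(x), as the function above does, avoids having to reorder the columns of the whole matrix.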