grep at the beginning of the string with fixed = T in R?

How to grep with fixed=T, but only at the beginning of the string?

grep("a.", c("a.b", "cac", "sss", "ca.f"), fixed = T)
# 1 4

I would like to get only the first element, where the match is at the start of the string. [Edit: the string to match is not known in advance and can be anything. "a." is just for the sake of example.]

Thanks.

[Edit: I sort of solved it now, but any other ideas are highly welcome. I will accept as an answer any alternative solution.

s <- "a."
res <- grep(s, c("a.b", "cac", "sss", "ca.f"), fixed = T, value = T)
res[substring(res, 1, nchar(s)) == s]

]


If you want to match an exact string (string 1) at the beginning of another string (string 2), truncate string 2 to the length of string 1 and compare the two with ==. That should be fairly fast.
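
As a side note, newer versions of base R (3.3.0 and later) ship startsWith(), which does this prefix comparison directly. A minimal sketch, assuming such a version is available; the variable names are only for illustration:

```r
# Prefix test without any regular expression.
# startsWith() is part of base R from version 3.3.0 onwards.
s <- "a."
x <- c("a.b", "cac", "sss", "ca.f")

hits <- startsWith(x, s)   # logical vector, one entry per element of x
which(hits)                # indices, like grep()
x[hits]                    # matching strings, like grep(..., value = TRUE)
```

Since nothing is interpreted as a pattern, s can contain any characters.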


Actually, Greg (and you) have already mentioned the cleanest solution. I would even drop the grep altogether:

> name <- "a#"
> string <- c("a#b", "cac", "sss", "ca#f")
> string[substring(string, 1, nchar(name)) == name]
[1] "a#b"

But if you really insist on grep, you can use DWin's approach, or the following mind-boggling solution:

specialgrep <- function(x,y,...){
  grep(
    paste("^",
          gsub("([].^+?|[#\\-])","\\\\\\1",x)
          ,sep=""),
    y,...)
}
> specialgrep(name,string,value=T)
[1] "a#b"

I may have forgotten to include some characters in the gsub. Be sure to keep the ] symbol first and the - last in the character set, otherwise you'll get errors. Or just forget about it and use your own solution. This one is just for fun :-)
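
For what it's worth, here is a sketch of the same idea with a fuller set of metacharacters. The name prefix_grep is made up for this example, and it swaps in perl = TRUE so that the character class follows PCRE rules, where [ and ] can be backslash-escaped and their position no longer matters. Treat it as an illustration under those assumptions, not a tested library routine:

```r
# Hypothetical helper (not from the thread): backslash-escape every regex
# metacharacter in `prefix`, anchor it with "^", and pass it on to grep().
# perl = TRUE is used for both calls so the character class is read by PCRE.
prefix_grep <- function(prefix, x, ...) {
  escaped <- gsub("([\\\\^$.|?*+()\\[\\]{}])", "\\\\\\1", prefix, perl = TRUE)
  grep(paste0("^", escaped), x, perl = TRUE, ...)
}

prefix_grep("a.", c("a.b", "cac", "sss", "ca.f"))                # 1
prefix_grep("a#", c("a#b", "cac", "sss", "ca#f"), value = TRUE)  # "a#b"
prefix_grep("(x", c("(x)", "a(x"))                               # 1
```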


Do you want to use fixed=T because of the . in the pattern? In that case you can just escape the dot and anchor the pattern with ^. This would work:

grep("^a\\.", c("a.b", "cac", "sss", "ca.f"))


If you only want to focus on the first two characters, then present only that much information to grep:

> grep("a.", substr(c("a.b", "cac", "sss", "ca.f"), 1,2) ,fixed=TRUE)
[1] 1

You could easily wrap it into a function:

> checktwo <- function (patt,vec) { grep(patt, substr(vec, 1,nchar(patt)) ,fixed=TRUE) }
> checktwo("a.", c("a.b", "cac", "sss", "ca.f") )
[1] 1


I think Dr. G had the key to the solution in his answer, but didn't explicitly call it out: "^" in the pattern specifies "at the beginning of the string". ("$" means at the end of the string)

So a "^a." pattern means "at the beginning of the string, look for an 'a' followed by one character of anything [the '.']". His "^a\\." pattern escapes the dot, so it matches a literal period instead.

Or you could just use "^a" as the pattern, unless you don't want to match the one-character string containing only "a".
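
To make the difference between these patterns concrete, here is a small comparison. The test vector is made up for illustration and is not the one from the question:

```r
# Illustrative vector: includes "abc" and "a" to show where the patterns differ.
x <- c("a.b", "abc", "a", "ca.f")

grep("^a.", x)     # 1 2    unescaped dot matches any single character
grep("^a\\.", x)   # 1      escaped dot matches only a literal period
grep("^a", x)      # 1 2 3  anything that starts with "a", including "a" itself
```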

Does that help?

Jeffrey
