Getting only matched part of the string in R
Is there a function in R that matches a regexp and returns only the matched parts?
Something like grep -o, so:
> ogrep('.b.',c('abc','1b2b3b4'))
[[1]]
[1] abc
[[2]]
[1] 1b2 3b4
Try stringr:
library(stringr)
str_extract_all(c('abc','1b2b3b4'), '.b.')
# [[1]]
# [1] "abc"
#
# [[2]]
# [1] "1b2" "3b4"
I can't believe nobody ever mentioned regmatches!
x <- c('abc','1b2b3b4')
regmatches(x, gregexpr('.b.', x))
# [[1]]
# [1] "abc"
# [[2]]
# [1] "1b2" "3b4"
It makes me wonder, didn't regmatches exist two and a half years ago?
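For what it's worth, here is a small sketch of what happens when one element has no match at all. The extra input 'xyz' is only an illustrative value, not something from the question. With gregexpr, regmatches keeps one list entry per input element, and a non-matching element comes back as character(0):

```r
# 'xyz' is an illustrative non-matching input added for this sketch
x <- c('abc', 'xyz', '1b2b3b4')
res <- regmatches(x, gregexpr('.b.', x))

# number of matches found in each element of x
sapply(res, length)
# [1] 1 0 2
```

So the result stays aligned with the input vector, which is handy if you need to know which element each match came from.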
You should probably give Gabor Grothendieck the check for writing the gsubfn package:
require(gsubfn)
# Loading required package: gsubfn
strapply(c('abc','1b2b3b4'), ".b.", I)
# Loading required package: tcltk
# Loading Tcl/Tk interface ... done
# [[1]]
# [1] "abc"
#
# [[2]]
# [1] "1b2" "3b4"
This just applies the identity function, I, to each match of the pattern.
You need to combine gregexpr with substring, I reckon:
> s = c('abc','1b2b3b4')
> m = gregexpr('.b.',s)
> substring(s[1],m[[1]],m[[1]]+attr(m[[1]],'match.length')-1)
[1] "abc"
> substring(s[2],m[[2]],m[[2]]+attr(m[[2]],'match.length')-1)
[1] "1b2" "3b4"
The returned list 'm' holds the start positions of the matches, with their lengths stored in the 'match.length' attribute. Loop over s to get all the substrings.
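The loop could look roughly like this. It is only a sketch: the name ogrep is borrowed from the question and is not an existing R function, and the handling of non-matching or NA elements (returning character(0)) is my own assumption about what you would want.

```r
# Hypothetical helper: ogrep is the name used in the question, not a base R function
ogrep <- function(pattern, x) {
  m <- gregexpr(pattern, x)
  lapply(seq_along(x), function(i) {
    len <- attr(m[[i]], "match.length")
    start <- as.integer(m[[i]])
    # gregexpr reports -1 when nothing matched (NA for NA input);
    # assumption: return an empty character vector in both cases
    if (is.na(start[1]) || start[1] == -1) return(character(0))
    substring(x[i], start, start + len - 1)
  })
}

ogrep('.b.', c('abc', '1b2b3b4'))
```

This should give the same list as the two substring calls above, one element per input string.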