agrep function in R
I am trying to isolate the strings "24!!07!!10", "15!!08!!12", and "10!!08!!12" from the 4 lines of data below.
> z
LEGAL
1 MAP #1166
2 SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13
3 MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC
4 BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~
Firstly, without the max.distance option the agrep function doesn't find any matches at all. Secondly, the option value=TRUE doesn't seem to give the actual values of the pattern matches and if indeed the output is the indices of the rows, the first row shouldn't really be a match at all.
> pattern <-"[0-99]-[0-99]-[0-99]"
> z1<-agrep(pattern ,z,ignore.case=TRUE, value=TRUE)
> z1
character(0)
> z1<-agrep(pattern,z,ignore.case=TRUE, value=TRUE, max.distance=22)
> z1
[1] "c(2, 4, 3, 1)"
I'd appreciate any help in figuring out what is going on.
@Kent is right about your regular expression not matching what you describe as your pattern. In addition, agrep is for fuzzy matching in the linguistic sense and does not take regular expressions. You are looking for grep or something in that family, probably regexpr.
Given your data
z <- c("MAP #1166",
"SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13",
"MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
"BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")
You can find the locations of the matches and extract them with
pattern <- "[0-9][0-9]!![0-9][0-9]!![0-9][0-9]"
locs <- regexpr(pattern, z)
substr(z, locs, locs+attr(locs,"match.length")-1)
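As a possible alternative to the substr arithmetic, base R's regmatches can pull the matched text straight out of a regexpr result. This is only a sketch using the same z and pattern as above; the variable name found is introduced here for illustration.

```r
z <- c("MAP #1166",
       "SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13",
       "MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
       "BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")
pattern <- "[0-9][0-9]!![0-9][0-9]!![0-9][0-9]"

# regexpr() locates the first match in each element (-1 where there is none);
# regmatches() returns the matched substrings and skips non-matching elements.
found <- regmatches(z, regexpr(pattern, z))
print(found)
```

One difference to keep in mind: the substr version returns an empty string for an element with no match, while regmatches leaves that element out, so the result can be shorter than z.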
If you want to use the other form of the regular expression, you can. You just need to escape each backslash by doubling it in the R string literal.
pattern <- "\\d{2}!!\\d{2}!!\\d{2}"
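The z printed in the question looks like a data frame with a LEGAL column rather than a plain character vector. That is an assumption based on the printout, and the column may or may not be a factor. If it is, passing the whole data frame to a matching function coerces it to a single deparsed string, which may be where output like "c(2, 4, 3, 1)" comes from. A minimal sketch of matching against the column instead, where z_df, legal and codes are hypothetical names:

```r
# Hypothetical reconstruction of the question's data as a data frame
# with a factor column (an assumption, not confirmed by the question).
z_df <- data.frame(
  LEGAL = c("MAP #1166",
            "SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13",
            "MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
            "BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~"),
  stringsAsFactors = TRUE
)
pattern <- "\\d{2}!!\\d{2}!!\\d{2}"

# Work on the column, converted from factor to character, not on the data frame.
legal <- as.character(z_df$LEGAL)
codes <- regmatches(legal, regexpr(pattern, legal))
print(codes)
```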
I don't know R, but your pattern may not be correct.
How about "\d{2}!!\d{2}!!\d{2}"
or
"[0-9][0-9]!![0-9][0-9]!![0-9][0-9]"?
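To make the problem with the original pattern concrete, here is a small sketch in R. It checks two things: that "[0-99]" is a character class matching a single digit (the same as "[0-9]"), and that a pattern with "-" separators cannot match data that uses "!!". The names one_digit, z2, old_hit and new_hit are made up for this illustration.

```r
# "[0-99]" is the range 0-9 plus a redundant literal 9, so it matches
# exactly one character, not a number between 0 and 99.
one_digit <- grepl("^[0-99]$", c("7", "24"))
print(one_digit)

# The data separates the numbers with "!!", not "-", so the original
# pattern has nothing to match; the corrected pattern does.
z2 <- "SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13"
old_hit <- grepl("[0-99]-[0-99]-[0-99]", z2)
new_hit <- grepl("[0-9][0-9]!![0-9][0-9]!![0-9][0-9]", z2)
print(c(old_hit, new_hit))
```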
I suspect agrep in R doesn't support that kind of pattern. Anyway, you should probably use grep instead:
z1 <- grep("\\d{2}!!\\d{2}!!\\d{2}", z, value=TRUE)
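Note that grep selects whole elements of z rather than the matched substrings, so a further extraction step is still needed to isolate the three codes. A short sketch of what the two return modes give, with z rebuilt as a character vector and idx and whole as illustrative names:

```r
z <- c("MAP #1166",
       "SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13",
       "MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
       "BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")
pattern <- "\\d{2}!!\\d{2}!!\\d{2}"

idx <- grep(pattern, z)                    # positions of the matching elements
whole <- grep(pattern, z, value = TRUE)    # the matching elements in full
print(idx)
print(whole)
```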
Another solution is to try the stringr package:
require(stringr)
pattern <- "\\d{2}!!\\d{2}!!\\d{2}"
str_extract_all(z, pattern)
which gives:
[[1]]
character(0)
[[2]]
[1] "24!!07!!10"
[[3]]
[1] "15!!08!!12"
[[4]]
[1] "10!!08!!12"
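If installing a package is not an option, a list of the same shape can be sketched in base R with gregexpr and regmatches, and then flattened with unlist when only the matched strings are wanted. The names all_matches and flat are introduced here for illustration.

```r
z <- c("MAP #1166",
       "SE1/4 NE1/4 24!!07!!10 EX MAP #106 42.13",
       "MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
       "BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")
pattern <- "\\d{2}!!\\d{2}!!\\d{2}"

# gregexpr() finds every match per element; regmatches() then returns a list
# with one character vector per element (character(0) where nothing matched).
all_matches <- regmatches(z, gregexpr(pattern, z))

# Flatten the list into a plain character vector of the matched strings.
flat <- unlist(all_matches)
print(flat)
```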