agrep function in R

I am trying to isolate the strings "24!!07!!10", "15!!08!!12", and "10!!08!!12" from the 4 lines of data below.

> z
                                                             LEGAL
1                                                        MAP #1166
2                        SE1/4 NE1/4 24!!07!!10 EX  MAP #106 42.13
3                      MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC
4 BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~

First, without the max.distance option, agrep finds no matches at all. Second, value=TRUE doesn't seem to return the matched text, and if the output is instead the row indices, the first row shouldn't be a match at all.

> pattern <-"[0-99]-[0-99]-[0-99]"
> z1<-agrep(pattern ,z,ignore.case=TRUE, value=TRUE)
> z1
character(0)

> z1<-agrep(pattern,z,ignore.case=TRUE, value=TRUE, max.distance=22)
> z1
[1] "c(2, 4, 3, 1)"

I'd appreciate any help in figuring out what is going on.


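@Kent is right that your regular expression does not match the pattern you describe. In addition, agrep is for approximate (fuzzy) matching by edit distance and, by default, treats its pattern as a literal string rather than a regular expression. The odd "c(2, 4, 3, 1)" result also suggests that z is a data frame rather than a character vector, in which case agrep is probably coercing the whole LEGAL column into a single string. You are looking for grep or something in that family, probably regexpr, applied to the column itself.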
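To illustrate the difference between fuzzy and regular-expression matching, here is a minimal sketch. The vector x holds made-up example strings (an assumption for illustration, not the question's data), and max.distance = 2 is just an illustrative tolerance.

```r
# Hypothetical example strings (not taken from the question's data)
x <- c("lasy", "1 lazy", "1 LAZY")

# agrep: approximate match of the literal string "lazy",
# allowing up to 2 insertions, deletions, or substitutions
fuzzy <- agrep("lazy", x, max.distance = 2, value = TRUE)
print(fuzzy)

# grepl: the pattern is interpreted as a regular expression
has_code <- grepl("[0-9]{2}!![0-9]{2}!![0-9]{2}",
                  c("SE1/4 NE1/4 24!!07!!10 EX", "MAP #1166"))
print(has_code)
```

agrep answers "which strings are close to this text", while grepl answers "which strings contain something of this shape", and the second is what the question needs.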

Given your data

z <- c("MAP #1166", 
"SE1/4 NE1/4 24!!07!!10 EX  MAP #106 42.13", 
"MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC", 
"BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")

You can find the locations of the matches and extract them with

pattern <- "[0-9][0-9]!![0-9][0-9]!![0-9][0-9]"
locs <- regexpr(pattern, z)
substr(z, locs, locs+attr(locs,"match.length")-1)
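As a possible alternative to the substr arithmetic, base R's regmatches can do the extraction. This is only a sketch under the assumption that z is the character vector defined above; the names first_match and all_matches are made up for illustration.

```r
z <- c("MAP #1166",
       "SE1/4 NE1/4 24!!07!!10 EX  MAP #106 42.13",
       "MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
       "BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")
pattern <- "[0-9]{2}!![0-9]{2}!![0-9]{2}"

# First match in each element; elements with no match are dropped
first_match <- regmatches(z, regexpr(pattern, z))
print(first_match)

# Every match in each element, returned as a list with one entry per element
all_matches <- regmatches(z, gregexpr(pattern, z))
print(all_matches)
```

Note that regmatches with regexpr silently drops the non-matching first element, whereas the substr version returns an empty string for it, so pick whichever keeps the row alignment you need.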

If you want to use the other form of the regular expression, you can. You just need to escape each backslash in the string literal by doubling it.

pattern <- "\\d{2}!!\\d{2}!!\\d{2}"


I don't know R, but your pattern may not be correct.

How about "\d{2}!!\d{2}!!\d{2}" or

"[0-9][0-9]!![0-9][0-9]!![0-9][0-9]"

?


I suspect agrep in R doesn't support that kind of pattern. Anyway, you should probably use grep instead:

z1 <- grep("\\d{2}!!\\d{2}!!\\d{2}", z, value=TRUE)
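One caveat worth sketching: with value=TRUE, grep returns the whole matching elements, not just the matched substring. Assuming z is a plain character vector like the one in the question, a capture group with sub is one way to pull out the code itself. The names rows and codes are hypothetical, and perl = TRUE is used here only so the lazy quantifier behaves predictably.

```r
z <- c("MAP #1166",
       "SE1/4 NE1/4 24!!07!!10 EX  MAP #106 42.13",
       "MAP 15!!08!!12 N1/2NW1/4 15!!8!!12 80.00 AC",
       "BEG NW COR SAID SEC THEN E208' 10!!08!!12 NW1/4 EX TR AC 158.65~")

# grep with value = TRUE keeps the full text of each matching element
rows <- grep("\\d{2}!!\\d{2}!!\\d{2}", z, value = TRUE)
print(length(rows))

# Replace each matching element by its first captured code
codes <- sub(".*?(\\d{2}!!\\d{2}!!\\d{2}).*", "\\1", rows, perl = TRUE)
print(codes)
```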


Another solution is to try the stringr package:

require(stringr)

pattern <- "\\d{2}!!\\d{2}!!\\d{2}"
str_extract_all(z, pattern)

and you get this:

[[1]]
character(0)

[[2]]
[1] "24!!07!!10"

[[3]]
[1] "15!!08!!12"

[[4]]
[1] "10!!08!!12"