Case-insensitive search of a list in R

Can I search a character list for a string when I don't know how the string is cased? More generally, I'm trying to reference a column in a data frame, but I don't know exactly how the column names are cased. My thought was to search names(myDataFrame) in a case-insensitive manner to return the proper casing of the column name.


I would suggest the grep() function and some of its additional arguments that make it a pleasure to use.

grep("stringofinterest",names(dataframeofinterest),ignore.case=TRUE,value=TRUE)

Without the argument value=TRUE, you will only get a vector of the index positions where a match occurred.
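
As a minimal sketch of how this applies to the column-lookup question, the example below uses a made-up data frame (`df` and its column names are assumptions for illustration, not from the question). Anchoring the pattern with ^ and $ restricts grep() to whole-name matches, so a search for "name" would not also pick up "UserName".

```r
# Hypothetical data frame; the column names are illustrative only.
df <- data.frame(ID = 1:3, UserName = c("ann", "bob", "cy"))

# Without value = TRUE: index positions of the matching names.
idx <- grep("^username$", names(df), ignore.case = TRUE)

# With value = TRUE: the matching names in their original casing.
col <- grep("^username$", names(df), ignore.case = TRUE, value = TRUE)

# Use the recovered name to pull out the column.
df[[col]]
```

If several columns differ only in case, `col` will have more than one element, so it may be worth checking its length before indexing.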


Assuming that no variable names differ only in case, you can search for your all-lowercase variable name in tolower(names(myDataFrame)):

match("b", tolower(c("A","B","C")))
[1] 2

This will produce only exact matches, but that is probably desirable in this case.
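
To get from the matched position back to the properly cased column name, one option is to index names() with the result of match(). This is only a sketch: `find_col` and the data frame `df` are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: returns the column name as actually cased in df,
# or NA if no column matches when case is ignored.
find_col <- function(df, name) {
  pos <- match(tolower(name), tolower(names(df)))
  names(df)[pos]
}

# Hypothetical data frame; the column names are illustrative only.
df <- data.frame(ID = 1:3, UserName = c("ann", "bob", "cy"))

actual  <- find_col(df, "USERNAME")  # found despite different casing
missing <- find_col(df, "email")     # no such column
```

Because match() returns only the first hit, this quietly picks one column if two names differ only in case, which is why the assumption above matters.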


With the stringr package, you can modify the pattern with one of the built-in modifier functions (see ?modifiers). For example, since we are matching a fixed string (no special regular expression characters) but want to ignore case, we can do

str_detect(colnames(iris), fixed("species", ignore_case=TRUE))

Or you can use the (?i) case-insensitive modifier:

str_detect(colnames(iris), "(?i)species")
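
For comparison, a rough base R counterpart (not part of stringr) is grepl() with ignore.case = TRUE. Like str_detect(), it returns a logical vector, which can then be used to subset the column names. Note that base R's fixed = TRUE cannot be combined with ignore.case = TRUE (the latter is ignored with a warning), so this sketch treats the pattern as a regular expression.

```r
# Logical vector: TRUE where the column name contains "species",
# regardless of case.
hits <- grepl("species", colnames(iris), ignore.case = TRUE)

# Subset the names to recover the actual casing.
matched <- colnames(iris)[hits]
matched
# [1] "Species"
```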


For anyone using this with %in%, simply use tolower on both sides (the left side can be left as is if it is already lowercase), like so:

"b" %in% c("a", "B", "c")
# [1] FALSE

tolower("b") %in% tolower(c("a", "B", "c"))
# [1] TRUE
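
If this comes up often, the same idea can be wrapped in a custom infix operator. This is just a sketch; the name `%in_ci%` is made up here and is not part of base R or any package.

```r
# Hypothetical case-insensitive variant of %in%:
# lowercases both sides before comparing.
`%in_ci%` <- function(x, table) tolower(x) %in% tolower(table)

"B" %in_ci% c("a", "b", "c")
# [1] TRUE

c("A", "z") %in_ci% c("a", "B", "c")
# [1]  TRUE FALSE
```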


The searchable package was created to allow various types of searching within objects:

library(searchable)
l <- list( a=1, b=2, c=3 )
sl <- searchable(l)        # make the list "searchable"
sl <- ignore.case(sl)      # turn on case insensitivity

> sl['B']
$b
[1] 2

It works with lists and vectors and does a lot more than simple case-insensitive matching.


If you want to search for one set of strings in another set of strings, case insensitively, you could try:

s1 = c("a", "b")
s2 = c("B", "C")
matches = s1[ toupper(s1) %in% toupper(s2) ]


Another way of achieving this is to use str_which(string, pattern) from the stringr package:

library("stringr")
str_which(string = tolower(colnames(iris)), pattern = "species")
