Checking conditions and adding items to a data frame
I'm trying to develop a function which will allow me to input new elements to a data frame and then check if they contain certain words.
df <- data.frame(keyword=c("He drives a Honda", "He goes to Ohio State"),
car=c(1,0), school=c(0,1))
df
keyword                car school
He drives a Honda        1      0
He goes to Ohio State    0      1
In this data frame, car and school are binary values which contain 1 if a word from the car/school vector is part of the keyword. If a word isn't present in the keyword, then 0 is assigned.
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
I want to use a function to input new keywords into the data frame, while iterating over the keywords for specific values from the car and school vectors.
main <- function(keyword){
  n = strsplit(as.character(keyword), " ")[[1]]
  for( i in keyword ){
    if( any(n==car) ){
      df$car <- c(1)
    }
    if( any(n==school )){
      df$school <- c(1)
    }
  }
}
This function isn't complete, and it produces the following warning. It seems to come from comparing the split keyword against the school vector (length 3), since the two have different lengths.
> main("He likes Ford and goes to Ohio State")
Warning message:
In n == school :
longer object length is not a multiple of shorter object length
I'm also not sure how to add the 0/1 values to the df. For the "He likes Ford and goes to Ohio State" keyword, I should have 1 in both the car and school columns.
keyword                                car school
He drives a Honda                        1      0
He goes to Ohio State                    0      1
He likes Ford and goes to Ohio State     1      1
Please help.
It seems like the ifelse() function would be really useful for this task, but I haven't been able to properly implement it.
I think the easiest way is to use a compound regular expression:
library(stringr)
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")
car_match <- str_c(car, collapse = "|")
school_match <- str_c(school, collapse = "|")
df <- data.frame(keyword=c("He drives a Honda",
                           "He goes to Ohio State",
                           "He likes Ford and goes to Ohio State"))
main <- function(df) {
  df$car <- str_detect(df$keyword, car_match)
  df$school <- str_detect(df$keyword, school_match)
  df
}
main(df)
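Note that str_detect returns TRUE/FALSE rather than the 0/1 values asked for in the question. As a rough sketch, the same idea can be written in base R with grepl and as.integer to get 0/1 columns. The function name flag_keywords is only an illustrative choice, not something from the original post, and the patterns are treated as regular expressions, so search terms with special characters would need escaping.

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

# Collapse each vector into one alternation pattern, e.g. "Honda|Chevy|Toyota|Ford"
car_match <- paste(car, collapse = "|")
school_match <- paste(school, collapse = "|")

df <- data.frame(keyword = c("He drives a Honda",
                             "He goes to Ohio State",
                             "He likes Ford and goes to Ohio State"))

# flag_keywords is a hypothetical name used only for this sketch
flag_keywords <- function(df) {
  # grepl gives TRUE/FALSE per row; as.integer turns that into 1/0
  df$car <- as.integer(grepl(car_match, df$keyword))
  df$school <- as.integer(grepl(school_match, df$keyword))
  df
}

result <- flag_keywords(df)
result
```

Because the whole keyword string is searched, multi-word terms such as "Ohio State" need no special handling here.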
There are a few minor problems, but they are easily fixed with a couple of %in% calls. You also need a special logical expression to account for 'Ohio State', which was tripping up strsplit because the space splits it into two separate words.
df <- data.frame(keyword=c("He drives a Honda",
"He goes to Ohio State",
"He likes Ford and goes to Ohio State"),
car=0, school=0)
main <- function(df) {
  car <- c("Honda", "Chevy", "Toyota", "Ford")
  school <- c("Michigan", "Missouri")
  for (i in seq_len(nrow(df))) {
    # Split the keyword into single words
    Words <- strsplit(as.character(df[i, 'keyword']), " ")[[1]]
    if (any(Words %in% car)) df[i, 'car'] <- 1
    # "Ohio State" spans two words, so look for "Ohio" followed by "State";
    # na.rm = TRUE covers the case where "Ohio" is the last word
    ohio <- which(Words == 'Ohio')
    if (any(Words[ohio + 1] == 'State', na.rm = TRUE)) df[i, 'school'] <- 1
    if (any(Words %in% school)) df[i, 'school'] <- 1
  }
  return(df)
}
main(df)
                               keyword car school
1                    He drives a Honda   1      0
2                He goes to Ohio State   0      1
3 He likes Ford and goes to Ohio State   1      1
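To illustrate why %in% is the safer choice here, below is a small sketch using the example sentence from the question. The == operator compares element by element and recycles the shorter vector, so it can miss a word that is present; %in% tests set membership instead. The variable names eq_car, in_car and in_school are just for this illustration.

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

# The sentence splits into 8 single words
words <- strsplit("He likes Ford and goes to Ohio State", " ")[[1]]

# Element-wise comparison with recycling: "Ford" (position 3) is compared
# against "Toyota" (position 3 of car), so the match is missed
eq_car <- any(words == car)

# Set membership: is any word one of the car names?
in_car <- any(words %in% car)

# "Ohio" and "State" are separate words after the split, so the
# two-word term "Ohio State" is not found this way
in_school <- any(words %in% school)

c(eq_car = eq_car, in_car = in_car, in_school = in_school)
```

The 8 words are a multiple of the 4 car names, so n == car recycles silently, while 8 is not a multiple of the 3 school names, which would explain why the warning in the question mentions only n == school.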
Here's a version that I believe will work without having to specify every two-word search term by hand, as in the case of "Ohio State" in wkmor1's solution. The trick is to use grep instead:
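main <- function(str, df) {
  # Search str for each term; fixed = TRUE matches the term literally
  carSearch <- unlist(lapply(car, grep, x = str, fixed = TRUE))
  schoolSearch <- unlist(lapply(school, grep, x = str, fixed = TRUE))
  t1 <- length(carSearch) != 0
  t2 <- length(schoolSearch) != 0
  # Append a row only when at least one term matched
  if (t1 || t2) {
    newRow <- data.frame(keyword = str, car = ifelse(t1, 1, 0),
                         school = ifelse(t2, 1, 0))
    df <- rbind(df, newRow)
  }
  # Return df either way, so an unmatched keyword does not yield NULL
  return(df)
}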
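As a usage sketch, here is a self-contained variant of the same idea applied to the data frame from the question. It uses grepl with fixed = TRUE, and the name add_keyword is hypothetical. Unlike the function above, this variant appends a row for every keyword, even when nothing matches (giving 0 in both columns); that is an assumption about the desired behavior, not something stated in the question.

```r
car <- c("Honda", "Chevy", "Toyota", "Ford")
school <- c("Michigan", "Ohio State", "Missouri")

df <- data.frame(keyword = c("He drives a Honda", "He goes to Ohio State"),
                 car = c(1, 0), school = c(0, 1))

# add_keyword is a hypothetical name used only for this sketch
add_keyword <- function(str, df) {
  # For each search term, test whether it occurs literally in str
  has_car <- any(vapply(car, grepl, logical(1), x = str, fixed = TRUE))
  has_school <- any(vapply(school, grepl, logical(1), x = str, fixed = TRUE))
  # Append the keyword with 1/0 flags and return the extended data frame
  rbind(df, data.frame(keyword = str,
                       car = as.numeric(has_car),
                       school = as.numeric(has_school)))
}

df <- add_keyword("He likes Ford and goes to Ohio State", df)
df
```

Since the function returns the extended data frame instead of modifying a global one, the result has to be assigned back to df, as in the last lines.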