Create a new data frame column based on the values of another column

Let's say I have the following data frame.

dat <- data.frame(city=c("Chelsea","Brent","Bremen","Olathe","Lenexa","Shawnee"), 
        tag=c(rep("AlabamaCity",3), rep("KansasCity",3)))

I want to add a third column, tag2, giving the region of the state named in the tag column. The first three cities should end up as 'South' and the last three as 'Midwest'. The data should look like this:

     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest

I tried the following function, but it doesn't create the new column. Can anyone tell me what's wrong?

fixit <- function(dat) {
     for (i in 1:nrow(dat)) {
          Words = strsplit(as.character(dat[i, 'tag']), " ")[[1]]
          if(any(Words == 'Alabama')) {
                dat[i, 'tag2'] <- "South"
          }
          if(any(Words == 'Kansas')) {
                dat[i, 'tag2'] <- "Midwest"
          }
     }
     return(dat)
}

Thanks for the help.


It isn't working because the strsplit() call that creates Words is wrong: it splits on a space, and none of the tags contain one. (You do know how to debug R functions, don't you?)

debug: Words = strsplit(as.character(dat[i, "tag"]), " ")[[1]]
Browse[2]> 
debug: if (any(Words == "Alabama")) {
    dat[i, "tag2"] <- "South"
}
Browse[2]> Words
[1] "AlabamaCity"

At this point, Words is certainly not equal to "Alabama" or "Kansas", and never will be, so the if() clauses never get executed. R is returning dat; it is your function that is not altering dat.
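The same thing can be checked outside the debugger. This is just a minimal sketch on a single tag value; the variable names (`tag`, `words`, `exact`, `partial`) are made up for illustration and are not part of your function:

```r
tag <- "AlabamaCity"

# Splitting on a space leaves the string whole, because it contains no space
words <- strsplit(tag, " ")[[1]]
print(words)

# An exact comparison against "Alabama" therefore never succeeds
exact <- any(words == "Alabama")
print(exact)

# A pattern match, by contrast, finds "Alabama" inside "AlabamaCity"
partial <- grepl("Alabama", tag)
print(partial)
```

Here `exact` comes out FALSE and `partial` comes out TRUE, which is why matching on a pattern is the more natural tool for tags like these.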

The following will do it for you, and is a bit more generic. First, create a data frame pairing the words to match with their regions:

region <- data.frame(tag = c("Alabama","Kansas"), tag2 = c("South","Midwest"),
                     stringsAsFactors = FALSE)

Then loop over the rows of this data frame, matching the "tag"s and inserting the appropriate "tag2"s:

for(i in seq_len(nrow(region))) {
    want <- grepl(region[i, "tag"], dat[, "tag"])
    dat[want, "tag2"] <- region[i, "tag2"]
}

Which will result in this:

> dat
     city         tag    tag2
1 Chelsea AlabamaCity   South
2   Brent AlabamaCity   South
3  Bremen AlabamaCity   South
4  Olathe  KansasCity Midwest
5  Lenexa  KansasCity Midwest
6 Shawnee  KansasCity Midwest

How does this work? The key bit is grepl(). If we do this for just one match, "Alabama", grepl() is used like this:

grepl("Alabama", dat[, "tag"])

and returns a logical vector indicating which of the "tag" elements matched the string "Alabama":

> grepl("Alabama", dat[, "tag"])
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
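That logical vector is then used as a row index for the assignment. As a rough sketch of a single pass of the loop, assuming the same dat as in the question (rebuilt here with stringsAsFactors = FALSE so the snippet stands on its own):

```r
dat <- data.frame(city = c("Chelsea", "Brent", "Bremen", "Olathe", "Lenexa", "Shawnee"),
                  tag = c(rep("AlabamaCity", 3), rep("KansasCity", 3)),
                  stringsAsFactors = FALSE)

# Logical vector: TRUE for rows whose tag contains "Alabama"
want <- grepl("Alabama", dat[, "tag"])

# Assign only to the matching rows; the new column is NA everywhere else
dat[want, "tag2"] <- "South"
print(dat)
```

Repeating the last two steps with "Kansas" and "Midwest" fills in the remaining rows, which is what each iteration of the loop does.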