Remove rows of a dataframe that match a factor level (and then plot the data excluding that factor level)
I have a data frame with 251 observations and 45 variables. There are 6 observations in the middle of the data frame that I'd like to exclude from my analyses. All 6 belong to one level of a factor. It is easy to generate a new data frame that, when printed, appears to exclude the 6 observations. When I use the new data frame to plot variables by the factor in question, however, the supposedly excluded level is still included in the plot (sans observations). Using str() confirms that the level is still present in some form. Also, the index for the new data frame skips 6 values where the observations formerly resided.
How can I create a new data frame that excludes the 6 observations and does not continue to recognize the excluded factor level when plotting? Can the new data frame be made to "re-index", so that the new index does not skip values formerly assigned to the excluded factor level?
I've provided an example with made up data:
# ---------------------------------------------
# data
char <- c( rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3) )
a <- 1:15 / pi
b <- seq(1, 8, .5)
d <- rep(c(3, 8, 5), 5)
dat <- data.frame(char, a, b, d, stringsAsFactors = TRUE)  # char as a factor (the default before R 4.0.0)
dat
# two ways to remove rows that contain a string
datNew1 <- dat[-which(dat$char == "nam"), ]
datNew1
datNew2 <- dat[grep("nam", dat[ ,"char"], invert=TRUE), ]
datNew2
# plots still contain the factor level that was excluded
boxplot(datNew1$a ~ datNew1$char)
boxplot(datNew2$a ~ datNew2$char)
# str confirms that it's still there
str(datNew1)
str(datNew2)
# ---------------------------------------------
You can use the drop.levels() function from the gdata package to reduce the factor levels down to the ones actually used. Apply it to your column after you have created the new data.frame.
Also try a search for r and drop.levels here (but you need to make the search term [r] drop.levels, which I can't write here as it interferes with the formatting logic).
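If installing gdata is not an option, a single column can be handled in base R. The sketch below is not the drop.levels() call itself but a swapped-in equivalent for one column: rebuilding the factor with factor() keeps only the levels that still occur. It reuses the made-up data from the question, with stringsAsFactors = TRUE set explicitly as an assumption so that char is a factor on any R version.

```r
# Made-up data from the question; stringsAsFactors = TRUE is an explicit
# assumption so that char is a factor regardless of the R version
char <- c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3))
dat <- data.frame(char, a = 1:15 / pi, stringsAsFactors = TRUE)

# Subsetting removes the rows but leaves the unused "nam" level in place
datNew1 <- dat[dat$char != "nam", ]

# Rebuilding the factor keeps only the levels that are still present
datNew1$char <- factor(datNew1$char)
levels(datNew1$char)
```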
Starting with R version 2.12.0, there is a function droplevels(), which can be applied either to factor columns or to the entire data frame. When applied to a data frame, it removes zero-count levels from all factor columns. So your example becomes simply:
# two ways to remove rows that contain a string
datNew1 <- droplevels( dat[-which(dat$char == "nam"), ] )
datNew2 <- droplevels( dat[grep("nam", dat[ ,"char"], invert=TRUE), ] )
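The question also asks about re-indexing. A minimal sketch of both steps together, assuming the made-up data from the question (with stringsAsFactors = TRUE set explicitly so that char is a factor on current R versions): droplevels() removes the unused level, and assigning NULL to the row names resets them to a consecutive sequence.

```r
# Made-up data from the question; stringsAsFactors = TRUE is an explicit
# assumption so that char is a factor regardless of the R version
char <- c(rep("anc", 4), rep("nam", 3), rep("oom", 5), rep("apt", 3))
dat <- data.frame(char, a = 1:15 / pi, stringsAsFactors = TRUE)

# Remove the "nam" rows and drop the now-unused factor level
datNew <- droplevels(dat[dat$char != "nam", ])

# Reset the row names so the index no longer skips the removed rows
rownames(datNew) <- NULL

levels(datNew$char)
rownames(datNew)
```

With the level dropped, boxplot(a ~ char, data = datNew) should draw only the three remaining groups.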
I have pasted something from my code. I have an enclosure experiment in a lake, with measurements from the enclosures and from the lake itself, but I mostly don't want to deal with the lake. My variable is called "t.level" and its levels are control, low, medium, high and lake. This code makes it possible to use nolk$ or data=nolk to get the data without the "lake" level:
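# keep only the enclosure treatments (everything except "lake")
nolk <- subset(mylakedata, t.level == "control" |
                           t.level == "low" |
                           t.level == "medium" |
                           t.level == "high")
# drop unused levels from every factor column; nolk[] keeps the data frame structure
nolk[] <- lapply(nolk, function(col) if (is.factor(col)) col[drop = TRUE] else col)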