How do I create a string data column that is a transformation of the strings in another column in R?
If I have this data set
Browser Count
Chrome/11 100
Chrome/11 89
Chrome/13 10
Safari/12 40
Safari/114 30
And I want to get a more general form of the browser without the version number.
Browser Clean_Browser Count
Chrome/11 Chrome 100
Chrome/11 Chrome 89
Chrome/13 Chrome 10
Safari/12 Safari 40
Safari/114 Safari 30
I know this is easy to do with python or excel, but is there a way to do it in R so I don't have to pre-process the data?
That is pretty straightforward thanks to regular expressions and the string processing functions. Both are vectorised, so you do not need to loop. You could use gsub() et al. and replace '/...' with blanks, or even use strsplit with '/' as the split character and retain the first piece. There are certainly other ways I can't think of now, and experience suggests several will involve packages by Hadley :) [kidding aside, look at the stringr package too]
Here is approach one, done on a vector but a column in a data.frame is just the same:
R> vec <- c( paste("Chrome", 11:13, sep="/"), paste("Safari", 101:102, sep="/"))
R> vec
[1] "Chrome/11" "Chrome/12" "Chrome/13" "Safari/101" "Safari/102"
R> newvec <- gsub("/.*$", "", vec, perl=TRUE)
R> newvec
[1] "Chrome" "Chrome" "Chrome" "Safari" "Safari"
R>
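To connect this back to the question's data, here is a minimal sketch of both ideas applied to a data.frame column. The data frame is rebuilt by hand from the question, and the column name Clean_Browser2 is only there for illustration so the two results can be compared.

```r
# Rebuild the question's data (typed in by hand for this sketch)
df <- data.frame(
  Browser = c("Chrome/11", "Chrome/11", "Chrome/13", "Safari/12", "Safari/114"),
  Count = c(100, 89, 10, 40, 30),
  stringsAsFactors = FALSE
)

# Option 1: delete everything from the first "/" to the end of the string
df$Clean_Browser <- sub("/.*$", "", df$Browser)

# Option 2: split on a literal "/" and keep the first piece of each element
pieces <- strsplit(df$Browser, "/", fixed = TRUE)
df$Clean_Browser2 <- vapply(pieces, function(p) p[1], character(1))

print(df)
```

Both options are vectorised over the whole column, so no explicit loop is needed. If the column was read in as a factor, wrapping it in as.character() first should keep strsplit happy.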
You can use colsplit from the reshape package to do this.
df = read.table(textConnection(
"Browser Count
Chrome/11 100
Chrome/11 89
Chrome/13 10
Safari/12 40
Safari/114 30"), sep = "", header = TRUE)
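library(reshape)  # provides colsplit()
# split "Chrome/11" into a 'browser' column and a 'version' column
browser_version = colsplit(df$Browser, names = c('browser', 'version'), split = '[/]')
# attach the two new columns to the original data
df = cbind(df, browser_version)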
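If you would rather not load a package for this, a similar two-column split can be sketched in base R. This assumes every value contains exactly one '/', so each split yields two pieces; the column names browser and version simply mirror the ones used above.

```r
# Rebuild the question's data (typed in by hand for this sketch)
df <- data.frame(
  Browser = c("Chrome/11", "Chrome/11", "Chrome/13", "Safari/12", "Safari/114"),
  Count = c(100, 89, 10, 40, 30),
  stringsAsFactors = FALSE
)

# Split each string on a literal "/" and stack the pieces into a 2-column matrix.
# Assumption: every value has exactly one "/", so every row has two pieces.
parts <- do.call(rbind, strsplit(as.character(df$Browser), "/", fixed = TRUE))

df$browser <- parts[, 1]              # text before the slash
df$version <- as.integer(parts[, 2])  # text after the slash, as a number

print(df)
```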