Split a string vector at whitespace
I have the following vector:
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
"1530 1", "1540 2", "1540 1")
I would like to retain just the second number in each element of this vector, so it would read:
c(2,1,2,1,2,1,2,1,2,1)
There's probably a better way, but here are two approaches with strsplit():
as.numeric(data.frame(strsplit(tmp3, " "))[2,])
as.numeric(lapply(strsplit(tmp3," "), function(x) x[2]))
The as.numeric() may not be necessary if you can use characters...
One could use read.table on textConnection:
X <- read.table(textConnection(tmp3))
then
> str(X)
'data.frame': 10 obs. of 2 variables:
$ V1: int 1500 1500 1510 1510 1520 1520 1530 1530 1540 1540
$ V2: int 2 1 2 1 2 1 2 1 2 1
so X$V2 is what you need.
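A slightly shorter variant, as a sketch: read.table() also accepts the lines directly through its text argument, so the explicit textConnection() is optional. The column names value and flag below are made up for illustration and are not part of the original answer.

```r
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")

# 'text' is read as if it were the lines of a file;
# 'value' and 'flag' are hypothetical column names
X <- read.table(text = tmp3, col.names = c("value", "flag"))
X$flag
```

Naming the columns up front avoids relying on the default V1/V2 names later in a script.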
It depends a little bit on how closely your actual data matches the example data you've given. If you're just trying to get everything after the space, you can use gsub:
gsub(".+\\s+", "", tmp3)
[1] "2" "1" "2" "1" "2" "1" "2" "1" "2" "1"
If you're trying to implement a rule more complicated than "take everything after the space", you'll need a more complicated regular expression.
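As an illustration of a slightly stricter rule, here is a sketch that captures only the second whitespace-separated field and ignores anything after it. The input vector x is invented for the example; strings that do not match the pattern at all would be returned unchanged by sub().

```r
# hypothetical input with irregular spacing, a tab, and trailing text
x <- c("1500 2", "1510  17 extra", "1520\t3")

# ^\\S+ is the first field, \\s+ the whitespace run,
# (\\S+) captures the second field, .*$ swallows the rest
second <- sub("^\\S+\\s+(\\S+).*$", "\\1", x)
second
```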
What I think is the most elegant way to do this:
> res <- sapply(strsplit(tmp3, " "), "[[", 2)
If you need it to be an integer:
> storage.mode(res) <- "integer"
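One caveat worth sketching: "[[" stops with an error if some string has no second piece. Assuming that missing pieces should become NA instead, a vapply() call with "[" is a possible alternative. The input below is invented to show the difference.

```r
# hypothetical input where the middle string has no second field
parts <- strsplit(c("1500 2", "1510", "1520 1"), " ", fixed = TRUE)

# "[" returns NA for a missing second piece, whereas "[[" would stop
# with a "subscript out of bounds" error; character(1) makes vapply
# check that every result is a single string
res <- vapply(parts, `[`, character(1), 2)
as.integer(res)
```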
substr(x = tmp3, start = 6, stop = 6)
So long as the value you want is always the sixth character of each string, this should do the trick.
(And, of course, you don't have to specify the argument names: substr(tmp3, 6, 6) works fine, too.)
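If the strings differ in length but the wanted value is still a single trailing character (an assumption that goes beyond the question's data), the position can be computed per string with nchar(). A minimal sketch with invented input:

```r
# hypothetical strings of different lengths
x <- c("1500 2", "15 1", "150000 2")

n <- nchar(x)
# start and stop are vectorised, so each string gets its own position
last_chr <- substr(x, n, n)
last_chr
```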
This should do it:
library(plyr)
ldply(strsplit(tmp3, split = " "))[[2]]
If you need a numeric vector, use
as.numeric(ldply(strsplit(tmp3, split = " "))[[2]])
Another option is scan(). To get the second value of each pair, we can use a logical subset.
scan(text = tmp3)[c(FALSE, TRUE)]
# [1] 2 1 2 1 2 1 2 1 2 1
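The logical subset works because c(FALSE, TRUE) is recycled along the flat vector that scan() returns, keeping every second value. An equivalent sketch, perhaps easier to read, reshapes the values into a two-column matrix first; this assumes every string holds exactly two numbers.

```r
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")

# one flat numeric vector of 20 values; quiet = TRUE hides the "Read 20 items" message
vals <- scan(text = tmp3, quiet = TRUE)

# fill row by row so that each row corresponds to one original string
m <- matrix(vals, ncol = 2, byrow = TRUE)
m[, 2]
```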
Just to add two more options, using stringr::str_split() or data.table::tstrsplit():
1) using stringr::str_split()
# data posted above by the asker
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
"1530 1", "1540 2", "1540 1")
library(stringr)
as.integer(
str_split(string = tmp3,
pattern = "[[:space:]]",
simplify = TRUE)[, 2]
)
#> [1] 2 1 2 1 2 1 2 1 2 1
simplify = TRUE tells str_split to return a matrix; we can then index the matrix for the desired column, hence the [, 2] part.
2) Using data.table::tstrsplit()
library(data.table)
as.data.table(tmp3)[, tstrsplit(tmp3, split = "[[:space:]]", type.convert = TRUE)][, V2]
#> [1] 2 1 2 1 2 1 2 1 2 1
type.convert = TRUE is responsible for the conversion to integer here, but use this with care for other datasets.
The indexing [, V2] part has a similar reason as explained above for [, 2]. Here it selects the second column of the returned data table object, which contains the values desired by the asker as integers.
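To make the "use with care" warning concrete, here is a small sketch using base R's type.convert(), which guesses a type from the strings it is given. Whether tstrsplit() behaves identically in every case is an assumption here; the inputs are invented.

```r
# leading zeros are dropped once the strings become integers
ids <- type.convert(c("01", "02"), as.is = TRUE)
ids

# "T" and "F" are read as logical values, not as letters
flags <- type.convert(c("T", "F"), as.is = TRUE)
flags
```

If identifiers or codes must stay as text, skipping the automatic conversion and converting selected columns by hand is the safer route.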
sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.0.0 magrittr_1.5 tools_4.0.0 htmltools_0.4.0
#> [5] yaml_2.2.1 Rcpp_1.0.4.6 stringi_1.4.6 rmarkdown_2.1
#> [9] highr_0.8 knitr_1.28 stringr_1.4.0 xfun_0.13
#> [13] digest_0.6.25 rlang_0.4.6 evaluate_0.14
Created on 2020-05-06 by the reprex package (v0.3.0)
An easier way to split 1 column into 2 columns via data.table:
library(data.table)
# example data: values such as "2-separate"
data_ex = data.table( a = paste( sample(1:3, size=10, replace=TRUE),"-separate", sep="" ))
# split once per distinct value of a; piece 1 is the number, piece 2 the word
data_ex[, number:= unlist( strsplit(x=a, split="-") )[[1]], by=a]
data_ex[, word:= unlist( strsplit(x=a, split="-") )[[2]], by=a ]