开发者

Remove from a string all except selected characters

I want to remove from a string all characters that are not digits, minus signs, or decimal points.

I imported data from Excel using read.xls, which include some strange characters. I need to convert these to numeric. I am not too familiar with regular expressions, so need a simpler way to do the following:

excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")), 
                     replacement = "",开发者_运维技巧 x = excel_coords)

> clean_coords
[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 

Bonus if somebody can tell me why these characters have appeared in some of my data (the degree signs are part of the original Excel worksheet, but the others are not).


Short and sweet. Thanks to comment by G. Grothendieck.

gsub("[^-.0-9]", "", excel_coords)

From http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: "A character class is a list of characters enclosed between [ and ] which matches any single character in that list; unless the first character of the list is the caret ^, when it matches any character not in the list."


Can also be done by using strsplit, sapply and paste and by indexing the correct characters rather than the wrong ones:

 excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
 correct_chars <- c(0:9,"-",".")
 sapply(strsplit(excel_coords,""), 
          function(x)paste(x[x%in%correct_chars],collapse=""))

[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 


gsub("(.+)([[:digit:]]+\\.[[:digit:]]+)(.+)", "\\2", excel_coords)
[1] "9.53380" "0.02591" "5.91059" "5.8154" 
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜