Error in nchar() when reading a Stata file in R on Mac
I'm learning R and am simply trying to read in a Stata data file, but I'm getting the error below:
X <- Stata.file(Stata_File)
Error in nchar(varlabs) : invalid multibyte string 253
Several Mac users here get this error when running the script, but it works fine on a PC. A Google search suggests it has something to do with the R package, but I can't find a solution. Any ideas? Thanks for your help!
The R code up to the error point is below:
Root <- "/Users/Desktop/R_Training"
PathIn <- paste(Root,"Data/Example_0",sep="/")
# The 2007 Dominican Republic household member file (96 MB)
Stata_File <- "drpr51fl.dta"
# Load the memisc package:
library(memisc)
# Set the working directory:
setwd(PathIn)
# (1) Determine which variables we want:
# The Stata.file function (from memisc) reads the "header"
# of our Stata file so you can see what it contains
# and choose the variables you want.
X <- Stata.file(Stata_File)
Error in nchar(varlabs) : invalid multibyte string 253
Below is my session info:
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets
[7] methods   base

other attached packages:
[1] memisc_0.95-33  MASS_7.3-13     lattice_0.19-30
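This is what worked for me. You can stop R from rejecting the file's bytes by switching to the C locale, in which every byte counts as one character, so nchar() no longer complains about invalid multibyte sequences: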
Sys.setlocale('LC_ALL','C')
Then rerun the Stata.file() call and it should work.
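If you'd rather not leave the whole session in the C locale, a minimal sketch is to switch only for the one call and restore the previous setting afterwards. This is an adaptation, not what the answer above did: it assumes that changing LC_CTYPE (the category that controls how bytes are read as characters) is enough, and with_c_locale() is a hypothetical helper name.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE set to "C",
# then restore the caller's setting, even if `expr` throws an error.
# Assumption: LC_CTYPE alone governs nchar()'s multibyte check.
with_c_locale <- function(expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", "C")
  expr  # lazy evaluation: the argument is only evaluated here, under "C"
}

# In the C locale a lone 0xFD byte (the "253" in the error message)
# is just one character, so this returns 1 instead of failing.
with_c_locale(nchar("\xfd"))
```

For the original script that would be X <- with_c_locale(Stata.file(Stata_File)). If later calls that touch the labels in X hit the same error, they may need wrapping too.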
It seems the strings in the file aren't in the encoding the program assumes... I guess the file was generated on a PC? Does it contain non-ASCII column names or data strings?
Your locale is UTF-8, while (US/Western European) Windows PCs typically use Latin-1, so that could be the problem. I'd then expect the same error on Linux, which also uses UTF-8.
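One way to test that theory: byte 253 (0xFD) can never occur anywhere in valid UTF-8, so a label containing it must come from some single-byte encoding. The sketch below flags strings that are not valid UTF-8 and re-encodes them assuming they are Latin-1, which is only a guess about how the file was written. fix_latin1() is a hypothetical helper, and validUTF8() requires R 3.3.0 or later, so it is not available in the R 2.13.1 shown in the question.

```r
# Hypothetical helper: re-encode strings that are not valid UTF-8,
# assuming (unverified) that they were written in Latin-1.
fix_latin1 <- function(x) {
  bad <- !validUTF8(x)  # flag strings that are not valid UTF-8
  x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
  x
}

labels <- c("Household ID", "Age\xfd")  # made-up example labels
fixed <- fix_latin1(labels)
validUTF8(fixed)  # TRUE TRUE
```

If the converted labels look sensible (for example, correctly accented Spanish text), that supports the Latin-1 theory; if they look garbled, the file may use a different Windows or DOS code page.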
Possible workaround: does the Stata.file() function have an "encoding" option? If so, you might try 'latin1' and hope for the best...
Another possibility is to start R with the --encoding=latin1 option.