
Error in nchar() when reading a Stata file in R on a Mac

I'm learning R and am simply trying to read in a Stata data file, but I get the error below:

X <- Stata.file(Stata_File)

Error in nchar(varlabs) : invalid multibyte string 253

Several Mac users here hit this error with the same program, but it works fine on a PC. A Google search suggests the error has something to do with the R package, but I can't find a solution. Any ideas? Thanks for your help!

The R code up to the error point is below:

Root   <- "/Users/Desktop/R_Training"
PathIn <- paste(Root,"Data/Example_0",sep="/")

# The 2007 Dominican Republic household member file (96 MB) 
Stata_File <- "drpr51fl.dta"

# Load the memisc package:
library(memisc)

# Set the working directory:
setwd(PathIn)

# (1) Determine which variables we want:
# The Stata.file function (from memisc) reads the "header" 
#  of our Stata file so you can see what it contains
#  and choose the variables you want.
X <- Stata.file(Stata_File)

**Error in nchar(varlabs) : invalid multibyte string 253**

Below is my session info:

R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets
[7] methods base

other attached packages:
[1] memisc_0.95-33 MASS_7.3-13 lattice_0.19-30


This is what worked for me. Switching R to the C locale makes it treat strings as plain bytes instead of checking that they are valid UTF-8, so nchar() no longer fails on them:

Sys.setlocale('LC_ALL','C')

Now rerun the Stata.file() call and it should work.
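If you'd rather not leave the whole session in the C locale, here is a minimal sketch of changing only LC_CTYPE and putting it back afterwards. The sample label is made up for illustration (it is not taken from the DHS file), and nchar() stands in for the Stata.file() call.

```r
# Remember the current character-type locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# In the C locale R treats strings as plain bytes.
Sys.setlocale("LC_CTYPE", "C")

# Hypothetical variable label holding a Latin-1 byte (0xF1, "n with tilde"),
# which is not valid UTF-8 on its own.
label <- rawToChar(as.raw(c(0x61, 0xf1, 0x6f)))

# In the C locale this counts bytes and does not raise the multibyte error.
n <- nchar(label)
print(n)

# Put the original locale back once the file has been read.
Sys.setlocale("LC_CTYPE", old_ctype)
```

Keep in mind that while the C locale is active, accented characters in labels will not print properly; they are just carried around as raw bytes.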


It seems like the encoding of the strings in the file isn't what the program thinks it is. I guess the file was generated on a PC? Does it contain non-ASCII column names, labels or data strings?

Your locale is UTF-8, while (US/Western European) PCs typically use Latin-1, so that could be the problem. If so, I'd expect the same error on Linux, which is usually UTF-8 as well.
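To illustrate the mismatch, here is a small sketch with a made-up label (not from your file). It builds the bytes a Latin-1 machine would write, checks whether they are valid UTF-8, and re-encodes them with iconv(). Note that validUTF8() only exists in newer versions of R (3.3.0 and later), so it won't be available in 2.13.1. Also, as far as I can tell, the 253 in your message is the position of the offending label in the vector rather than a byte value.

```r
# Hypothetical label: the bytes for "a", Latin-1 "n with tilde" (0xF1), "o".
label <- rawToChar(as.raw(c(0x61, 0xf1, 0x6f)))

# A lone 0xF1 byte is not well-formed UTF-8, which is what trips up nchar()
# in a UTF-8 locale.
is_valid_before <- validUTF8(label)

# Re-encode from Latin-1 to UTF-8; the result is marked as UTF-8.
fixed <- iconv(label, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(fixed)

# Three characters, stored as four bytes in UTF-8.
n_chars <- nchar(fixed)
n_bytes <- nchar(fixed, type = "bytes")

print(c(is_valid_before, is_valid_after))
print(c(n_chars, n_bytes))
```

This only helps once you actually have the strings in R, so it is a way to check the diagnosis rather than a fix for the failing read itself.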

Possible work-arounds: Does the Stata.file method have an "encoding" option? Then you might try 'latin1' and hope for the best...

Another possibility is to start R with the --encoding=latin1 option.
