How to read.table with "Hebrew" column names (in R)?
I am trying to read a .txt file, with Hebrew column names, but without success.
I uploaded an example file to: http://www.talgalili.com/files/aa.txt
And am trying the command:
read.table("http://www.talgalili.com/files/aa.txt", header = T, sep = "\t")
This returns me with:
X.....ª X...ª...... X...œ....
1 12 97 6
2 123 354 44
3 6 1 3
Instead of:
אחת שתיים שלוש
12 97 6
123 354 44
6 1 3
My output for:
l10n_info()
Is:
$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] TRUE
$codepage
[1] 1252
And for:
Sys.getlocale()
Is:
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Can you suggest to me what to try and change to allow me to load the file correctly ?
Update: Trying to use:
read.table("http://www.talgalili.com/files/aa.txt",fileEncoding ="iso8859-8")
Has resulted in:
V1
1 ?
Warning messages:
1: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
invalid input found on input connection 'http://www.talgalili.com/files/aa.txt'
2: In开发者_StackOverflow社区 read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
incomplete final line found by readTableHeader on 'http://www.talgalili.com/files/aa.txt'
While also trying this:
Sys.setlocale("LC_ALL", "en_US.UTF-8")
Or this:
Sys.setlocale("LC_ALL", "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8")
Get's me this:
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
Finally, here is the > sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32
locale:
[1] LC_COLLATE=English_United States.1255 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.10.1
Any suggestion or clarification will be appreciated.
Best, Tal
I would try passing parameter fileEncoding
to read.table with a value of iso8859-8
.
Use iconvlist()
to get an alphabetical list of the supported encodings. As I saw here Hebrew must be part 8 of ISO 8859.
I've tried @George Donats answer, but couldn't make it work. So I wanted to suggest another possibility for future reference.
I couldn't find the file online, so I've recreated a txt file like your using TAB as a seperator. You can load it into R with the Hebrew text using a connection. It is demonstrated below:
con<-file("aa.txt",open="r",encoding="iso8859-8") ##Open a read-only connection with encoding fit for Hebrew (iso8859-8)
Than you can load it into R with your code, using con variable as the file input, code described here:
data<-read.table(con,sep="\t",header=TRUE)
Browsing into the data variable gives the following results:
str(data)
'data.frame': 3 obs. of 3 variables:
$ אחת : int 6 44 3
$ שתיים: int 97 354 1
$ שלוש : int 12 123 6
> data$אחת
[1] 6 44 3
精彩评论