How to read.table with "Hebrew" column names (in R)?

2022-12-23 14:38, Q&A author:

I am trying to read a .txt file, with Hebrew column names, but without success.

I uploaded an example file to: http://www.talgalili.com/files/aa.txt

And I am trying the command:

read.table("http://www.talgalili.com/files/aa.txt", header = TRUE, sep = "\t")

This returns:

  X.....ª X...ª...... X...œ....
1      12          97         6
2     123         354        44
3       6           1         3

Instead of:

אחת שתיים   שלוש
12  97  6
123 354 44
6   1   3

My output for:

l10n_info()

Is:

$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

And for:

Sys.getlocale()

Is:

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Can you suggest what I should try or change so that the file loads correctly?

Update: Trying to use:

read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8")

Has resulted in:

 V1
1  ?
Warning messages:
1: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
  invalid input found on input connection 'http://www.talgalili.com/files/aa.txt'
2: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
  incomplete final line found by readTableHeader on 'http://www.talgalili.com/files/aa.txt'
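One thing worth checking: the mangled names in the first output, such as X.....ª and X...ª......, are what the UTF-8 bytes of אחת and שתיים look like when read as Windows-1252 and then passed through make.names(). That hints the file may be UTF-8 rather than ISO-8859-8, which would also be consistent with the "invalid input" warning above; if so, fileEncoding = "UTF-8" would be the value to try. The sketch below guesses the encoding of a local file from its raw bytes. The helper guess_hebrew_encoding() and the temporary demo file are hypothetical, and it only looks for the Hebrew letters alef to tav.

```r
# Hypothetical helper: look at the first bytes of a local file and guess how
# its Hebrew letters are encoded.
#  * UTF-8 stores each Hebrew letter (U+05D0..U+05EA) as two bytes:
#    0xD7 followed by 0x90..0xAA.
#  * ISO-8859-8 and Windows-1255 store alef..tav as single bytes 0xE0..0xFA.
guess_hebrew_encoding <- function(path, n = 200L) {
  bytes <- as.integer(readBin(path, what = "raw", n = n))
  if (length(bytes) > 1L) {
    # Positions of possible UTF-8 lead bytes (not counting the last byte).
    lead <- which(bytes[-length(bytes)] == 0xD7)
    if (any(bytes[lead + 1L] >= 0x90 & bytes[lead + 1L] <= 0xAA)) {
      return("UTF-8")
    }
  }
  if (any(bytes >= 0xE0 & bytes <= 0xFA)) {
    return("ISO-8859-8 or Windows-1255")
  }
  "no Hebrew letters found"
}

# Demo: a one-line file holding the word "one" (alef, het, tav) as UTF-8 bytes.
utf8_file <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xD7, 0x90, 0xD7, 0x97, 0xD7, 0xAA, 0x0A)), utf8_file)
guess_hebrew_encoding(utf8_file)
```

For the file in the question you would first save a local copy, e.g. download.file("http://www.talgalili.com/files/aa.txt", "aa.txt", mode = "wb"), and then call guess_hebrew_encoding("aa.txt").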

I also tried this:

Sys.setlocale("LC_ALL", "en_US.UTF-8")

Or this:

Sys.setlocale("LC_ALL", "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8")

Either way, I get this:

[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored

Finally, here is the output of sessionInfo():

R version 2.10.1 (2009-12-14) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1255  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.10.1

Any suggestion or clarification will be appreciated.

Best, Tal

I would try passing the fileEncoding parameter to read.table with a value of "iso8859-8".

Use iconvlist() to get an alphabetical list of the supported encodings. Hebrew is covered by Part 8 of ISO 8859 (ISO-8859-8).
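The exact spelling of the encoding name ("ISO-8859-8", "ISO8859-8", "CP1255", ...) depends on the iconv implementation R was built with, so a quick filter over iconvlist() shows which names your installation accepts. This is a minimal sketch; the regular expression is only an assumption about how those names are usually spelled.

```r
# List every encoding name this R build's iconv knows about, then keep the
# ones that look like Hebrew code pages: ISO 8859 part 8 (Latin/Hebrew)
# and Windows code page 1255.
all_encodings <- iconvlist()
hebrew_encodings <- grep("8859[-_]?8($|[^0-9])|1255", all_encodings,
                         value = TRUE, ignore.case = TRUE)
print(hebrew_encodings)
```

Whichever spelling appears can then be passed on, e.g. read.table(path, header = TRUE, sep = "\t", fileEncoding = "ISO-8859-8").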

I tried the answer by @George Donats but couldn't make it work, so I'd like to suggest another possibility for future reference.

I couldn't find the file online, so I recreated a .txt file like yours, using TAB as the separator. You can load it into R, Hebrew text included, through a connection opened with a Hebrew encoding:

con <- file("aa.txt", open = "r", encoding = "iso8859-8")  ## Open a read-only connection with an encoding suited to Hebrew (ISO-8859-8)

Then you can load it into R with your code, passing the con variable as the file argument:

data <- read.table(con, sep = "\t", header = TRUE)
close(con)  ## read.table leaves an already-open connection open, so close it here

Inspecting the data variable gives the following:

str(data)

'data.frame':   3 obs. of  3 variables:
 $ אחת  : int  6 44 3
 $ שתיים: int  97 354 1
 $ שלוש : int  12 123 6

> data$אחת
[1]  6 44  3
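For reference, here is a self-contained sketch that writes a small ISO-8859-8 file with the Hebrew headers and the numbers from the question to a temporary location and reads it back. Instead of a connection with encoding = (which converts to the session's native encoding, so it needs a locale that can represent Hebrew), it swaps in a different technique: read the lines undecoded, convert them to UTF-8 explicitly with iconv(), and parse them with read.table(text = ...). The helper read_hebrew_table() and the sample file are hypothetical, and the code assumes a reasonably recent R; older versions such as the 2.10.1 in the question may lack some of these arguments.

```r
# Hypothetical helper: read a tab-separated file whose text is stored in a
# single-byte Hebrew encoding and return a data frame with UTF-8 column names.
read_hebrew_table <- function(path, from = "ISO-8859-8") {
  # Read the lines as stored, without any re-encoding.
  raw_lines <- readLines(path, warn = FALSE)
  # Decode explicitly from the file's encoding to UTF-8.
  utf8_lines <- iconv(raw_lines, from = from, to = "UTF-8")
  if (anyNA(utf8_lines)) stop("some lines are not valid ", from)
  # check.names = FALSE keeps the Hebrew names instead of make.names() output.
  read.table(text = utf8_lines, header = TRUE, sep = "\t", check.names = FALSE)
}

# Build a sample file: headers "one", "two", "three" in Hebrew plus the
# numbers from the question, written as ISO-8859-8 bytes.
header <- paste("\u05d0\u05d7\u05ea", "\u05e9\u05ea\u05d9\u05d9\u05dd",
                "\u05e9\u05dc\u05d5\u05e9", sep = "\t")
rows <- c("12\t97\t6", "123\t354\t44", "6\t1\t3")
sample_path <- tempfile(fileext = ".txt")
writeLines(iconv(c(header, rows), from = "UTF-8", to = "ISO-8859-8"),
           sample_path, useBytes = TRUE)

hebrew_df <- read_hebrew_table(sample_path)
str(hebrew_df)
```

If the file turns out to be UTF-8 or Windows-1255 instead, the same helper can be called with from = "UTF-8" or from = "CP1255".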
