
Problem with Cyrillic characters in the console

Sorry for my bad English. This is Ruby code:

s = "мистика"

`touch #{s}`
`cat #{s}`
`cat < #{s}`

Can anybody tell me why this code fails with this error?

sh: cannot open ми�тика: No such file

But this code works fine:

s = "работает" 
`touch #{s}` 
`cat #{s}` 
`cat < #{s}` 

The problem occurs only when the word contains the Russian letter 'с' and the command uses the '<' symbol.
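To see what sets 'с' apart from the other letters, one can print the UTF-8 bytes of each character in both words. This is a minimal sketch assuming Ruby 1.9 or later (it is not meant for the 1.8.7 shown below), and `utf8_hex` is a hypothetical helper name. In UTF-8 'с' is encoded as D1 81, and it is the only letter in either word whose second byte is 0x81. That fits the error message, where exactly that letter comes out mangled, but the sketch does not by itself prove what the shell does with the byte.

```ruby
# encoding: utf-8
# Sketch (Ruby 1.9+): list the UTF-8 bytes of every letter in the two test words.

# Hypothetical helper: returns "letter=HEX HEX" for each character of +word+.
def utf8_hex(word)
  word.each_char.map do |ch|
    hex = ch.unpack("C*").map { |b| format("%02X", b) }.join(" ")
    "#{ch}=#{hex}"
  end
end

%w[мистика работает].each do |word|
  puts utf8_hex(word).join("  ")
end
```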

woto@woto-work:/tmp$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=

woto@woto-work:/tmp$ ruby -v 
ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux] 

woto@woto-work:/tmp$ uname -a 
Linux woto-work 2.6.32-26-generic #48-Ubuntu SMP Wed Nov 24 10:14:11 UTC 2010 x86_64 GNU/Linux 

woto@woto-work:/tmp$ lsb_release -a 
No LSB modules are available. 
Distributor ID: Ubuntu 
Description:    Ubuntu 10.04.1 LTS 
Release:        10.04 
Codename:       lucid 

Another example that may also help explain my problem:

woto@woto-work:~/rails/avtorif$ touch мистика
woto@woto-work:~/rails/avtorif$ ruby -e "`cat < мистика`"
woto@woto-work:~/rails/avtorif$ ruby -e '`cat < мистика`'
sh: cannot open ми�тика: No such file


This is a bug in dash, the shell that Debian and Ubuntu use as /bin/sh by default: /bin/sh is a symlink to /bin/dash, and Python's os.system runs commands through sh (Ruby probably uses sh too). dash cannot properly parse 8-bit text, including UTF-8. To work around the problem, replace it with bash:

sudo dpkg-reconfigure dash

and answer "No" when asked whether dash should be the system shell. The system will then use bash as /bin/sh, and bash handles UTF-8 correctly.
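If changing /bin/sh system-wide is not an option, another approach is to keep the shell out of the picture so that nothing but the kernel ever sees the Cyrillic filename. The sketch below assumes Ruby 1.9.2 or later (it will not run on the 1.8.7 from the question). It passes the program and the filename as separate arguments, which makes Ruby exec `cat` directly instead of going through /bin/sh. For the `cat < file` case, Ruby reads the file itself and feeds it to `cat` on stdin. The names `read_with_cat` and `read_via_stdin` are hypothetical.

```ruby
# encoding: utf-8
# Sketch (Ruby 1.9.2+): run the question's commands without /bin/sh,
# so no shell ever has to parse the Cyrillic filename.
require "open3"
require "tmpdir"

# Hypothetical helper, like `cat #{path}`: argv array form, no shell involved.
def read_with_cat(path)
  IO.popen(["cat", path]) { |io| io.read }
end

# Hypothetical helper, like `cat < #{path}`: Ruby opens the file and writes
# its contents to cat's stdin instead of asking a shell to redirect it.
def read_via_stdin(path)
  out, status = Open3.capture2("cat", stdin_data: File.read(path))
  raise "cat failed: #{status}" unless status.success?
  out
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "мистика")
  system("touch", path)                       # multi-argument form also skips the shell
  File.open(path, "w") { |f| f.write("hello\n") }
  puts read_with_cat(path)
  puts read_via_stdin(path)
end
```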


The following works for me; have you tried it this way?

s="мистика"
touch $s

In bash you reference a variable by prepending a dollar sign.


In each of your examples, you are executing a shell command. As a first step, I would make sure that your shell command executes as you expect it to when you type it in directly:

touch мистика
cat мистика
cat < мистика

If you get errors in the shell, there are two likely causes: the shell does not understand the character encoding, or the filename needs to be quoted so that the shell treats it as a single word.
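For the quoting half of that advice, Ruby's standard `shellwords` library can escape a value before it is interpolated into a backtick command. This is only a sketch: `Shellwords.escape` protects against spaces and shell metacharacters, but it is not shown here to avoid the dash encoding problem described in another answer. The filenames are illustrative.

```ruby
# encoding: utf-8
require "shellwords"

s = "мистика"
quoted = Shellwords.escape(s)      # safe to splice into a shell command line
puts "cat < #{quoted}"             # the command string that would go to /bin/sh

# The escaping round-trips: splitting it the way a shell would gives back the name.
p Shellwords.split(quoted) == [s]
p Shellwords.split(Shellwords.escape("my file.txt")) == ["my file.txt"]
```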

Ruby 1.9 understands character encodings; Ruby 1.8 does not. You will need to determine which character encoding your shell environment uses (the locale output in the question shows ru_RU.UTF-8). Once you know it, build each command as a regular string:

touch = "touch #{s}".force_encoding("UTF-8") ## or whatever encoding you need

and then execute the command:

`#{touch}`

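In Ruby 1.9 every string carries an encoding. String literals take the source file's encoding, which defaults to US-ASCII (so Cyrillic literals need a `# encoding: utf-8` magic comment; Ruby 2.0 made UTF-8 the default), and Encoding.default_external follows the locale. Ruby 1.8 has no concept of encoding; a string is merely an array of bytes. Unfortunately, not every piece of software understands Unicode or the concept of character encodings (much like Ruby 1.8). Such programs fall back to whatever their default encoding is, and I suspect your shell may be one of them.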
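The byte-versus-character distinction can be seen directly in Ruby 1.9 or later; this is only an illustration, since `encoding` and `force_encoding` do not exist on 1.8.7. Note that `force_encoding` relabels the same bytes rather than converting them, so it changes how Ruby interprets the string, not the bytes that end up being passed to the shell.

```ruby
# encoding: utf-8
s = "мистика"

puts s.encoding          # taken from the magic comment above
puts s.length            # counted in characters
puts s.bytesize          # counted in bytes: two per Cyrillic letter in UTF-8

# force_encoding relabels the same bytes; it does not convert them.
raw = s.dup.force_encoding("ASCII-8BIT")
puts raw.length                        # now every byte counts as one "character"
puts raw.bytes.to_a == s.bytes.to_a    # the underlying bytes are unchanged
```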


Use Ruby 1.9; its String class has a force_encoding method.
