Reading ASCII-encoded files with Ruby 1.9 in a UTF-8 environment
I just upgraded from Ruby 1.8 to 1.9, and most of my text processing scripts now fail with the error "invalid byte sequence in UTF-8". I need to either strip out the invalid characters or specify that Ruby should use ASCII encoding instead (or whatever encoding the C stdio functions write, which is how the files were produced) -- how would I go about doing either of those things?
Preferably the latter, because (as near as I can tell) there's nothing wrong with the files on disk -- if there are weird, invalid characters they don't appear in my editor...
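For reference, a rough sketch of the two approaches I have in mind (the file name data.txt is just a placeholder, and the snippet assumes Ruby 1.9):

# Option 1: declare the file's encoding explicitly when opening it,
# instead of letting Ruby fall back to the locale default.
File.open("data.txt", "r:US-ASCII") do |f|
  f.each_line { |line| puts line }
end

# Option 2: read the raw bytes and strip anything that cannot be
# represented in UTF-8.
raw   = File.open("data.txt", "rb") { |f| f.read }   # tagged ASCII-8BIT
clean = raw.encode("UTF-8",
                   :invalid => :replace,
                   :undef   => :replace,
                   :replace => "")                    # drop unmappable bytes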
What's your locale set to in the shell? On Linux-based systems you can check this by running the locale command and change it with, e.g.:
$ export LANG=en_US
My guess is that your locale settings use UTF-8 encoding, which causes Ruby to assume the text files were created according to UTF-8 encoding rules. You can see this by trying:
$ LANG=en_GB ruby -e 'warn "foo".encoding.name'
US-ASCII
$ LANG=en_GB.UTF-8 ruby -e 'warn "foo".encoding.name'
UTF-8
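If changing the shell's locale isn't convenient, the locale-derived default can also be overridden from inside Ruby itself; a minimal sketch (Ruby 1.9):

# Override the default external encoding so File.open and friends
# stop assuming UTF-8, regardless of what LANG says.
puts Encoding.default_external.name        # e.g. "UTF-8" under a UTF-8 locale
Encoding.default_external = "US-ASCII"
puts Encoding.default_external.name        # => "US-ASCII"

The -E option to the ruby command line should achieve the same thing per invocation.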
For a more general treatment of how string encoding has changed in Ruby 1.9, I thoroughly recommend http://blog.grayproductions.net/articles/ruby_19s_string
(code examples assume bash or similar shell - C-shell derivatives are different)