How to remove all non-ASCII characters from a string in Ruby

2023-01-06 14:09 Q&A author:

It seems to be a very simple and much-needed method. I need to remove all non-ASCII characters from a string, e.g. © etc. See the following example.

#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode

Output:

UTF-8
Hello this a mixed string ┬⌐ that I made.

When I feed this to Watir, it produces the following error: incompatible character encodings: UTF-8 and ASCII-8BIT

So my problem is that I want to get rid of all non-ASCII characters before using it. I will not know which encoding the source string "s" uses.

I have been searching and experimenting for quite some time now.

If I try to use

  puts s.encode('ASCII-8BIT')

It gives the error:

 : "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)
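A note on the error above: ASCII-8BIT is Ruby's binary encoding, not 7-bit ASCII, so there is no defined conversion of © into it. The 7-bit encoding is named US-ASCII, and String#encode accepts options that replace characters it cannot convert. A minimal sketch, assuming the source string is tagged with an encoding other than US-ASCII (the helper name to_ascii is made up for illustration):

```ruby
# Hypothetical helper; the name is not part of Ruby or Watir.
def to_ascii(str)
  # undef:   handles characters that exist in the source encoding
  #          but have no US-ASCII equivalent (such as U+00A9).
  # invalid: handles byte sequences that are malformed in the source encoding.
  # replace: '' drops the offending characters instead of inserting "?".
  str.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '')
end

s = " Hello this a mixed string \u00A9 that I made."
clean = to_ascii(s)
puts clean.encoding
puts clean
```

This only behaves as described when the string's encoding tag matches its actual bytes; if the tag is wrong, the result may differ.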

You can just literally translate what you asked into a Regexp. You wrote:

I want to get rid of all non ASCII characters

We can rephrase that a little bit:

I want to substitute all characters which don't have the ASCII property with nothing

And that's a statement that can be directly expressed in a Regexp:

s.gsub!(/\P{ASCII}/, '')

As an alternative, you could also use String#delete!:

s.delete!("^\u{0000}-\u{007F}")
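Both one-liners above can be tried out with a short script. This is only a sketch: the helper names are made up for illustration, and it uses the non-bang gsub and delete, which always return a new string, whereas gsub! and delete! return nil when nothing was changed.

```ruby
# Hypothetical helper names, used only for this illustration.

# Regexp variant: \P{ASCII} matches any character without the ASCII property.
def strip_non_ascii_gsub(str)
  str.gsub(/\P{ASCII}/, '')
end

# String#delete variant: the leading ^ negates the range, so everything
# outside U+0000..U+007F is deleted.
def strip_non_ascii_delete(str)
  str.delete("^\u{0000}-\u{007F}")
end

s = " Hello this a mixed string \u00A9 that I made."
puts strip_non_ascii_gsub(s)
puts strip_non_ascii_delete(s)

# The bang version returns nil when there was nothing to remove,
# so its return value should not be chained or assigned blindly.
p "plain ascii".gsub!(/\P{ASCII}/, '')
```

Note that removing © leaves the two surrounding spaces next to each other in the result.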

Strip out the characters using regex. This example is in C# but the regex should be the same: How can you strip non-ASCII characters from a string? (in C#)

Translating it into Ruby using gsub should not be difficult.
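For illustration, here is what such a translation might look like, assuming the linked C# answer uses a negated character class covering U+0000 through U+007F (that is an assumption about the linked page, not something quoted here). The method name is hypothetical.

```ruby
# Hypothetical method name, for illustration only.
def remove_non_ascii(str)
  # [^\x00-\x7F] matches any single character outside the 7-bit ASCII range;
  # the + groups consecutive matches so each run is removed in one step.
  str.gsub(/[^\x00-\x7F]+/, '')
end

puts remove_non_ascii("na\u00EFve caf\u00E9 \u00A9 2011")
```

On a UTF-8 string the character class works on whole characters, so a multi-byte character such as © is removed as one unit rather than byte by byte.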

UTF-8 is a variable-length encoding. When a character occupies one byte, its value coincides with 7-bit ASCII. So why don't you just look for bytes with a '1' in the MSB, and then remove both them and their trailers? A byte beginning with '110' will be followed by one additional byte. A byte beginning with '1110' will be followed by two. And a byte beginning with '11110' will be followed by three, the maximum supported by UTF-8.

This is all just off the top of my head. I could be wrong.
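A rough sketch of the byte-level idea, under one simplification: in UTF-8 both the lead byte and every trailing byte of a multi-byte sequence have the most significant bit set, so it should be enough to drop every byte that is 0x80 or above without counting trailers. The method name is made up for illustration, and the sketch assumes the input is UTF-8 or some other encoding in which bytes below 0x80 always mean ASCII.

```ruby
# Hypothetical method name, for illustration only.
def strip_high_bytes(str)
  # Keep only bytes whose most significant bit is clear (0x00..0x7F).
  ascii_bytes = str.bytes.select { |b| b < 0x80 }
  # Rebuild a string from the remaining bytes and tag it as US-ASCII.
  ascii_bytes.pack('C*').force_encoding('US-ASCII')
end

puts strip_high_bytes("Hello \u00A9 and \u20AC done")
```

Because it never interprets the bytes as characters, this variant does not raise on malformed input, but it would give wrong results for encodings such as UTF-16 where ASCII characters are not stored as single bytes.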
