Ruby: unfamiliar string usage with Integer.chr and "\001"

Recently I stumbled over this code snippet in Ruby:

@data = 3.chr * 5

which results in "\003\003\003\003\003"

Later in the code, for example,

flag = @data[2] & 2

is used. I know that it has something to do with bitwise flags. It seems the values 1, 2 and 3 are used as state flags, but because Ruby 1.9, which is the version I am familiar with, changed the Integer.chr method, the code no longer works, and I would really like to know what's going on. Furthermore, what is the purpose of the "\00x" escaped thing?

Thanks for your answers


To make the code work in Ruby 1.9, try changing that line to:

flag = @data[2].ord & 2

Prior to Ruby 1.9, str[n] would return an integer between 0 and 255, but in Ruby 1.9, with its new Unicode support, str[n] returns a character (a string of length 1). To get the integer instead of the character, you can call .ord on the character.
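A small sketch of getting the integer back in Ruby 1.9 and later. Besides `.ord`, it also shows `String#getbyte`, which is an alternative I am adding here rather than something from the original code; the local variable names are only illustrative.

```ruby
# The data from the question: five bytes, each with the value 3.
data = 3.chr * 5

# In Ruby 1.9+, String#[] with an integer index returns a one-character
# String, not an Integer, so `data[2] & 2` raises NoMethodError.
char = data[2]
puts char.class                 # String

# String#ord converts that character back into its integer code.
flag = data[2].ord & 2
puts flag                       # 2

# String#getbyte reads the raw byte at an index as an Integer directly.
flag_from_byte = data.getbyte(2) & 2
puts flag_from_byte             # 2
```

`getbyte` sidesteps characters entirely, which fits code that treats the string as a plain byte buffer.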

The & operator is just the standard bitwise AND operator common to C, Ruby, and many other languages.
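To make the flag idea concrete, here is a sketch with two named bit flags. The names `FLAG_ONE`, `FLAG_TWO` and the helper `flag_set?` are hypothetical; the original code only uses the raw numbers 1, 2 and 3, where 3 is simply 1 and 2 combined.

```ruby
# Hypothetical names for the raw values 1 and 2 used in the question.
FLAG_ONE = 0b01   # 1
FLAG_TWO = 0b10   # 2

# | combines flags; 3 means both bits are set.
state = FLAG_ONE | FLAG_TWO
puts state                          # 3

# & keeps only the bits that are set in both operands.
puts(state & FLAG_TWO)              # 2 (non-zero: FLAG_TWO is set)
puts(FLAG_ONE & FLAG_TWO)           # 0 (no bits in common)

# Hypothetical helper that turns the AND into a true/false check.
def flag_set?(value, flag)
  (value & flag) != 0
end

puts flag_set?(state, FLAG_TWO)     # true
puts flag_set?(FLAG_ONE, FLAG_TWO)  # false
```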

Byte number three (0x03) is not a printable ASCII character, so when you have that byte in a string and call inspect, Ruby denotes that byte as \003. Just make sure you understand that "\003" is a single-byte string while '\003' is a four-byte string.
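A quick way to check the single-quote versus double-quote difference is to look at the bytes. Only double-quoted strings process escapes like `\003`; in single quotes the backslash and digits stay as literal characters.

```ruby
double = "\003"   # escape processed: one byte with the value 3
single = '\003'   # no escape processing: backslash, '0', '0', '3'

puts double.bytesize   # 1
puts single.bytesize   # 4
p double.bytes         # [3]
p single.bytes         # [92, 48, 48, 51]
```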

In Ruby, strings are really sequences of bytes. Ruby 1.9 attaches encoding information to each string, but underneath it is still just a sequence of bytes.
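A sketch showing a Ruby 1.9+ string as a byte sequence plus an encoding label. The sample string `"h\u00e9"` ("hé") is an arbitrary choice; a `\u` escape makes the literal UTF-8 regardless of the source file's encoding.

```ruby
s = "h\u00e9"                 # "hé" as a UTF-8 string
puts s.encoding               # UTF-8
puts s.length                 # 2 (characters)
puts s.bytesize               # 3 (bytes: é takes two bytes in UTF-8)
p s.bytes                     # [104, 195, 169]

# force_encoding relabels the same bytes without changing them.
raw = s.dup.force_encoding(Encoding::BINARY)
puts raw.length               # 3 (each byte now counts as one character)
p raw.bytes == s.bytes        # true
```

The bytes never change; only the encoding label, and therefore how characters are counted, does.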


The "\00X" thing is an octal representation of the value.

So if we do:

irb(main):001:0> 15.chr
=> "\017"
irb(main):002:0> 16.chr
=> "\020"

Notice how we went from 17 right to 20? Octal.
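The jump from 17 to 20 can be checked by converting to base 8 directly; an octal escape is just another way to write a byte value.

```ruby
# 15 and 16 written in base 8
puts 15.to_s(8)          # 17
puts 16.to_s(8)          # 20

# The octal escapes decode back to the same integers.
puts "\017".ord          # 15
puts "\020".ord          # 16
puts "\017" == 15.chr    # true
```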

"\003\003\003\003\003" is 5 bytes of the value 3, and you can then bitwise-AND them with other bytes, such as 2 or \002.

So 3 (0011 in binary) ANDed with 2 (0010) is 2 (0010).
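The same calculation in Ruby, printing the operands as 4-digit binary with `format('%04b', ...)` so the bit positions line up:

```ruby
data = "\003\003\003\003\003"
byte = data.getbyte(2)           # 3

puts format('%04b', byte)        # 0011
puts format('%04b', 2)           # 0010
puts format('%04b', byte & 2)    # 0010
puts(byte & 2)                   # 2

# ANDing with the byte value of "\002" gives the same result.
puts(byte & "\002".ord)          # 2
```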

The 1.9 issue occurs because 1.9 no longer treats strings as plain ASCII byte arrays the way 1.8 does, so str[n] returns a character instead of a byte. David Grayson hits that point well.


Note that Ruby 1.9 inspects unprintable characters using their hexadecimal representation:

3.chr  # => "\x03"

Even more confusingly, strings will sometimes be shown in Unicode (UTF-8) form, depending on their encoding:

"\003" # => "\u0003"  (utf-8)
3.chr.encoding  # => #<Encoding:US-ASCII>
"\003".encoding  # => #<Encoding:UTF-8>
"\003" == 3.chr  # => true (even though the encodings differ, both strings are ASCII-only, so Ruby compares their bytes)
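A sketch of the encoding behaviour above, assuming a modern Ruby (2.0 or later) where source files default to UTF-8; under 1.9 the encoding of a literal depends on the file's magic comment (or the locale in irb).

```ruby
a = "\003"   # string literal: takes the source file's encoding
b = 3.chr    # Integer#chr without an argument: US-ASCII for values below 128

puts a.encoding          # UTF-8 in a UTF-8 source file
puts b.encoding          # US-ASCII
puts a.ascii_only?       # true
puts b.ascii_only?       # true

# Both strings contain only ASCII bytes, so Ruby treats the encodings as
# compatible and compares the bytes.
puts a == b              # true
```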

If you're trying to understand how these octal and hex strings relate to decimal numbers, you can convert them to binary:

"\003".unpack('B*')  # like "\003".ord.to_s(2), but padded to 8 bits and wrapped in an array
# => ["00000011"]  # the 2 least significant bits are set
2.to_s(2)  # convert to base 2
#=> "10"
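`unpack('B*')` and `ord.to_s(2)` differ slightly in what they return, so here is a quick side-by-side comparison:

```ruby
s = "\003"

p s.unpack('B*')               # ["00000011"]  one-element array, MSB first, 8 bits per byte
p s.ord.to_s(2)                # "11"          no leading zeros
p s.ord.to_s(2).rjust(8, '0')  # "00000011"    padded to match unpack
p s.unpack('C*')               # [3]           each byte as an unsigned integer
```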

The expression 3 & 2 is a bitwise AND of the binary numbers 11b and 10b, which yields 10b (because 1 & 1 is 1 for the most significant bit, and 1 & 0 is 0 for the least significant bit).

Other conversions:

'%x' % 97  # => '61' hex
0x61  # => 97 decimal from raw hex input
'%o' % 97  # => '141' octal
0141  # => 97 decimal from raw octal input
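The reverse direction, parsing hex or octal text back into an integer, works with standard String and Kernel methods:

```ruby
# Integer -> string in another base
puts 97.to_s(16)       # 61
puts 97.to_s(8)        # 141

# String in another base -> Integer
puts '61'.hex          # 97
puts '141'.oct         # 97
puts '61'.to_i(16)     # 97
puts Integer('0141')   # 97 (a leading 0 means octal, as in the literal 0141)
puts Integer('0x61')   # 97
```

`Integer()` honours the same prefixes as numeric literals and raises on malformed input, while `to_i` silently returns 0 for text it cannot parse.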

This is sort of a crash course, but you should probably google for more in-depth info.
