How do I encode a 4-byte string as a single 32-bit integer?

First, a disclaimer. I'm neither a CS grad nor a math major, so simplicity is important.

I have a four-character string (e.g. "isoy") that I need to pass as a single 32-bit integer field. Of course, at the other end, I need to decode it back to a string. The string will only contain A-Z, and case is not important, if that helps.

The funny part is that I'm starting with PowerShell on the sending end and Linux at the receiving end. I can use Perl or Python there, with a preference for Python. I don't actually need answers in each language; I'm most interested in a PowerShell (C# also good) example for going both ways.


To 32-bit unsigned integer:

uint x = BitConverter.ToUInt32(Encoding.ASCII.GetBytes("isoy"), 0); // 2037347177

To string:

string s = Encoding.ASCII.GetString(BitConverter.GetBytes(x));      // "isoy"

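BitConverter uses the native byte order (endianness) of the machine, which is little-endian on x86/x64 Windows. You can check BitConverter.IsLittleEndian if in doubt.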
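On the Linux side, a minimal Python 3 sketch of the matching decode and encode could look like the following. It assumes the C# code ran on a little-endian Windows machine (the usual x86/x64 case), so the byte order is written out explicitly instead of being taken from the Linux host. The function names decode_tag and encode_tag are made up for this example.

```python
def decode_tag(value: int) -> str:
    """Turn the 32-bit integer back into the 4-character string.

    Assumes the sender used BitConverter on a little-endian machine,
    so the first character sits in the lowest byte.
    """
    return value.to_bytes(4, "little").decode("ascii")


def encode_tag(text: str) -> int:
    """Inverse of decode_tag: 4 ASCII characters -> 32-bit unsigned integer."""
    data = text.encode("ascii")
    if len(data) != 4:
        raise ValueError("expected exactly 4 ASCII characters")
    return int.from_bytes(data, "little")


print(decode_tag(2037347177))  # isoy
print(encode_tag("isoy"))      # 2037347177
```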


For Python, struct.unpack does the job (it turns a 4-byte string into an int; struct.pack goes the other way):

>>> import struct
>>> struct.unpack('i', b'isoy')[0]
2037347177
>>> struct.pack('i', 2037347177)
b'isoy'

(If you need a specific byte order, use '<i' for little-endian or '>i' for big-endian instead of plain 'i', which uses whatever byte order is native to the machine.)
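To see why the prefix matters, here is a small Python 3 comparison. It uses '<I' and '>I', the unsigned counterparts of '<i' and '>i': the same four bytes give two different integers, and only a pack call with the matching prefix restores the original bytes.

```python
import struct

data = b"isoy"

little = struct.unpack("<I", data)[0]  # first byte is least significant
big = struct.unpack(">I", data)[0]     # first byte is most significant

print(little)  # 2037347177
print(big)     # 1769172857

# Packing with the same prefix gives the original bytes back.
assert struct.pack("<I", little) == data
assert struct.pack(">I", big) == data

# Mixing prefixes silently produces the reversed string.
print(struct.pack(">I", little))  # b'yosi'
```

If the sender and receiver disagree on byte order, nothing fails; you just get the characters in reverse, so it is worth pinning the prefix on the receiving end.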


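// string -> int (character i goes into byte i, lowest byte first,
// regardless of the machine's endianness)
string str = "isoy";
uint ret = 0;
for (int i = 0; i < 4; ++i)
{
    ret |= (uint)str[i] << (i * 8);
}

// int -> string (shift byte i down and mask off everything else)
char[] chars = new char[4];
for (int i = 0; i < 4; ++i)
{
    chars[i] = (char)((ret >> (i * 8)) & 0xff);
}
string decoded = new string(chars);   // "isoy"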
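The same shift-and-mask loops translate directly to Python 3 on the receiving end. Because the code itself puts the first character in the lowest byte, the result does not depend on the host's byte order, and it matches what BitConverter produces on a little-endian sender. The function names pack_shifts and unpack_shifts are just illustrative.

```python
def pack_shifts(text: str) -> int:
    """Put character i into byte i (bits 8*i to 8*i+7) of the result."""
    ret = 0
    for i in range(4):
        ret |= ord(text[i]) << (i * 8)
    return ret


def unpack_shifts(value: int) -> str:
    """Extract byte i with a shift and a 0xFF mask, then map it back to a character."""
    return "".join(chr((value >> (i * 8)) & 0xFF) for i in range(4))


n = pack_shifts("isoy")
print(n)                 # 2037347177
print(unpack_shifts(n))  # isoy
```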


Using PowerShell syntax, you can do it this way (pretty much like dtb's solution):

PS> $x = [BitConverter]::ToUInt32([byte[]][char[]]'isoy', 0)
PS> [char[]][BitConverter]::GetBytes($x) -join ''
isoy

You do have to watch out for endianness on the Linux side. If it is running on an Intel (x86/x64) processor, I believe it should be fine (same endianness as the PowerShell side).


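Please take a look at the struct module in the Python standard library documentation. It has two functions for this: struct.pack and struct.unpack. You can use the 'L' (unsigned long) format character for this, but give it a byte-order prefix such as '<L': without a prefix, 'L' uses the platform's native size, which is 8 bytes on 64-bit Linux rather than 4.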
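A quick Python 3 check shows why a byte-order prefix matters with 'L'. Without one, struct uses the native size of a C unsigned long, which is typically 8 bytes on 64-bit Linux, so unpacking a 4-byte value with plain 'L' fails there. With any of the prefixes <, >, = or !, struct switches to standard sizes and 'L' is 4 bytes.

```python
import struct

# Native mode (no prefix): size of a C unsigned long on this platform.
# Typically 8 on 64-bit Linux and 4 on Windows, so no fixed value is shown.
print(struct.calcsize("L"))

# Standard mode (any of < > = !): 'L' is always 4 bytes.
print(struct.calcsize("<L"))  # 4
print(struct.calcsize("=L"))  # 4

value = struct.unpack("<L", b"isoy")[0]
print(value)                     # 2037347177
print(struct.pack("<L", value))  # b'isoy'
```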


Aside from byte packing, you can also take advantage of the fact that your 26-letter alphabet can be encoded as 0-25 instead of A-Z.

So, without worrying about big- and little-endian byte order, you can go from "letters" to a number like this:

val=letter0+letter1*26+letter2*26*26+letter3*26*26*26;

To go from val back to letters, you do something like this:

letter0=val%26;
letter1=(val/26)%26;
letter2=(val/(26*26))%26;
letter3=(val/(26*26*26))%26;

where "%" is your language's modulus operator and "/" is an integer division.

You'll obviously need a way to get from 'A'-'Z' to 0-25 and back. That's language dependent.

You can easily put this into loops. I show the loops unrolled to make things a bit more obvious.
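Put into loops, the scheme might look like this in Python 3. The letter-to-digit mapping (subtracting ord("A")) and the upper-casing of input are choices made for this sketch, since the question says case doesn't matter; the function names are illustrative. Python's // is the integer division the formulas above call "/".

```python
def letters_to_int(text: str) -> int:
    """letter0 + letter1*26 + letter2*26**2 + ..., with A=0 ... Z=25."""
    val = 0
    for i, ch in enumerate(text.upper()):
        digit = ord(ch) - ord("A")
        if not 0 <= digit < 26:
            raise ValueError(f"not a letter A-Z: {ch!r}")
        val += digit * 26 ** i
    return val


def int_to_letters(val: int, length: int) -> str:
    """Inverse: letter_i = (val // 26**i) % 26, mapped back to 'A'..'Z'.

    The length must be passed in, because trailing A's (digit 0)
    leave no trace in val.
    """
    return "".join(chr(ord("A") + (val // 26 ** i) % 26) for i in range(length))


n = letters_to_int("isoy")
print(n)                     # 431764
print(int_to_letters(n, 4))  # ISOY
```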

It's more common to pack letters into bytes, so that you can use shifts and bitwise AND to encode and decode. But by doing it the way I show above, you can pack six letters into a 32-bit number rather than just four, which is nice, since you can hold things like stock market ticker symbols in a single 32-bit value (mutual fund ticker symbols are 5 characters).
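The six-letter figure checks out: 26^6 = 308,915,776 is below 2^32 = 4,294,967,296, while 26^7 is not. One caveat if you mix 4- and 5-letter symbols: with A=0, trailing A's add nothing, so "AB" and "ABA" encode to the same value. A possible variation, which is my own assumption and not part of the answer above, is base 27 with 0 meaning "no letter" (A=1 ... Z=26). Since 27^6 = 387,420,489 still fits in 32 bits, six letters of varying length stay decodable. The function names are hypothetical.

```python
def encode_ticker(symbol: str) -> int:
    """Base-27 packing: A=1 ... Z=26, first letter in the lowest digit."""
    if not 1 <= len(symbol) <= 6:
        raise ValueError("expected 1 to 6 letters")
    val = 0
    for i, ch in enumerate(symbol.upper()):
        digit = ord(ch) - ord("A") + 1
        if not 1 <= digit <= 26:
            raise ValueError(f"not a letter A-Z: {ch!r}")
        val += digit * 27 ** i
    return val


def decode_ticker(val: int) -> str:
    """Peel off base-27 digits until none remain; every digit is 1..26."""
    letters = []
    while val:
        val, digit = divmod(val, 27)
        letters.append(chr(ord("A") + digit - 1))
    return "".join(letters)


print(26**6 < 2**32 < 26**7)                   # True
print(encode_ticker("AB"), encode_ticker("ABA"))  # 55 784
print(decode_ticker(encode_ticker("abcde")))   # ABCDE
```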
