How do I encode a 4-byte string as a single 32-bit integer?
First, a disclaimer. I'm not a CS grad nor a math major, so simplicity is important.
I have a four-character string (e.g. "isoy") that I need to pass as a single 32-bit integer field. Of course at the other end, I need to decode it back to a string. The string will only contain A-Z, and case is not important, if that helps.
The funny part is that I'm starting with PowerShell on the sending end and Linux at the receiving end. I can use Perl or Python there, with a preference for Python. I don't actually need answers in each language; I'm most interested in a PowerShell (C# also good) example for going both ways.
To 32-bit unsigned integer:
uint x = BitConverter.ToUInt32(Encoding.ASCII.GetBytes("isoy"), 0); // 2037347177
To string:
string s = Encoding.ASCII.GetString(BitConverter.GetBytes(x)); // "isoy"
BitConverter uses the native endianness of the machine.
For Python, struct.unpack does the job (it turns a 4-byte string into an int -- struct.pack goes the other way). In Python 3 the input and output are bytes objects:
>>> import struct
>>> struct.unpack('i', b'isoy')[0]
2037347177
>>> struct.pack('i', 2037347177)
b'isoy'
(You can use different formats to ensure big-endian or little-endian encoding, if you need that: '>i' and '<i' respectively, instead of just plain 'i', which uses whatever encoding is native to the machine.)
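As a rough sketch of the same round trip with the byte order pinned down, the snippet below uses the '<I' format (little-endian, unsigned 32-bit), which matches what BitConverter produces on a little-endian Windows machine. The function names pack_word and unpack_word are made up for illustration, and Python 3 is assumed.

```python
import struct

def pack_word(text):
    """Encode a 4-character ASCII string as a little-endian unsigned 32-bit int."""
    data = text.encode("ascii")
    if len(data) != 4:
        raise ValueError("expected exactly 4 ASCII characters")
    return struct.unpack("<I", data)[0]

def unpack_word(value):
    """Decode a little-endian unsigned 32-bit int back to a 4-character string."""
    return struct.pack("<I", value).decode("ascii")

print(pack_word("isoy"))        # 2037347177
print(unpack_word(2037347177))  # isoy
```

Because the format string names the byte order explicitly, the result does not depend on the machine the script runs on.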
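// string -> int (first character ends up in the lowest byte, i.e. little-endian)
uint ret = 0;
for ( int i = 0; i < 4; ++i )
{
    ret |= (uint)str[i] << ( i * 8 );
}
// int -> string (C# strings are immutable, so build a char array first)
char[] chars = new char[4];
for ( int i = 0; i < 4; ++i )
{
    chars[i] = (char)( ( ret >> ( i * 8 ) ) & 0xff );
}
string result = new string( chars );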
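For the receiving end, the same shift-and-mask idea could look roughly like this in Python. This is only a sketch: encode_shift and decode_shift are hypothetical names, and it assumes plain ASCII input with the first character stored in the lowest byte, as in the loops above.

```python
def encode_shift(text):
    """Pack up to 4 ASCII characters into an int, first character in the low byte."""
    value = 0
    for i, ch in enumerate(text[:4]):
        value |= (ord(ch) & 0xFF) << (i * 8)
    return value

def decode_shift(value):
    """Unpack 4 characters from an int, reading the low byte first."""
    return "".join(chr((value >> (i * 8)) & 0xFF) for i in range(4))

print(encode_shift("isoy"))      # 2037347177
print(decode_shift(2037347177))  # isoy
```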
Using PowerShell syntax you can do it this way (pretty much like dtb's solution):
PS> $x = [BitConverter]::ToUInt32([byte[]][char[]]'isoy', 0)
PS> [char[]][BitConverter]::GetBytes($x) -join ''
isoy
You do have to watch out for endianness on the Linux side. If it is running on an Intel processor, I believe it should be fine (same endianness as the PowerShell side).
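To see what goes wrong when the two sides disagree on byte order, here is a small Python 3 sketch using int.to_bytes and int.from_bytes. The helper names decode_le and decode_be are invented for this example; the value is the one PowerShell produces for 'isoy' on a little-endian machine.

```python
def decode_le(value):
    """Interpret a 32-bit value as 4 ASCII bytes, lowest byte first."""
    return value.to_bytes(4, "little").decode("ascii")

def decode_be(value):
    """Interpret a 32-bit value as 4 ASCII bytes, highest byte first."""
    return value.to_bytes(4, "big").decode("ascii")

value = 2037347177
print(decode_le(value))  # isoy
print(decode_be(value))  # yosi (the characters come out reversed)

# The same four bytes read with the other byte order give a different integer.
print(hex(int.from_bytes(b"isoy", "little")))  # 0x796f7369
print(hex(int.from_bytes(b"isoy", "big")))     # 0x69736f79
```

Naming the byte order explicitly on the receiving side avoids relying on what the Linux host happens to use.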
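Please take a look at the struct standard library module in Python's manual. It has two functions for this: struct.pack and struct.unpack. You can use the 'L' (unsigned long) format character for this, but note that in the default native mode 'L' is typically 8 bytes on 64-bit Linux; use 'I' (unsigned int), or add a byte-order prefix such as '<L', to get exactly 4 bytes.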
Aside from byte packing, you can also consider that your 26-character alphabet can be encoded as 0-25 instead of A-Z.
So without worrying about big and little endians, you can go from "letters" to a number like this:
val=letter0+letter1*26+letter2*26*26+letter3*26*26*26;
to go from val back to letters, you do something like this:
letter0=val%26;
letter1=(val/26)%26;
letter2=(val/(26*26))%26;
letter3=(val/(26*26*26))%26;
where "%" is your language's modulus operator and "/" is an integer division.
You'll obviously need a way to get from 'A'-'Z' to 0-25 and back. That's language dependent.
You can easily put this into loops. I show the loops unrolled to make things a bit more obvious.
It's more common to pack letters into bytes, so you can use shift and bitwise operations to encode and decode. But by doing it the way I show above, you could pack six letters into a 32-bit number (26^6 = 308,915,776, which is less than 2^32), rather than just four. Which is nice, since you can hold things like stock market ticker symbols in a single 32-bit value (mutual fund ticker symbols are 5 characters).
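A minimal Python sketch of the base-26 scheme described above might look like the following. The names encode26 and decode26 are hypothetical, and the decoder assumes the string length is known, since trailing 'A's encode as zero and would otherwise be lost.

```python
def encode26(text):
    """Encode up to 6 letters A-Z (case-insensitive) as a base-26 number."""
    if len(text) > 6:
        raise ValueError("at most 6 letters fit in 32 bits")
    value = 0
    for i, ch in enumerate(text.upper()):
        digit = ord(ch) - ord("A")  # map 'A'-'Z' to 0-25
        if not 0 <= digit < 26:
            raise ValueError("only the letters A-Z are allowed")
        value += digit * 26 ** i    # letter0 + letter1*26 + letter2*26*26 + ...
    return value

def decode26(value, length):
    """Decode a base-26 number back to `length` uppercase letters."""
    letters = []
    for _ in range(length):
        value, digit = divmod(value, 26)  # integer division and modulus in one step
        letters.append(chr(ord("A") + digit))
    return "".join(letters)

n = encode26("isoy")
print(decode26(n, 4))                 # ISOY
print(encode26("ZZZZZZ") < 2 ** 32)   # True
```

Case is discarded on the way in, so the decoded string always comes back in uppercase.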