Large integer radix/base conversion from 10^x to 2^x
Preface
I'm learning about computer math by writing and refining my own BigInt library. So far my first incarnation stores every digit of a base 10 number in successive elements of a vector. It can multiply and add with arbitrary precision. I want to speed it up by using all of the space available to me in a standard C++ data type by converting to base 2^x.
Information
I am reading 1000 or more digits from stdin in base 10 and I wish to convert them to base 2^x so I may store them easily in an array or vector of one of the standard C++ data types, likely unsigned int. I have only one idea for how to do base conversion, the repeated division with remainder method. Here is some C++ code describing that method:
vector<int> digits;
while(num!=0) {
int mod = num%base;
num = num/base;
digits.push_back(mod);
}
Conundrums
Some things I'm lost on are whether or not division with remainder is the proper way to do radix conversion on large integers. I've tried looking at how the GMP library does it. gmp/mpn/generic/set_str.c is the relevant c source file where "the magic" happens, but I am not certain what is going on in there. Matt McCutchen's BigInt appears to use the repeated division with remainder method. If I do use this method, I essentially need write two versions of my BigInt class, one for working in Base10 and the other for Base2^x.
Conclusion
- Offer advice on the proper steps for converting a huge number from a string to an array of 32 bit words.
- Help me learn how GMP converts a string to an array of 32bit words without wading through many layers of abstraction.
Examples Using 4 bit word size
Number we want to store (obviously on the small size): 123456789
unsigned chars have a range of 0-255, if we want to split up our number and store it in the vector we can do it one of three ways:
- As base 10, our vector looks like: [1,2,3,4,5,6,7,8,9]
- This is what my vector looked like in my first implementat开发者_JAVA技巧ion.
- As base 100, our vector looks like: [1,23,45,67,89]
- Easy to convert from base 10 to base 100, has ciel(digits in base10/2) elements.
- As base 256, our vector looks like: [7,91,205,21]
Obviously the third solution is the most optimal for internal representation, and is what I am trying to get to.
Once you have multiply and add functions working for your bigint library, converting a string to bigint is simplicity itself. Start with a conversion result of zero. For each digit you process (left to right), multiply the previous result by 10 and add the value of the new digit (using your bigint multiply and add functions).
In general, to convert one base to another (from most significant digit to least), the algorithm is as follows:
output = 0
foreach digit in digits:
output = output * base + digit
In the reverse order, it's the following:
output = 0
multiplier = 1
foreach digit in digits:
output = output + multiplier * digit
multiplier = multiplier * base
You can use your bigint library recursively using this math to figure out how to store the numbers. By that, I mean you need to implement BigInt*BigInt and BigInt+BigInt, so this is how you can convert bases. It's not the most efficient way, but it is a lot faster than division.
One thing not mentioned in FryGuy's post is that in that method the arithmetic has to be carried out in the base you are converting to, while the base that is being used as a multiplier is the same as the base of your original number; the base you no longer want or the base you are converting from.
There are two formulas for radix conversion on integers. (1) uses base B, which we are converting to (destination base), as a repeated divisor coupled with arithmetic done in base b, where b is the base converting from. (2) on the other hand uses base b as a multiplier on the digits we are converting, which happen to be in the same base b, and does arithmetic in base B.
(1) formula uses B as a parameter and does arithmetic in base b
(2) formula uses b as parameter and does arithmetic in base B
this is out of knuth, vol 2, 4.4 radix conversions
精彩评论