What is an efficient way to convert a bignum type structure to a human-readable string?
I've got a bit of a problem. In order to grow in my knowledge of C, I've decided to try to implement a basic bigint library.
The core of the bigint structure will be an array of 32-bit integers, chosen because they fit in a register. This lets me do operations between digits whose results overflow into a 64-bit integer (which will also fit in a register, as I'm on x86-64), and I can bitshift out each part of the result. I've implemented basic addition, and to test that it is working, I have to print the array. For my own testing purposes, it's fine if I use printf() and output each digit in hex. I can read that just fine.
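For reference, the structure and the carry handling described above might look roughly like this (the names and the fixed limb count are illustrative assumptions, not settled details):

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative layout: least significant digit first, fixed number of
       32-bit limbs (the count here is arbitrary). */
    typedef struct {
        uint32_t digit[8];                     /* 8 x 32 bits = 256 bits */
    } bigint;

    /* c = a + b: add digit by digit in 64-bit arithmetic, then bitshift the
       carry out of the upper half of the 64-bit result. */
    void bigint_add(bigint *c, const bigint *a, const bigint *b)
    {
        uint64_t carry = 0;
        for (size_t i = 0; i < 8; i++) {
            uint64_t sum = (uint64_t)a->digit[i] + b->digit[i] + carry;
            c->digit[i] = (uint32_t)sum;       /* low 32 bits */
            carry       = sum >> 32;           /* high 32 bits carry into the next digit */
        }
    }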
However, most people can't read hex. As the number is stored in (essentially) base 2^32, printing is a bit of a problem. What would be a good way to convert to base 10?
EDIT:
This is not about knowing how to convert from one base to another, but about a good way to implement it. I was thinking along the lines of making another bigint type in a different base and converting to it for printing.
First of all, you can't do I/O in a sensible way without the basic operations (e.g. division and modulus). To provide an efficient implementation of converting the bigint to a base-10 string, I am researching two possible optimizations:
First, you can divide by some power of ten instead of by ten itself. That means, for example, you get four base-10 digits every time you divide the number by 10000.
Second, which power of ten should you divide by: 10, 100, 1000, 10000, ...? A good choice is the maximum power of ten that fits in your word (10^9 for a 32-bit word). Fortunately, you can implement division/modulus by a single word much more efficiently than division between two bigints.
I haven't given a full implementation because I am still researching the problem in my spare time; I have implemented the basic operations in my library, and I/O is hopefully the next step ;)
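To make the single-word division/modulus idea concrete, a minimal sketch might look like this (little-endian limb order and the names are assumptions, not part of this answer):

    #include <stdint.h>
    #include <stddef.h>

    /* Divide a little-endian array of 32-bit limbs in place by a single word d
       (e.g. 1000000000, the largest power of ten below 2^32) and return the
       remainder. Only one 64-bit division per limb is needed. */
    uint32_t bigint_div_word(uint32_t *limbs, size_t n, uint32_t d)
    {
        uint64_t rem = 0;
        for (size_t i = n; i-- > 0; ) {        /* most significant limb first */
            uint64_t cur = (rem << 32) | limbs[i];
            limbs[i] = (uint32_t)(cur / d);
            rem      = cur % d;
        }
        return (uint32_t)rem;
    }

With d = 10^9, each call strips off up to nine decimal digits (the remainder), which is what makes the per-word approach so much cheaper than full bigint-by-bigint division.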
Dividing by the largest power of 10 that fits in your base type is the best way to start. In your case, this would be dividing by 10^9. This code should be general-purpose, since you will be able to reuse it as part of your general division/modulo code.
The running time will be O(n^2) (i.e. if your number is twice as big, the conversion will take four times longer), but it should be fast enough for moderately sized numbers.
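A hedged sketch of what that conversion loop can look like, assuming the number is stored as a little-endian array of 32-bit limbs (the names and conventions here are illustrative, not from the answer):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Convert a little-endian array of 32-bit limbs to decimal by repeatedly
       dividing by 10^9, the largest power of ten that fits in one limb.
       The limb array is clobbered in the process. */
    void print_decimal(uint32_t *limbs, size_t n)
    {
        size_t cap = 10 * n + 10;            /* n limbs never need more digits than this */
        char *buf = malloc(cap);
        size_t pos = cap;
        buf[--pos] = '\0';

        int all_zero;
        do {
            /* one in-place division pass by 10^9, most significant limb first */
            uint64_t rem = 0;
            all_zero = 1;
            for (size_t i = n; i-- > 0; ) {
                uint64_t cur = (rem << 32) | limbs[i];
                limbs[i] = (uint32_t)(cur / 1000000000u);
                rem      = cur % 1000000000u;
                if (limbs[i])
                    all_zero = 0;
            }
            /* the remainder is the next base-10^9 chunk: emit its 9 digits */
            for (int k = 0; k < 9; k++) {
                buf[--pos] = (char)('0' + (int)(rem % 10));
                rem /= 10;
            }
        } while (!all_zero);

        while (buf[pos] == '0' && buf[pos + 1] != '\0')   /* trim leading zeros */
            pos++;
        printf("%s\n", buf + pos);
        free(buf);
    }

Each pass over the limbs is O(n) and produces one base-10^9 chunk, and roughly n passes are needed, which is where the quadratic behaviour comes from.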
For very large values, you will want to cache large powers of 10, say 10^1000, 10^2000, 10^4000, 10^8000, ..., and then divide by the cached power of 10 that has at least half as many digits as the number you are trying to convert. Repeat this process until the numbers are small enough to convert quickly using division by 10^9. Depending on how efficient your division algorithm is, this approach may not be faster until you encounter numbers in excess of a million digits.
If you are writing an interactive calculator where every number will be displayed, then storing numbers in base 10^9 will be faster for display (it will be O(n), i.e. if your number is twice as big, the conversion will only take twice as long).
The normal way of repeatedly dividing by 10 is obviously going to be painfully slow.
An obvious quick way is to have precomputed arrays of bigints corresponding to the value of each digit in each position. You can then do a binary search with relatively cheap comparisons/subtractions to find the most significant digit, and then each subsequent digit in turn.
You could revert to division by 10 when you get down to the last 32 (or 64) bits.
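A rough sketch of the approach, simplified to find each digit with a short linear scan of comparisons and subtractions instead of the binary search described above (limb order, names, and the allocation strategy are all assumptions):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* All numbers are little-endian arrays of exactly n 32-bit limbs. */

    static int cmp_n(const uint32_t *a, const uint32_t *b, size_t n)
    {
        for (size_t i = n; i-- > 0; )
            if (a[i] != b[i])
                return a[i] > b[i] ? 1 : -1;
        return 0;
    }

    static void sub_n(uint32_t *a, const uint32_t *b, size_t n)  /* a -= b, assumes a >= b */
    {
        uint64_t borrow = 0;
        for (size_t i = 0; i < n; i++) {
            uint64_t diff = (uint64_t)a[i] - b[i] - borrow;
            a[i] = (uint32_t)diff;
            borrow = (diff >> 63) & 1;        /* top bit set means we borrowed */
        }
    }

    static uint32_t mul10_n(uint32_t *a, size_t n)               /* a *= 10, returns overflow */
    {
        uint64_t carry = 0;
        for (size_t i = 0; i < n; i++) {
            uint64_t prod = (uint64_t)a[i] * 10u + carry;
            a[i] = (uint32_t)prod;
            carry = prod >> 32;
        }
        return (uint32_t)carry;
    }

    void print_decimal(const uint32_t *value, size_t n)
    {
        size_t max_digits = 10 * n + 1;                          /* loose upper bound */
        uint32_t *rem    = malloc(n * sizeof *rem);
        uint32_t *powers = malloc(max_digits * n * sizeof *powers); /* row k is 10^k */
        memcpy(rem, value, n * sizeof *rem);

        /* precompute 10^0 .. 10^top, where 10^top is the largest power <= value */
        memset(powers, 0, n * sizeof *powers);
        powers[0] = 1;
        size_t top = 0;
        for (;;) {
            uint32_t *next = powers + (top + 1) * n;
            memcpy(next, powers + top * n, n * sizeof *next);
            if (mul10_n(next, n) != 0 || cmp_n(next, rem, n) > 0)
                break;                        /* next power overflows or exceeds the value */
            top++;
        }

        /* extract digits from the most significant decimal position down */
        for (size_t k = top + 1; k-- > 0; ) {
            int digit = 0;
            while (cmp_n(rem, powers + k * n, n) >= 0) {  /* at most 9 iterations */
                sub_n(rem, powers + k * n, n);
                digit++;
            }
            putchar('0' + digit);
        }
        putchar('\n');
        free(rem);
        free(powers);
    }

The table costs memory proportional to the square of the number's size, but each digit is then found with only a handful of comparisons and subtractions, and the final words could be handed off to plain division by 10 as suggested.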
The most efficient algorithm I can think of is the following. It should have a runtime complexity in O(n·(log n)²·log log n), as opposed to the naive algorithm which has quadratic runtime complexity.
- Assume without loss of generality that the number A is 2^(n+1) bits long. It may have leading zeros.
- Compute the decimal representations of the numbers 2^(2^i) for i up to n by repeated squaring, if this is the topmost recursion level.
- Split the bit sequence of the input number into two parts B and C. The part with the less significant bits, C, comprises the 2^n least significant bits of A, and the part B is the remaining more significant bits.
- Convert B and C to their decimal representations, either using the quadratic runtime algorithm if they are short enough, or by recursively calling this algorithm.
- Multiply the decimal representation of B by the cached decimal representation of 2^(2^n) and add the decimal representation of C to get the decimal representation of A.
In steps 2 and 5 you need a decimal multiplication algorithm. For numbers with tens of thousands of digits, you should use a version of the Schönhage-Strassen algorithm that works in base 10. This will lead to the runtime complexity stated above. For shorter numbers, depending on their length, the Toom-Cook algorithm, Karatsuba algorithm or long multiplication should be used. However, I cannot currently tell how to implement the Schönhage-Strassen algorithm in base 10, as all complete descriptions of it I could find are for base 2 and I do not know enough number theory to derive it myself.
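To show the shape of the recursion (not the fast multiplication), here is a small self-contained sketch. It works on 32-bit limbs, assumes the limb count is a power of two, and uses plain schoolbook multiplication on decimal strings where the algorithm above calls for Schönhage-Strassen; all names are illustrative, not from this answer:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Decimal values are ASCII strings, most significant digit first. */

    static char *dec_add(const char *a, const char *b)    /* returns a + b */
    {
        size_t la = strlen(a), lb = strlen(b);
        size_t lr = (la > lb ? la : lb) + 1;              /* room for a final carry */
        char *r = malloc(lr + 1);
        r[lr] = '\0';
        int carry = 0;
        for (size_t i = 0; i < lr; i++) {
            int da = i < la ? a[la - 1 - i] - '0' : 0;
            int db = i < lb ? b[lb - 1 - i] - '0' : 0;
            int s = da + db + carry;
            r[lr - 1 - i] = (char)('0' + s % 10);
            carry = s / 10;
        }
        if (lr > 1 && r[0] == '0')                        /* drop the unused carry slot */
            memmove(r, r + 1, lr);
        return r;
    }

    static char *dec_mul(const char *a, const char *b)    /* schoolbook a * b */
    {
        size_t la = strlen(a), lb = strlen(b), lr = la + lb;
        int *acc = calloc(lr, sizeof *acc);
        for (size_t i = 0; i < la; i++)
            for (size_t j = 0; j < lb; j++)
                acc[i + j] += (a[la - 1 - i] - '0') * (b[lb - 1 - j] - '0');
        char *r = malloc(lr + 1);
        r[lr] = '\0';
        int carry = 0;
        for (size_t k = 0; k < lr; k++) {
            int v = acc[k] + carry;
            r[lr - 1 - k] = (char)('0' + v % 10);
            carry = v / 10;
        }
        free(acc);
        size_t z = 0;                                     /* strip leading zeros */
        while (r[z] == '0' && r[z + 1] != '\0')
            z++;
        memmove(r, r + z, lr - z + 1);
        return r;
    }

    /* Convert 'count' little-endian 32-bit limbs (count a power of two) to a
       decimal string. pow_cache[i] holds the decimal string of 2^(32 * 2^i). */
    static char *limbs_to_dec(const uint32_t *limbs, size_t count,
                              char **pow_cache, int level)
    {
        if (count == 1) {                                 /* base case: one machine word */
            char *s = malloc(11);
            sprintf(s, "%u", limbs[0]);
            return s;
        }
        size_t half = count / 2;
        char *lo = limbs_to_dec(limbs, half, pow_cache, level - 1);
        char *hi = limbs_to_dec(limbs + half, half, pow_cache, level - 1);
        char *shifted = dec_mul(hi, pow_cache[level - 1]); /* hi * 2^(32*half), in decimal */
        char *result  = dec_add(shifted, lo);              /* ... + lo */
        free(lo); free(hi); free(shifted);
        return result;
    }

    int main(void)
    {
        uint32_t limbs[4] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu };

        /* pow_cache[i] = decimal string of 2^(32 * 2^i), built by repeated squaring */
        char *pow_cache[2];
        pow_cache[0] = malloc(11);
        strcpy(pow_cache[0], "4294967296");                 /* 2^32 */
        pow_cache[1] = dec_mul(pow_cache[0], pow_cache[0]); /* 2^64 */

        char *s = limbs_to_dec(limbs, 4, pow_cache, 2);     /* 4 = 2^2 limbs, so level 2 */
        printf("%s\n", s);                                  /* prints 2^128 - 1 */
        free(s); free(pow_cache[0]); free(pow_cache[1]);
        return 0;
    }

Swapping dec_mul for a subquadratic multiplication on the decimal representations is what would bring the complexity down to the bound stated above.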