super-space-optimized code
There are key self-contained algorithms - particularly cryptography-related such as AES, RSA, SHA1 etc - which you can find many implementations of for free on the internet.
Some are written to be nice and portable clean C.
Some are written to be fast - often with macros, and explicit unrolling.
As far as I can tell, none are trying to be especially super-small - so I'm resigned to writing my own - explicitly AES128 decryption and SHA1 for ARM THUMB2. (I'开发者_开发知识库ve verified by compiling all I can find for my target machine with GCC with -Os and -mthumb and such)
What patterns and tricks can I use to do so?
Are there compilers/tools that can roll-up code?
before optimizing for space (or speed): compilers are pretty clever these days, have you tried if a normal, readable implementation of aes128 gets small enough for your needs if you tell the compiler to optimize for space?
to go and write your own version of aes128 is perhaps a good educational thing but you will fight for bugs for sure and cryptography is not that kind of trivial stuff that falls out of thin air. and faulty or weak (due some bugs of your implementation) is pretty much the worse case you can have.
since you are targetting ARM and gcc is pretty common as a compiler for that platform:
-Os Optimize for size.
-Os enables all -O2 optimizations that do not typically
increase code size. It also performs further optimizations
designed to reduce code size.
It depends on what kind of space you are trying to optimise: code or data. There are essentially three variants of AES128 commonly in use, each differing in the amount of precomputed lookup table space.
- The fastest version uses 4k arranged as four 32-bit x 256 entry lookup tables (commonly called T-tables). If you can afford that amount of data space then the only instructions in this version are the EORs to combine the table results, these will roll up into a very small piece of code.
- The intermediate version uses a 8-bit x 256 entry lookup table to encode the SBox. The residual instructions need to implement the shift rows and mix columns steps so the code size is larger.
- The smallest (data-size) version doesn't use any lookup tables at all, but needs to compute all of the individual AES-field operations including the inversion. This will use the most instructions, even if you fold both the field-multiply and inversion into subroutines.
精彩评论