Persistent Code Memoization in Compilers and Run-Time Environments
I believe the concept of a code cache (as in, for example, ccache) should be extended into a finer-grained memoization of both intermediate code (IC) and target code (TC) in compilers such as GCC or LLVM+Clang.
This could then enable a whole range of clever techniques benefiting programmer productivity, compile-time performance, run-time performance, and run-time memory usage.
More specifically, this repository (or database) should automatically cache the IC and TC of individual functions. These could then be looked up and reused across different builds (compile once, link many) and across sets of programs and libraries, not just across object-file boundaries during link-time optimization (LTO).
This would especially benefit C++ STL container-algorithm instantiations. For example, how many times has an algorithm such as std::sort, applied to std::vector&lt;T&gt;, been instantiated, optimized, and compiled in different programs using the same type T, typically int, float, or double?
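As a minimal illustration (file names hypothetical), two otherwise unrelated programs that both sort a std::vector&lt;int&gt; each cause the compiler to instantiate, optimize, and emit essentially the same specialization of std::sort from scratch:

    // program_a.cpp (hypothetical) -- instantiates std::sort for std::vector<int>::iterator
    #include <algorithm>
    #include <vector>

    int main() {
        std::vector<int> v{3, 1, 2};
        std::sort(v.begin(), v.end());  // instantiation #1, compiled and optimized here
    }

    // program_b.cpp (hypothetical) -- an unrelated program, yet the very same instantiation
    #include <algorithm>
    #include <vector>

    int main() {
        std::vector<int> v{9, 8, 7, 6};
        std::sort(v.begin(), v.end());  // instantiation #2, identical IC/TC recompiled again
    }

A shared IC/TC repository would let the second program reuse the code already produced for the first.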
In an implementation, IC modules should be indexed by keys constructed as a hash chain (SHA-1 should suffice) over the compiler configuration and the IC code tree (including the sub-tree code hashes of the functions it calls), and stored in, for example, an std::unordered_map providing very cheap lookups. To promote reuse even further, the IC repository could be put online as a network service.
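A minimal sketch of such an index, under the assumption that a function's IC can be treated as an opaque string; std::hash chained over strings stands in for SHA-1, and the names chain, make_key, and IcRepository are hypothetical:

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Key: a hash chained over the compiler configuration, the function's own
    // IC tree, and the code hashes of the functions it calls.
    using IcHash = std::uint64_t;

    IcHash chain(IcHash seed, const std::string& part) {
        // Simple hash-chaining step (hash_combine-style mixing);
        // a real implementation would use SHA-1 here.
        return seed ^ (std::hash<std::string>{}(part) + 0x9e3779b97f4a7c15ULL
                       + (seed << 6) + (seed >> 2));
    }

    IcHash make_key(const std::string& compiler_config,
                    const std::string& ic_tree,
                    const std::vector<IcHash>& callee_hashes) {
        IcHash h = chain(0, compiler_config);
        h = chain(h, ic_tree);
        for (IcHash c : callee_hashes)
            h = chain(h, std::to_string(c));
        return h;
    }

    // The repository itself: cheap O(1) lookups of previously compiled code.
    struct IcRepository {
        std::unordered_map<IcHash, std::string> cache;  // key -> cached IC/TC blob

        const std::string* lookup(IcHash key) const {
            auto it = cache.find(key);
            return it == cache.end() ? nullptr : &it->second;
        }

        void store(IcHash key, std::string code) {
            cache.emplace(key, std::move(code));
        }
    };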
Of course, memoization should only be applied where it actually pays off, and the bookkeeping should add very little overhead. Since most hash-key lookups will be misses, the keys should be kept in memory, but the code snippets themselves need not be.
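One way to arrange this, sketched below with a hypothetical on-disk layout (the names PersistentIcRepository and BlobLocation and the repository path are assumptions), is to keep only the hash keys in a resident index that maps to file offsets, reading a code snippet from disk only on a hit:

    #include <cstdint>
    #include <fstream>
    #include <optional>
    #include <string>
    #include <unordered_map>

    // Hypothetical persistent layout: keys live in memory, code blobs live on disk.
    struct BlobLocation {
        std::streamoff offset;  // where the snippet starts in the repository file
        std::size_t    size;    // snippet length in bytes
    };

    struct PersistentIcRepository {
        std::string repo_path;                                  // e.g. a per-user cache file
        std::unordered_map<std::uint64_t, BlobLocation> index;  // resident: keys only

        // A miss costs one hash lookup; a hit additionally pays one disk read.
        std::optional<std::string> lookup(std::uint64_t key) const {
            auto it = index.find(key);
            if (it == index.end())
                return std::nullopt;                 // the common case: cache miss
            std::ifstream in(repo_path, std::ios::binary);
            in.seekg(it->second.offset);
            std::string code(it->second.size, '\0');
            in.read(code.data(), static_cast<std::streamsize>(code.size()));
            return code;
        }
    };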
This project has already demonstrated the usefulness of this idea applied to the Python language. I believe Haskell (GHC) may be the ideal language for experimenting with these ideas because of its default function purity and flexible control over side effects.