What is an FFP machine?
In R. Kent Dybvig's paper "Three Implementation Models for Scheme" he speaks of "FFP languages" and "FFP machin开发者_StackOverflow社区es". Apparently there is some correlation between FFP machines, and string-reduction on multiple processors.
Googling doesn't really uncover much in terms of explanations or examples.
Can anyone shed some light on this topic? Thanks.
Kent Dybvig's advisor, Gyula A. Mago, published a detailed description in "The FFP Machine: Technical Report 87-014" in 1987 by Mago and Stanat.
As of this writing, the PDF is freely available at: http://www.cs.unc.edu/techreports/87-014.pdf
The FFP Machine is a very fine-grained parallel computer architecture: each processor holds a single symbol / atom / value. It uses a string reduction model of computation in which innermost function applications are found and replaced by their equivalent result (eager evaluation). Where a result is used in several places, it tends to be re-evaluated instead of incurring the costs of accessing some global store (but see Mago's paper on "Copying Operands vs Copying Results", or better yet Mago's "Data Sharing in an FFP Machine" in the 1982 Functional Programming Languages and Computer Architecture conference).
The L cells holding the FFP expression being reduced communicate through a tree structured arrangement of T cells. Note that IC's are basically two dimensional and with wiring, circuits can move towards being three dimensional in physical space. Interconnection networks that occupy higher dimensions (such as the Hypercube, Omega, Banyan, Star, etc. networks) will eventually be unable to perform near their theoretical limit.
This communication network is circuit-switched rather than being packet-switched. Data packets contain no addresses and do not need routing. Packets from distinct reductions cannot meet, cannot conflict and cannot experience congestion with each other. The configuring activity (called "Partitioning") is performed in a single sweep upwards in the tree, using a handful of logic operations on 3-bit messages, leaving "area machines" in its wake, each created to advance at most a single reducible application. While it is technically logarithmic in time, the resulting area machines can begin communicating in a pipelined fashion behind the partitioning wave, practically costing a constant time penalty. (The dismantling of area machines remains a logarithmic cost in time).
Packets within a single reduction should, and must, meet and thus provide a often-useful synchronization. Sequences of packets are sorted and combined as they rise within an area, to be broadcast from the root of the area machine. Parallel Prefix and Parallel Suffix operations are provided to reduce area traffic, since there remains a potential bottleneck within an individual reducible application. This is accomplished without the need exhibited in the Ultracomputer (Jack (Jacob?) Schwartz at NYU) for a separate logarithmic-sized cache memory in each communication node. Each T cell (internal tree node) only needs a FIFO buffer (for efficiency) of size greater than the pipeline path to the top of the tree and back down. (This latter is a conjecture of mine, but it seems reasonable). Since the tree maintains the left-to-right order of data (unlike some other combining networks), the system enables cells to rotate their data in logarithmic rather than linear time, avoiding the plausible congestion at the root of the area machine. It's worth noting again, that the parallelism within an area machine is independent of the simultaneous parallelism in other area machines, and has available to it a number of processors proportional to the quantity of data in the operand.
Have you come across this yet?: Compiling APL for parallel execution on an FFP machine
Formal FP. Similar to FP, but with regular sugarless syntax, for machine execution is all I can offer you.
See Wikis Fp page.
精彩评论