开发者

curious about how "loop = loop" is evaluated in Haskell

I thought expressions like this would cause Haskell to evaluate forever. But the behaviors in both GHCi and the compiled program surprised me.

For example, in GHCi, these expressions blocked until I Control+C, but consumed no CPU. Looked like it was sleeping.

let loop = loop
let loop = 1 + loop

I tried compiling these programs with GHC:

main = print loop
  where loop = 1 + loop

main = print loop
  where loop = if True then loop else 1

What was printed was:

Main: <<loop>>

So my question is: Obviously these expressions are compiled to something different than loops or recursive calls in imperative languages. What are they compiled to? Is this a special rule to handle 0-arg functions that have themselves in the right hand side, or it's a special case of something more general that I don't know?

[EDIT]:

One more question: If this happens to be a special handling from the compiler, what is the reason behind doing this 开发者_如何学Gowhen it's impossible to check for all infinite loops? 'Familiar' languages don't care about cases like while (true); or int f() { return f();}, right?

Many thanks.


GHC implements Haskell as a graph reduction machine. Imagine your program as a graph with each value as a node, and lines from it to each value that value depends on. Except, we're lazy, so you really start with just one node -- and to evaluate that node, GHC has to "enter" it and open it up to a function with arguments. It then replaces the function call with the body of the function, and attempts to reduce it enough to get it into head normal form, etc.

The above being very handwavy and I'm sure eliding some necessary detail in the interest of brevity.

In any case, when GHC enters a value, it generally replaces it with a black hole while the node is being evaluated (or, depending on your terminology, while the closure is being reduced) This has a number of purposes. First, it plugs a potential space leak. If the node references a value which is used nowhere else, the black hole allows that value to be garbage-collected even while the node is being evaluated. Second, this prevents certain types of duplicate work, since in a multi-threaded environment, two threads may attempt to enter the same value. The black-hole will cause the second thread to block rather than evaluate the value already being evaluated. Finally, this happens to allow for a limited form of loop detection, since if a thread attempts to re-enter its own black hole, we can throw an exception.

Here's a bit of a more metaphorical explanation. If I have a series of instructions that moves a turtle (in logo) around the screen, there's no one way to tell what shape they will produce, or whether that shape terminates without running them. But if, while running them, I notice that the path of the turtle has crossed itself, I can indicate to the user "aha! the turtle has crossed its path!" So I know that the turtle has reached a spot it has been before -- if the path is a circuit through evaluating the nodes of a graph, then that tells us we're in a loop. However, the turtle can also go in, for example, an expanding spiral. And it will never terminate, but it will also never cross its prior path.

So, because of the use of black holes, for multiple reasons, we have some notion of a marked "path" that evaluation has followed. And if the path crosses itself, we can tell and throw an exception. However, there are a million ways for things to diverge that don't involve the path crossing itself. And in those cases, we can't tell, and don't throw an exception.

For super-geeky technical detail about the current implementation of black holes, see Simon Marlow's talk from the recent Haskell Implementors Workshop, "Scheduling Lazy Evaluation on Multicore" at the bottom of http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010.


In some, limited cases, the compiler can determine such a loop exists as part of its other control flow analyses, and at that point replaces the looping term with code that throws an appropriate exception. This cannot be done in all cases, of course, but only in some of the more obvious cases, where it falls out naturally from other work the compiler is doing.

As for why Haskell finds this more often than other languages:

  • These cases do not occur in languages which are strict such as C. These loops specifically happen when a lazy variable's computation depends on its own value.
  • Languages such as C have very specific semantics in loops; ie, what order to do what in. As such, they are forced to actually execute the loop. Haskell, however defines a special value _|_ ("the bottom"), which is used to represent erroneous values. Values which are strict on themselves - ie, they depend on their own value to compute - are _|_. The result of pattern-matching on _|_ can either be an infinite loop or an exception; your compiler is choosing the latter here.
  • The Haskell compiler is very interested in performing strictness analysis - ie, proving that a certain expression depends on certain other expressions - in order to perform certain optimizations. This loop analysis falls out naturally as an edge case in the strictness analyzer which must be handled in one way or another.
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜