Funky haskell lazy list implicit recursion
In Haskell, you can build infinite lists due to laziness:
Prelude> let g = 4 : g
Prelude> g !! 0
4
Prelude> take 10 g
[4,4,4,4,4,4,4,4,4,4]
Now, what exactly goes on when I try to construct a list like this?
Prelude> let f = f !! 10 : f
Prelude> f !! 0
Interrupted.
Prelude> take 10 f
[Interrupted.
Prelude>
The Interrupted.
s are me hitting CTRL+C after waiting a few seconds. It seems to go into an infinite loop, but why is that the case?
Explanation for non-Haskellers:
The :
operator is prepend
:
Prelude> 4 : [1, 2, 3]
[4,1,2,3]
This line:
Prelude> let g = 4 : g
says "let g
be the list constructed by prepending 4
into the list g
". When you ask for the first element, 4 is returned, as it's already there. When you ask for the second element, it looks for the element after 4. That eleme开发者_如何学Pythonnt would be the first element of the list g
, which we just calculated (4), so 4
is returned. The next element is the second element of g
, which we again just calculated, etc...
!!
is just indexing into a list, so this means get the element at index 0
from g
:
Prelude> g !! 0
4
But when I do this:
Prelude> let f = f !! 10 : f
something breaks, because to compute the first element of f
you need the 11th element, which doesn't exist yet? I would expect an exception, though, not an infinite loop...
In this case, a picture may tell a thousand words.
First, remember how cons (the (:)
list constructor) works. It's a pair of two things: an element, and a reference to a list tail (which is either another cons, or []
).
As you should know, when you say [1, 2, 3]
, it's just a shortcut for (1:(2:(3:[])))
or 1:2:3:[]
. If you visualize each cons pair as a box with two slots, this expression look like:
┌───┬──┐ ┌───┬──┐ ┌───┬──┐ ┌────┐
│ 1 │ ─┼─>│ 2 │ ─┼─>│ 3 │ ─┼─>│ [] │
└───┴──┘ └───┴──┘ └───┴──┘ └────┘
One loop
When you say g = 4 : g
, you're not really building an "infinite" list, you're building a circular list: g
is defined as a cons whose tail reference simply points back to g
itself:
┌──────────┐
│ ┌───┬──┐ │
└>│ 4 │ ─┼─┘
└───┴──┘
This actually has nothing to do with laziness, and everything with self-reference: for example, you can do the same thing in (eager) Common Lisp using syntax like '#1=(4 . #1#)
(where the #1
is like g
).
Whether you say g !! 0
, or g !! 1000000000000
, g
never grows: (!!)
simply runs around the loop in-place, for as many times as you told it to, until it exhausts itself and returns the element, 4
.
Two loops
When you say f = (f !! 10) : f
, the same thing happens—except now, the element slot contains a different expression than 4
:
┌──────────┐
│ ┌───┬──┐ │
└>│ ╷ │ ─┼─┘
└─┼─┴──┘
│
│ ┌───────────┐
└>│ (f !! 10) │
└───────────┘
Crucially, this sub-expression also refers back to f
, just like the tail does:
┌──────────┐
│ ┌───┬──┐ │
┌┴>│ ╷ │ ─┼─┘
│ └─┼─┴──┘
│ │
│ │ ┌───────────┐
│ └>│ (f !! 10) │
│ └──┼────────┘
└─────────┘
So, when you ask for f !! n
, (!!)
will first run around the top loop n
times, then return the element, as it did for g
. However, instead of escaping the loop, (f !! 10)
just re-enters it, and the process repeats itself: looping 10 times around the top, then once around the bottom, and back.
"Doesn't exist yet" is not quite right. We try not to think about when values are exist — denotational programming is about eternal unchanging values and equations.
More concretely, this code is fine:
Prelude> let x = [(x !! 1) + 1, 3] in x
[4,3]
You might expect that x !! 1
doesn't exist yet. But here's how an implementation like GHC works.
When building the list f
, it constructs an in-memory object (a "thunk") to represent the expression (x !! 1) + 1
. No evaluation of x
has occurred yet. It wraps a pointer to this thunk, plus a pointer to 3
, into a linked list and hands it off to GHCi's implicit show
.
Now, the show
instance for lists has to show the elements one by one. show
will demand ("force") the evaluation of the thunk for (x !! 1) + 1
, which causes that code to be "entered". By the definition of (+)
, forcing (x !! 1) + 1
forces x !! 1
, which in turn forces two things:
- the spine of the list (its cons cells) through the second cell, and
- the value of the second element (the "car" of the second cell, in Lisp terms), because
(+)
wants to operate on that value.
And the second value is present by now — it's 3
. If it were another thunk we'd force that, and so on. See this blog post for an interesting view on self-referential containers.
Now, how does the compiled GHC code detect an infinite loop in your other example? When we enter a thunk, we need to remember to come back later and overwrite it with the final value. This is what's meant specifically by "lazy evaluation" as opposed to "non-strict semantics", and it prevents us from duplicating work.
Anyway, as an optimization when entering a thunk, GHC's runtime will first overwrite it with another object called a "blackhole". We'll come back later and overwrite the blackhole with the final value. But what happens if we enter a blackhole before doing so? That means that evaluating x
requires first evaluating x
, an unresolvable loop. So entering a blackhole throws an exception.
You are correct as to why it hangs--you've created a circular dependency that it can't resolve. Computing the current element requires a later element that can't be computed until the current one is computed, blah blah, around in circles we go.
As for why it doesn't produce an exception, try compiling it instead of running it in GHCi:
$ ghc --make Loop.hs
$ ./Loop.exe
Loop.exe: <<loop>>
I'm assuming this is a NonTermination
exception. Pff, Halting Problem? Ha.
Not everything works the way you'd like or expect it to when done in GHCi. If something seems odd, try compiling a small example and see if it makes more sense that way. GHCi using different rules for type defaulting trips me up sometimes, for instance.
精彩评论