Having trouble finishing off this enumeratee
I once wrote a packet capture program in Haskell that used lazy IO to collect all the TCP packets. The problem was that packets sometimes arrive out of order, so I had to insert every packet into a list until I saw a FIN flag, to be sure I had all the packets I needed before doing anything with them. If I was sniffing something really big, like a video, I had to hold all of it in memory. Doing it any other way would have required some difficult imperative code.
Later I learned about iteratees and decided to implement my own solution. The idea is an enumeratee that you supply with the number of packets it should hold. As it pulls in packets it sorts them, and once it reaches the number you specified it starts flushing, but it keeps some buffered so that new chunks are sorted into that list before more packets are flushed. The assumption is that chunks will be almost in order before they hit this enumeratee, so it will fix most small ordering problems. When it gets an EOF, it should send all remaining packets back out.
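The buffering scheme described above can be modelled without any iteratee machinery. Here is a minimal sketch, assuming the stream is represented as a plain list of chunks. The name reorderChunks is hypothetical and is not part of the enumerator library or of the code in this question.

```haskell
import Data.List (insert)

-- | Hypothetical pure model of the reordering buffer: keep at most n
-- items in a sorted buffer, emit the smallest ones once it overflows,
-- and flush whatever is left when the input ends (the EOF case).
reorderChunks :: Ord a => Int -> [[a]] -> [[a]]
reorderChunks n = go []
  where
    -- End of input: flush the remaining buffer as one final chunk.
    go buf [] = [buf | not (null buf)]
    go buf (xs:xss) =
      let sorted          = foldr insert buf xs   -- sort the new chunk in
          -- A non-positive count makes splitAt return ([], sorted),
          -- so nothing is emitted until the buffer holds more than n.
          (excess, store) = splitAt (length sorted - n) sorted
      in if null excess
           then go store xss
           else excess : go store xss
```

For example, `concat (reorderChunks 2 [[2],[1],[4],[3],[6],[5]])` evaluates to `[1,2,3,4,5,6]`. Note that the final flush arrives as a single chunk holding several items, whereas every earlier chunk holds one.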
So it almost works. I realize some of these functions could be replaced by standard enumerator functions, but I wanted to write them myself to better understand how they work. Here's some code:
readLines reads a file one line at a time and feeds each line downstream. printLines just prints each chunk. numbers.txt is a line-delimited list of numbers that are slightly out of order; some numbers are several positions before or after where they should be. reorder is the function that holds n numbers, sorts new ones into its accumulator list, and then pushes out all but the last n of those numbers.
import Prelude as P
import Data.Enumerator as E
import Data.Enumerator.List as EL
import Data.List (sort, insert)
import IO
import Control.Monad.Trans (lift)
import Control.Monad (liftM)
import Control.Exception as Exc
import Debug.Trace
test = run_ (readLines "numbers.txt" $$ EL.map (read ::String -> Int) =$ reorder 10 =$ printLines)
reorder :: (Show a, Ord a) => (Monad m) => Int -> Enumeratee a a m b
reorder n step = reorder' [] n step
  where
    reorder' acc n (Continue k) =
      let
        len = P.length
        loop buf n' (Chunks xs)
          | (n' - len xs >= 0) = continue (loop (foldr insert buf xs) (n' - len xs))
          | otherwise =
              let allchunx = foldr insert buf xs
                  (excess,store)= P.splitAt (negate (n' - len xs)) allchunx
              in k (Chunks excess) >>== reorder' store 0
        loop buf n' (EOF) = k (Chunks (trace ("buf:" ++ show buf) buf)) >>== undefined
      in continue (loop acc n)
printLines :: (Show a) => Iteratee a IO ()
printLines = continue loop
  where
    loop (Chunks []) = printLines
    loop (Chunks (x:xs)) = do
      lift $ print x
      printLines
    loop (EOF) = yield () EOF
readLines :: FilePath -> Enumerator String IO ()
readLines filename s = do
  h <- tryIO $ openFile filename ReadMode
  Iteratee (Exc.finally (runIteratee $ checkContinue0 (blah h) s) (hClose h))
  where
    blah h loop k = do
      x <- lift $ myGetLine h
      case x of
        Nothing -> continue k
        Just line -> k (Chunks [line]) >>== loop
    myGetLine h = Exc.catch (liftM Just (hGetLine h)) checkError
    checkError :: IOException -> IO (Maybe String)
    checkError e = return Nothing
My problem is the undefined in reorder. reorder has 10 items held in its buffer when it receives an EOF from upstream. It then calls k (Chunks those10items), and the undefined follows because I don't know what to put there to make it work.
The result is that the last 10 items are cut from the program's output. You can see from the trace that buf contains all the remaining items. I have tried yielding, but I'm not sure what to yield, or whether I should yield at all.
Edit: It turns out reorder was fixed by changing the undefined part of the loop to:
loop buf n' EOF = k (Chunks buf) >>== (\s -> yield s EOF)
which I almost certainly had at one point, but I didn't get the right output, so I assumed it was wrong.
The real problem was in printLines. Because reorder emitted single-element chunks until the very end, I never noticed that printLines discarded everything but the first element of each chunk it received. I had assumed the rest of the chunk would somehow carry over to the next call, which was wrong.
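To see why the bug only showed up at the final flush, here is a small pure sketch, assuming the stream is modelled as a list of chunks. consumeHeads and consumeAll are hypothetical names standing in for the original and the corrected consumer; they are not part of the enumerator library.

```haskell
-- | Hypothetical model of the original consumer: it handles only the
-- first element of each chunk and silently drops the rest.
consumeHeads :: [[a]] -> [a]
consumeHeads = concatMap (take 1)

-- | Hypothetical model of the corrected consumer: it handles every
-- element of every chunk.
consumeAll :: [[a]] -> [a]
consumeAll = concat
```

With single-element chunks the two agree, so `consumeHeads [[1],[2],[3]]` and `consumeAll [[1],[2],[3]]` both give `[1,2,3]`. Once a chunk holds several items, as in `[[1],[2],[3,4,5]]`, consumeHeads gives `[1,2,3]` while consumeAll gives `[1,2,3,4,5]`.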
Anyway, I changed printLines to this:
printLines :: (Show a) => Iteratee a IO ()
printLines = continue loop
  where
    loop (Chunks []) = printLines
    loop (Chunks xs) = do
      lift $ mapM_ print xs
      printLines
    loop (EOF) = yield () EOF
And now it works. Thanks a lot, I was afraid I wouldn't get an answer.
How about
loop buf n' (EOF) = k (Chunks buf) >>== (\s -> yield s EOF)
(idea taken from EB.isolate).
Depending on what exactly you're trying to do, your printLines may also need fixing; the case for Chunks (x:xs) throws away xs. Something like
loop (Chunks (x:xs)) = do
  lift $ print x
  loop (Chunks xs)
may (or may not) have been what you intended.