Reading in an arbitrary number of binary messages
I am parsing binary data out of files using Data.Binary.Get and have something like the following:
data FileMessageHeaders = FileMessageHeaders [FileMessageHeader]
data FileMessageHeader = FileMessageHeader ...
instance Binary FileMessageHeaders where
    put = undefined
    get = do
        messages <- untilM get isEmpty
        return (FileMessageHeaders messages)
instance Binary FileMessageHeader where
    put = undefined
    get = ..
The problem I am having is that untilM from monad-loops on Hackage uses sequence, and I believe this is what causes a massive delay in returning the head of the FileMessageHeader list, since the whole file must be read first (is this correct?). I am having trouble coming up with a way to rewrite this that avoids sequencing all of the FileMessageHeaders in the file. Any suggestions?
Thanks!
As FUZxxl notes, the problem is untilM; the Get monad is strict and requires that the entire untilM action completes before it returns. IO has nothing to do with it.
The easiest thing to do is probably switch to attoparsec and use that for parsing instead of binary. Attoparsec supports streaming parses and would likely be much easier to use for this case.
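To make the attoparsec suggestion concrete, here is a minimal sketch of a lazily produced header list. The Header type and its layout (one tag byte, one length byte, then that many payload bytes) are invented for illustration, since the real FileMessageHeader format is not shown in the question; parseHeaders is likewise a hypothetical helper name. It assumes the attoparsec and bytestring packages are installed.

```haskell
import qualified Data.Attoparsec.ByteString as A
import qualified Data.Attoparsec.ByteString.Lazy as AL
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Hypothetical header layout, for illustration only:
-- a tag byte, a length byte, then `length` payload bytes.
data Header = Header Word8 BS.ByteString
  deriving (Eq, Show)

headerParser :: A.Parser Header
headerParser = do
  tag     <- A.anyWord8
  len     <- A.anyWord8
  payload <- A.take (fromIntegral len)
  return (Header tag payload)

-- Parse one header at a time from a lazy ByteString. Each element of the
-- result list is available as soon as its own bytes have been consumed;
-- the rest of the input is only read when the tail of the list is demanded.
parseHeaders :: BL.ByteString -> [Header]
parseHeaders bs
  | BL.null bs = []
  | otherwise =
      case AL.parse headerParser bs of
        AL.Done rest h  -> h : parseHeaders rest
        AL.Fail _ _ err -> error ("header parse failed: " ++ err)
```

Calling error on a failed parse keeps the sketch short; real code would more likely return the headers parsed so far together with the error.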
If you can't switch to attoparsec, you'll need to use some of the lower-level functions of binary rather than just using the Binary instance. Something like the following (completely untested):
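import Data.Binary (get)
import Data.Binary.Get (runGetState)
import qualified Data.ByteString.Lazy as B

-- Decode one header at a time so the result list is produced lazily.
getHeaders :: B.ByteString -> [FileMessageHeader]
getHeaders b = go b 0
  where
    go bs n
      | B.null bs = []
      | otherwise = let (header, bs', n') = runGetState get bs n
                    in header : go bs' n'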
Unfortunately this means you won't be able to use the Binary instance or the get function for FileMessageHeaders; you'll have to use getHeaders. It will stream, though.
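As far as I know, runGetState is marked deprecated in newer releases of binary, so here is a sketch of the same idea using runGetOrFail instead. It assumes a version of binary that exports runGetOrFail; streamGet is a hypothetical helper name, and getWord8 only stands in for the real header decoder, which the question does not show.

```haskell
import Data.Binary.Get (Get, getWord8, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Run one decoder repeatedly over a lazy ByteString, yielding each result
-- as soon as it has been decoded. streamGet is a hypothetical helper name.
streamGet :: Get a -> BL.ByteString -> [a]
streamGet decoder = go
  where
    go bs
      | BL.null bs = []
      | otherwise =
          case runGetOrFail decoder bs of
            -- The offset is relative to the start of the current record.
            Left (_, offset, err) ->
              error ("decode failed at offset " ++ show offset ++ ": " ++ err)
            Right (rest, _, x) -> x : go rest

-- Stand-in decoder for illustration; the real code would pass
-- (get :: Get FileMessageHeader) instead of getWord8.
-- The input here is infinite, so this only terminates if decoding streams.
firstBytes :: [Word8]
firstBytes = take 3 (streamGet getWord8 (BL.cycle (BL.pack [7, 8])))
```

With the question's types, the call would presumably look like streamGet get applied to the result of Data.ByteString.Lazy.readFile, wrapped in FileMessageHeaders if the newtype is still wanted.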
The problem here is that an IO action has to finish before the control flow can continue. Thus, the program has to read in all the messages before they get evaluated. You could try to define your own combinator sequenceI that uses the function unsafeInterleaveIO from System.IO.Unsafe. This function allows you, well, to interleave actions. It is used, for instance, by getContents. I would define sequenceI like this:
sequenceI :: [IO a] -> IO [a]
sequenceI []     = return []
sequenceI (x:xs) = do v  <- x
                      vs <- unsafeInterleaveIO $ sequenceI xs
                      return (v:vs)
On top of this combinator, you can define your own untilM that streams. Doing this is left as an exercise to the reader.
Edit (corrected for compilation)
This is a proof-of-concept, untested implementation of untilM:
untilMI f p = do
  f' <- f
  p' <- p
  if p'
    then return [f']
    else do g' <- unsafeInterleaveIO $ untilMI f p
            return (f' : g')
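A small self-contained check of the laziness, as a sketch only: it repeats the untilMI definition with a type signature (specialised to IO, which is an assumption on my part) and counts how often the loop body runs. The demo function and the IORef counter are made up for illustration. The condition never becomes True, so a strict untilM would loop forever here, whereas this version should run the body only as many times as list elements are demanded.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Same shape as the definition above, specialised to IO.
untilMI :: IO a -> IO Bool -> IO [a]
untilMI f p = do
  x    <- f
  done <- p
  if done
    then return [x]
    else do
      xs <- unsafeInterleaveIO (untilMI f p)  -- deferred until xs is forced
      return (x : xs)

-- Hypothetical demo: returns the first three results and the number of
-- times the loop body has run by the time they have been forced.
demo :: IO ([Int], Int)
demo = do
  counter <- newIORef (0 :: Int)
  let body = modifyIORef' counter (+ 1) >> readIORef counter
  xs <- untilMI body (return False)      -- condition never holds
  let firstThree = take 3 xs
  _ <- evaluate (length firstThree)      -- force exactly three cells
  ran <- readIORef counter
  return (firstThree, ran)
```

Keep in mind that unsafeInterleaveIO makes the timing of the effects depend on when the list is consumed, which is the usual caveat with lazy IO.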