Potential for optimizing sets for state space enumeration in Haskell based on heap profile?
I wrote a small state space explorer for an exhaustive search problem. Given a start state, a transition relation computes a set of successor states; this is repeated until all states have been seen (I know that the search space is finite). For the problem at hand, a state yields at most 2*n successor states. For every state, I also check whether it satisfies some property dlCheck (here, there is only ONE matching state in the whole state space).
As I keep the Data.Set of seen states around, I already expect substantial memory consumption. However, the heap profile has me puzzled (ATest contains the code below; the property check and the transition relation are in another module):
If I had to guess, I'd say that the lower bound on the Set is the size of the already explored state space. I'm wondering where the periodic contraction comes from. Do you spot anything funny in the exploration below? The main call is into dlsStreamHist at the bottom.
Given that the whole program is too large to post, and since I have some idea of the performance of the transition relation and the property check, I'm assuming the problem is somewhere in here. Am I perhaps looking for strict set operations, much like foldl'? It could be that all successor states are first calculated before they are compared against the seen set, rather than doing it on the fly.
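To make concrete what I mean by "strict set operations": I'm thinking of the difference between foldl and foldl' when unioning a list of sets, i.e. whether each intermediate set is forced or left as a chain of thunks (a toy illustration, not my actual code):
import Data.List (foldl')
import qualified Data.Set as Set

-- Lazy accumulation: builds a chain of unevaluated 'union' thunks that only
-- collapses once the final set is demanded.
unionsLazy :: Ord a => [Set.Set a] -> Set.Set a
unionsLazy = foldl Set.union Set.empty

-- Strict accumulation: each intermediate set is forced before the next union
-- (WHNF suffices here, since Data.Set is spine-strict).
unionsStrict :: Ord a => [Set.Set a] -> Set.Set a
unionsStrict = foldl' Set.union Set.empty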
-- 'test' applies a step to each process in the state (individually), yielding a set of successor states
-- ...or not, if a process blocks. Already seen states are in the first component
-- so that we don't explore them again,
-- blocked configs go into the second, and new successor states go into the third component.
-- We produce a potentially infinite stream of sets of blocked/enabled configurations.
-- This method generates our search space; the transition relation 'step' is in here.
testHistory :: State a => Set a -> Set a -> [(Set a, Set a, Set a)]
testHistory seen ps
  | null ps   = []
  | otherwise = let (ls,rs)  = fold (\s lr -> fold (\x (l,r) -> maybe (insert s l,r)
                                       (\c -> (l,insert c r)) x) lr (step s)) (empty,empty) ps
                    newSeen  = seen `union` ps
                    new      = rs `difference` newSeen
                in (newSeen,ls,new) : testHistory newSeen new
-- Take a program, and run it (to infinity). For each blocked configuration, collect all deadlocks (if any).
-- For infinite streams, you might want to 'nub' the result.
dlsStreamT :: State a => (Set a -> [(Set a, c)]) -> a -> [Set (Process,Lock,Int)]
dlsStreamT testF p =
    concatMap ( Data.Set.toList
              . fold union empty
              . filter (not . null)
              . Data.Set.map (filter (not . null)) )
  $ map (filter (not . null) . Data.Set.map dlCheck . fst)
        (testF (singleton p))
dlsStreamHist :: State a => a -> [Set (Process,Lock,Int)]
dlsStreamHist = dlsStreamT (map (\(_,x,y)->(x,y)) . testHistory empty)
At a guess, it's having to construct the whole set of new states before it can complete each testHistory step: you can't calculate rs `difference` newSeen until you've constructed all of newSeen, for instance (for hopefully obvious reasons).
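Whether this actually changes the residency I can't say without profiling it, but checking each successor against the seen set as it is produced, rather than building rs first and differencing afterwards, would look roughly like this (an untested sketch, reusing your names):
testHistory :: State a => Set a -> Set a -> [(Set a, Set a, Set a)]
testHistory seen ps
  | null ps   = []
  | otherwise = let newSeen  = seen `union` ps
                    (ls,new) = fold (\s lr -> fold (\x (l,r) ->
                                       maybe (insert s l, r)
                                             (\c -> (l, if c `member` newSeen then r else insert c r))
                                             x) lr (step s)) (empty,empty) ps
                in (newSeen,ls,new) : testHistory newSeen new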
Then I'd guess that the filtering by dlCheck is lazy, so it's building up a pile of thunks before evaluating them all, leading to the pulsating heap usage. Is there some reason you can't fold dlCheck into the state generation, so that you can apply it to states as they're generated?
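Very roughly, and untested (assuming dlCheck :: a -> Set (Process,Lock,Int), which is what your dlsStreamT appears to expect): check the blocked configurations in the same step that produces them, and force the result, so no pile of unevaluated dlCheck applications survives between steps:
testHistoryDL :: State a => Set a -> Set a -> [Set (Process,Lock,Int)]
testHistoryDL seen ps
  | null ps   = []
  | otherwise = let (ls,rs)  = fold (\s lr -> fold (\x (l,r) -> maybe (insert s l,r)
                                       (\c -> (l,insert c r)) x) lr (step s)) (empty,empty) ps
                    newSeen  = seen `union` ps
                    new      = rs `difference` newSeen
                    dls      = fold (union . dlCheck) empty ls   -- deadlocks among this step's blocked configs
                in dls `seq` dls : testHistoryDL newSeen new
dlsStreamHist p would then reduce to keeping the non-empty sets from that stream, e.g. something like Prelude.filter (not . null) (testHistoryDL empty (singleton p)).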
NB, what process algebra are you using? (Not that it's relevant, it just happens to be my field!)