Traversing and filtering a tree in haskell
I am pretty new to Haskell (still working on totally understanding monads). I have a problem where I have a tree like structure
type Tree = [DataA]
data DataA = DataA1 [DataB]
| DataA2 String
| DataA3 String [DataA]
deriving Show
data DataB = DataB1 [DataA]
| DataB2 String
| DataB3 String [DataB]
deriving Show
What I would like to do is be able to traverse this and generate a new tree with a filter. For example I may want to change all DataB2 in the tree t开发者_高级运维o "foo".
I have seen examples of tree when they are in the same data section, and the constructors are similar.
In the python world, I would just traverse the list, match to whatever I needed, and replace the value.
In Haskell I am guessing that I need to be able to copy my tree, but how do you deal with lists buried in constructors and different data types?
You can use generic programming for this.
One such generic programming library is called Scrap Your Boilerplate. At the very top of your module, enable Scrap Your Boilerplate by writing:
{-# LANGUAGE DeriveDataTypeable #-}
Import module Data.Generics
. Then besides Show
, also derive Typeable
and Data
instances for your datatypes.
Now you can write the function you requested like this:
toFoo :: Data a => a -> a
toFoo = everywhere (mkT step)
where
step (DataA2 _) = DataA2 "foo"
step x = x
That's all you need to do to make this work. For example, when you call toFoo [DataA1 [], DataA2 "hi", DataA3 "yo" []]
, the answer is [DataA1 [],DataA2 "foo",DataA3 "yo" []]
.
Hope this helps!
I don't know a general answer to your question. The data type is quite contrived, and I would probably choose to implement a fold rather than a filter. Here, however, are some filter functions that can update strings in all four positions. I have put the code through the compiler, so it typechecks, but I haven't run it.
type SFilter = String -> String
-- to filter a tree, say how A2, A3, B2, and B3 should be changed
type Filter tree = SFilter -> SFilter -> SFilter -> SFilter -> (tree -> tree)
afilter :: Filter DataA
bfilter :: Filter DataB
tfilter :: Filter Tree
tfilter a2 a3 b2 b3 = map (afilter a2 a3 b2 b3)
afilter a2 a3 b2 b3 = fil
where fil (DataA1 bs) = DataA1 $ map (bfilter a2 a3 b2 b3) bs
fil (DataA2 s) = DataA2 (a2 s)
fil (DataA3 s as) = DataA3 (a3 s) (map fil as)
bfilter a2 a3 b2 b3 = fil
where fil (DataB1 as) = DataB1 $ map (afilter a2 a3 b2 b3) as
fil (DataB2 s) = DataB2 (b2 s)
fil (DataB3 s bs) = DataB3 (b3 s) (map fil bs)
You want to traverse the whole data structure and change some items here and there. This is usually done by a function that takes the data structure as a parameter and returns the new, changed version of the structure.
For every case of input this function defines how the new value that is returned should look like.
The basic function that modifies a Tree
(which is just a list of DataA
values)
probably should just return a new list of modified values. If we defer those
modifications of the values to a modifyA
function, the main modification
function looks like this:
-- # function to change a |Tree|
mutate :: Tree -> Tree
mutate as = map mutateA as
-- # (The |map| function applies the |mutateA| function to every
-- # element of |as|, creating a list of all the return values)
Now the mutateA
function needs to be defined to change all possible DataA
values, and
best it is accompanied by a mutateB
function that handles DataB
values.
These functions look at the different possible cases of values and return the appropriate new values:
-- # function to change |DataA| items
mutateA :: DataA -> DataA
-- # A |DataA1| is a |DataA1| with modified values
mutateA (DataA1 bs) = DataA1 (map mutateB bs)
-- # A |DataA3| is a |DataA3| with modified values
mutateA (DataA3 s as) = DataA3 s (map mutateA as)
-- # In the remaining case(s) the value stays the same
mutateA d = d
-- # function to change |DataB| items
mutateB :: DataB -> DataB
mutateB (DataB1 as) = DataB1 (map mutateA as)
mutateB (DataB3 s bs) = DataB3 s (map mutateB bs)
-- # Here comes a real change
mutateB (DataB2 _) = DataB2 "foo"
This way for every element in the tree a new element is computed, where all the DataB2
values anywhere in the tree are replaced by "foo".
It's relatively verbose because you have five different cases that contain a list
of values that needs to be walked through, but that is not specific to Haskell. In
an imperative language you would usually have five for loops in place of the five
calls to map
.
Maybe you could simplify your data structure to reduce this "overhead". This
of course depends on your actual use case, but maybe, for example, you don't need
the Data2
cases:
Is there a difference between DataA2 "abc"
and DataA3 "abc" []
?
You may want to take a look at the multirec library for working with mutually recursive data types. I've not used it, but from what you've described it sounds like it's aimed at precisely the kind of problem that you're working with. It uses generic programming like the other answers here have suggested, but might save you the time of implementing everything yourself.
精彩评论