Algorithm - How to delete duplicate elements in a Haskell list
I'm having a problem creating an function similar to the nub function.
I need this func to remove duplicated elements form a list. An element is duplicated when 2 elements have the same email, and it should keep the newer one (is closer to the end of the l开发者_JAVA技巧ist).
type Regist = [name,email,,...,date]
type ListRe = [Regist]
rmDup ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
| otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.
Just use nubBy
, and specify an equality function that compares things the way you want.
And I guess reverse the list a couple of times if you want to keep the last element instead of the first.
Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]
Anon is correct: nubBy
is the function you are looking for, and can be found in Data.List.
That said, you want a function rem
which accepts a list xs
and a function f :: a -> a -> Bool
(on which elements are compared for removal from xs
). Since the definition is recursive, you need a base case and a recursive case.
In the base case xs = []
and rem f xs = []
, since the result of removing all duplicate elements from []
is []
:
rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []
In the recursive case, xs = (a:as)
. Let as'
be the list obtained by removing all elements a'
such that f a a' = True
from the list as
. This is simply the function filter (\a' -> not $ f a a')
applied to the list as
. Them rem f (a:as)
is the result of recursively calling rem f
on as'
, that is, a : rem f as'
:
rem f (a:as) = a : rem f $ filter (\a' -> not $ f a a') as
Replace f
be a function comparing your list elements for the appropriate equality (e-mail addresses).
While nubBy
with two reverse
's is probably the best among simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't the ideal solution in terms of efficiency - after all nubBy
is O(n^2) (in the "worst case" - when there are no duplicates). Two reverse
's will also take their toll (in the form of memory allocation).
For more efficient implementation Data.Map
(O(logN) on inserts) can be used as an intermediate "latest non duplicating element" holder (Set.insert replaces older element with newer if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
(==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
where
set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
While nubBy
implementation is definitely much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
on 10M elements list (with lots of duplication - nub
should play nice here) there is 3 times difference in terms of running time and 700 times difference in memory usage. Compiled with GHC with -O2 :
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).
(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h
is not equal to head t
but is instead equal to the 3rd element of t
? You need to write an algorithm that compares h
with every element of t
, not just the first element.
Why not putting everything in a Map from email to Regist (of course respecting your "keep the newest" rule), and then transform the values of the map back in the list? That's the most efficient way I can think of.
I used Alexei Polkhanov's answer and came to the following, so you can remove duplicates from lists with a type that extends Eq class.
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]
精彩评论