Algorithm - How to delete duplicate elements in a Haskell list

2023-01-31 07:11 问答作者：

I'm having a problem creating an function similar to the nub function.

I need this func to remove duplicated elements form a list. An element is duplicated when 2 elements have the same email, and it should keep the newer one (is closer to the end of the l开发者_JAVA技巧ist).

type Regist = [name,email,,...,date]
type ListRe = [Regist]

rmDup ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
            | otherwise = h : rmDup t

isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e

The problem is that the function doesn't delete duplicated elements unless they are together in the list.

Just use nubBy, and specify an equality function that compares things the way you want.

And I guess reverse the list a couple of times if you want to keep the last element instead of the first.

Slightly doctored version of your original code to make it run:

type Regist = [String]
type ListRe = [Regist]

rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)

Result:

*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]

Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.

That said, you want a function rem which accepts a list xs and a function f :: a -> a -> Bool (on which elements are compared for removal from xs). Since the definition is recursive, you need a base case and a recursive case.

In the base case xs = [] and rem f xs = [], since the result of removing all duplicate elements from [] is []:

rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []

In the recursive case, xs = (a:as). Let as' be the list obtained by removing all elements a' such that f a a' = True from the list as. This is simply the function filter (\a' -> not $ f a a') applied to the list as. Them rem f (a:as) is the result of recursively calling rem f on as', that is, a : rem f as':

rem f (a:as) = a : rem f $ filter (\a' -> not $ f a a') as

Replace f be a function comparing your list elements for the appropriate equality (e-mail addresses).

While nubBy with two reverse's is probably the best among simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't the ideal solution in terms of efficiency - after all nubBy is O(n^2) (in the "worst case" - when there are no duplicates). Two reverse's will also take their toll (in the form of memory allocation). For more efficient implementation Data.Map (O(logN) on inserts) can be used as an intermediate "latest non duplicating element" holder (Set.insert replaces older element with newer if there is a collision):

import Data.List
import Data.Function
import qualified Data.Set as S

newtype Regis i e = Regis { toTuple :: (i,[e]) }

selector (Regis (_,(_:a:_))) = a 

instance Eq e => Eq (Regis i e) where
    (==) = (==) `on` selector

instance Ord e => Ord (Regis i e) where
    compare = compare `on` selector

rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
    where
      set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)

While nubBy implementation is definitely much simpler:

rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs

on 10M elements list (with lots of duplication - nub should play nice here) there is 3 times difference in terms of running time and 700 times difference in memory usage. Compiled with GHC with -O2 :

input = take 10000000 $ map (take 10) $ permutations [1..]

test1 = rmNub input
test2 = rmSet input

Not sure about the nature of the author's data though (the real data might change the picture).

(Assuming you want to figure out an answer, not just call a library function that does this job for you.)

You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.

Why not putting everything in a Map from email to Regist (of course respecting your "keep the newest" rule), and then transform the values of the map back in the list? That's the most efficient way I can think of.

I used Alexei Polkhanov's answer and came to the following, so you can remove duplicates from lists with a type that extends Eq class.

removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)

Examples:

*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]

*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]

继续阅读：haskell

Algorithm - How to delete duplicate elements in a Haskell list

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？