开发者

First non-repeating char in a string ? in haskell or F#

Given a sequence of char what is the most efficient way to find the first non repeating char Interested purely functional implementation开发者_Go百科 haskell or F# preffered.


A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)

The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.


A simple F# solution:

let f (s: string) =
  let n = Map(Seq.countBy id s)
  Seq.find (fun c -> n.[c] = 1) s


Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.

open System
open System.IO
open System.Collections.Generic

let Solve (str : string) =
    let arrStr = str.ToCharArray()
    let sorted = Array.sort arrStr
    let len = str.Length - 1

    let rec Inner i = 
        if i = len + 1 then
            '-'
        else
            let index = Array.BinarySearch(sorted, arrStr.[i])
            if index = 0 && sorted.[index+1] <> sorted.[index] then 
                arrStr.[i]
            elif index = len && sorted.[index-1] <> sorted.[index] then 
                arrStr.[i]
            elif index > 0 && index < len && 
                 sorted.[index+1] <> sorted.[index] && 
                 sorted.[index-1] <> sorted.[index] then 
                arrStr.[i]
            else
                Inner (i + 1)

    Inner 0

let _ =
    printfn "%c" (Solve "abcdefabcf")

A - means all characters are repeated.

Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.


Here's a bit longish solution, but guaranteed to be worst-case O(n log n):

import List
import Data.Ord.comparing

sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)

index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]

dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
                          dropRepeated $ dropWhile ((==(fst x)).fst) xs
                    | otherwise =
                          x:(dropRepeated xs)

nonRepeatedPairs :: Ord a => Integral b => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index

firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs

The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.

The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).

Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...


An alternate Haskell O(n log n) solution using Data.Map and no sorting:

module NonRepeat (
    firstNonRepeat
    )
    where

import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)

data Occurance = Occ { first :: Int, count :: Int }
    deriving (Eq, Ord)

note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)

combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)

firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
    where occurances = toList . fromListWith combine . zipWith note [0..]
          findMinimum = safeMinimum . filter ((== 1).count.snd)
          safeMinimum [] = Nothing
          safeMinimum xs = Just $ minimumBy (comparing snd) xs


let firstNonRepeating (str:string) =
    let rec inner i cMap =
        if i = str.Length then
            cMap 
            |> Map.filter (fun c (count, index) -> count = 1) 
            |> Map.toSeq 
            |> Seq.minBy (fun (c, (count, index)) -> index)
            |> fst      
        else
            let c = str.[i]
            let value = if cMap.ContainsKey c then 
                            let (count, index) = cMap.[c]
                            (count + 1, index)
                        else 
                            (1, i)
            let cMap = cMap.Add(c, value)
            inner (i + 1) cMap 
    inner 0 (Map.empty)

Here is a simpler version that sacrifices speed.

let firstNonRepeating (str:string) =
    let (c, count) = str 
                     |> Seq.countBy (fun c -> c) 
                     |> Seq.minBy (fun (c, count) -> count)
    if count = 1 then Some c else None


How about something like this:

let firstNonRepeat s =
  let repeats = 
    ((Set.empty, Set.empty), s)
    ||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
    |> snd
  s
  |> Seq.tryFind (fun c -> not (Set.contains c repeats))


This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):

static char FstNonRepeatedChar(string s)
{
    return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜