F# String Pattern-Matching with Wildcards
As part of a project I have assigned myself as a way of improving my knowledge of F# and functional programming in general, I am attempting to write a string pattern-matching algorithm from scratch without using any loops or variables (or regular expressions, or String.Replace and friends). As this is purely a learning project, I'm not interested in the best possible way to do it, just the best functional way to do it.
I'm trying to write a function that accepts a wildcard character, a pattern string, and an input string as parameters. If the pattern does not match the input, the function returns None
. If the pattern does match the input, the function returns Some(str)
where str
is whatever part of the input string matched any wildcards that might have been present in the pattern string.
I have this mostly working, and I'll include the code in a moment. I've written a generic pattern-matching function that works on any generic list of anything that supports equality, and then a helper function that takes strings and passes lists of characters to the generic function. This all works, except for one thing: the support for开发者_如何转开发 multiple wildcards in the pattern string isn't very good - it takes the matches for each wildcard and concatenates them together into a single string in the output.
For example:
> strMatch '*' "foo" "bar";;
val it : string option = None
> strMatch '*' "test" "test";;
val it : string option = Some ""
> strMatch '*' "functional programming is *" "functional programming is fun";;
val it : string option = Some "fun"
> strMatch '*' "* and *" "you and me";;
val it : string option = Some "youme"
It's the last one that I'm trying to fix. Ideally I'd like to return a list of strings rather than a single string, with each element in the list being the string that matched one wildcard. Failing that, I can probably make do with a version that returns only the match for the first wildcard - it's the concatenated values from both wildcards that I need to get rid of. I'm just not quite sure how to approach it.
So if anyone can suggest how I can group my return values by which wildcard they matched, I would be grateful. I'm also interested in any other improvements to my code that you might want to suggest.
let rec doMatch (wildcard:'a) (pat:'a list) (input:'a list) : 'a list option =
let singleMatch p i =
match (p, i) with
| phd :: ptl, ihd :: itl ->
if phd = wildcard then
match doMatch wildcard ptl itl with
| None -> None
| Some x -> Some(ihd :: x)
else None
| _ -> None
let longerMatch p i =
match (p, i) with
| phd :: ptl, ihd :: itl ->
if phd = wildcard then
match doMatch wildcard p itl with
| None -> None
| Some x -> Some(ihd :: x)
else None
| _ -> None
match (pat, input) with
| [], [] -> Some([])
| [], _::_ -> None
| _::_, [] -> None
| phd :: ptl, ihd :: itl ->
if phd <> wildcard then
if phd = ihd then doMatch wildcard ptl itl
else None
else
match singleMatch pat input with
| Some x -> Some(x)
| None -> longerMatch pat input
let strMatch (wildcard:char) (pat:string) (input:string) =
match doMatch wildcard (List.ofSeq pat) (List.ofSeq input) with
| None -> None
| Some x -> Some(new string(Array.ofList x))
You've probably guessed, but this is part of an Eliza chat-bot implementation in F#.
From a design point of view, I like the idea of returning an
'a list option
where e.g.
None // it did not match
Some[] // matched, input had 0 wildcards
Some["foo";"bar"] // matched, input has 2 wildcards, "foo" matched 1st, "bar" 2nd
That is, just guarantee that when 'Some' is returned, the length of the list equals the number of wildcards, and the elements of the list are the matches in order. This seems to me to be straightforward to implement as well as reasonable for client code to use/consume.
(I am unclear if there is any deeper question in your long post.)
Looks like fun stuff!
EDIT
Here's some updated code. My gut tells me it's not all correct, but it at least works on your examples. The key is to use
'a list list option
since 'a is a character, an 'a list is like a string, and we want a list of strings. singleMatch starts a new list of strings, whereas longerMatch is consing onto the front of the current string.
let rec doMatch (wildcard:'a) (pat:'a list) (input:'a list)
: 'a list list option =
let singleMatch p i =
match (p, i) with
| phd :: ptl, ihd :: itl ->
if phd = wildcard then
match doMatch wildcard ptl itl with
| None -> None
| Some xs -> Some([ihd]::xs)
else None
| _ -> None
let longerMatch p i =
match (p, i) with
| phd :: ptl, ihd :: itl ->
if phd = wildcard then
match doMatch wildcard p itl with
| None -> None
| Some ([]) -> Some([[ihd]])
| Some (x::xs) -> Some((ihd :: x)::xs)
else None
| _ -> None
match (pat, input) with
| [], [] -> Some([])
| [], _::_ -> None
| _::_, [] -> None
| phd :: ptl, ihd :: itl ->
if phd <> wildcard then
if phd = ihd then doMatch wildcard ptl itl
else None
else
match singleMatch pat input with
| Some x -> Some(x)
| None -> longerMatch pat input
let strMatch (wildcard:char) (pat:string) (input:string) =
match doMatch wildcard (List.ofSeq pat) (List.ofSeq input) with
| None -> None
| Some x -> Some(x|>List.map (fun chList -> new string(Array.ofList chList)))
printfn "%A" (strMatch '*' "foo" "bar")
printfn "%A" (strMatch '*' "test" "test")
printfn "%A" (strMatch '*' "functional programming is *"
"functional programming is fun")
printfn "%A" (strMatch '*' "* and *" "you and me")
精彩评论