What's a fast way to parse a math operator off of a string of numbers?
Given a string as a form of input (parsed from an input file) which represents a number and a mathematical operator (<, >, <=, >=, !, !=, and a few others), what is a really fast, efficient way to chop off that operator, compare it to a list of valid operators, and then set an "operator" variable to a state (i.e., Enum) representing the identified operator, then return just the number (as a string)?
I'm open to various ideas and implementations. I've tried several (about 6-7) myself, and find I'm not really satisfied with the speed. The fastest so far is a For Each loop that walks my list of "valid operators", and compares that operator's string representation against the chopped off bit from the numeric string. I determine the amount to chop off by the length of each valid operator in the valid list.
Here's a code example of the fastest implementation. Assume input like <378
and a valid ops list of <, >, !
, or >=79
and a valid ops list of <=, >=
:
Friend Function FindMatchingOp(ByVal Haystack As String,
ByVal ValidOps() As <OperatorType>) As String
Dim tmpBit As String, tmpOpName As String, tmpOpLen As Int32
For Each tmpOp As <OperatorType> In ValidOps
tmpOpName = tmpOp.Name
tmpOpLen = tmpOpName.Length
tmpBit = Strings.Left(Haystack, tmpOpLen)
If String.Equals(tmpBit, tmpOpName) Then
<Code to set the correct operator>
Return Haystack.Remove(0, tmpOpLen)
Exit For
End If
Next
Return vbNullString
End Function
Not all of the numeric strings I expect to parse will utilize the same math operators (hence the need for the ValidOps variable). Some might only support < and >, others might do <=, >=, and !. This is why I cannot hardcode the assumption that the operator will开发者_如何学Go be only one character in length, and have to test for both one-or-two character operators. I believe it's these specific string checks that slow my other implementations down.
I've also tried putting ValidOps into things like a Dictionary, HashTable, ListDictionary, and even an Arraylist. The standard array beats all of them every time.
Thoughts?
PS, VB code only, please, in any advice or solutions.
EDIT:
I am going to try and implement a Trie to handle this and see what its performance is. I got the idea from this StackOverflow question. Not going to work for me.
You could somewhat improve your function by changing:
tmpOpLen = tmpOpName.Length
tmpBit = Strings.Left(Haystack, tmpOpLen)
If String.Equals(tmpBit, tmpOpName) Then
<Code to set the correct operator>
Return Haystack.Remove(0, tmpOpLen)
Exit For
End If
to...
If Haystack.StartsWith(tmpOp.Name) Then
<Code to set the correct operator>
Return Haystack.Remove(0, tmpOp.Name.Length)
Exit For
End If
But that is probably going to be marginal. All you'll have is the removal of all of your intermediate strings.
You should use the .NET Regex engine instead of brute-forcing comparisons against a list of operators with that foreach....
A generic way to extract the operator would be:
String Operator = Regex.Match(MathOperation as String, @"(?=\d*\s*)\w{1,2}(?=\s*)").Value;
Hope it helps :)
What you want, in essence, is a parser for a number followed by an operator. You could do several things: 1. Write a good simple ad-hoc parser. 2. Write a simple inefficient ad-hoc parser (simpler and slower than 1) with for example regexes. 3. Have a parser written for you by a parser-generator (I googled http://www.codeproject.com/KB/recipes/TinyPG.aspx for you).
Your regular grammar here is
input := line*
line := number operator
number := digit+
digit := [1-9]
operator := [your|list|of|valid|operator|literals]
A simple solution would be to create a recursive decent parser. A fast solution would be to create a finite state automaton parser. A fast lazy (and thus the best) solution is have a parser-generator create a finite state automaton parser for you.
精彩评论