Where might I begin on this optimization problem?
I have a simple program which reads a bunch of things from the filesystem, filters the results , and prints them. This simple program implements a domain specific language to make selection easier. This DSL "compiles" down into an execution plan that looks like this (Input was C:\Windows\System32\* OR -md5"ABCDEFG" OR -tf
):
Index Success Failure Description
0 S 1 File Matches C:\Windows\System32\*
1 S 2 File MD5 Matches ABCDEFG
2 S F File is file. (Not directory)
The filter is applied to the given file, and if it succeeds, the index pointer jumps to the index indicated in the success field, and if it fails, the index pointer jumps to the number indicated in the failure field. "S" means that the file passes the filter, F means that the file should be rejected.
Of course, a filter based upon a simple file attribute (!FILE_ATTRIBUTE_DIRECTORY) check is much faster than a check based upon the MD5 of the file, which requires opening and performing the actual hash of the file. Each filter "opcode" has a time class associated with it; MD5 gets a high timing number, ISFILE gets a low timing number.
I would like to reorder this execution plan so that opcodes that take a long time are executed as rarely as possible. For the above plan, that would mean it would have to be:
Index Success Failure Description
0 S 1 File is file. (Not directory)
1 S 2 File Matches C:\Windows\System32\*
2 S F File MD5 Matches ABCDEFG
According to the "Dragon Book", picking the best order of execution for three address code is an NP-Complete problem (At least according to page 511 of the second edition of that text), but in that case they are talking about register allocation and 开发者_开发技巧other issues of the machine. In my case, the actual "intermediate code" is much simpler. I'm wondering of a scheme exists that would allow me to reorder the source IL into the optimal execution plan.
Here is another example:
{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }Parsed to:
Index Success Failure Description
0 1 2 File Matches C:\Windows\Inf\*
1 S 2 File is a Portable Executable
2 3 F File is file. (Not directory)
3 F S File Matches C:\Windows\System32\Drivers\*
which is optimally:
Index Success Failure Description
0 1 2 File is file. (Not directory)
1 2 S File Matches C:\Windows\System32\Drivers\*
2 3 F File Matches C:\Windows\Inf\*
3 S F File is a Portable Executable
It sounds like it might be easier to pick an optimal order before compiling down to your opcodes. If you have a parse tree, and it is as "flat" as possible, then you can assign a score to each node and then sort each node's children by the lowest total score first.
For example:
{ C:\Windows\Inf* AND -tp } OR { -tf AND NOT C:\Windows\System32\Drivers* }
1 2 3 4
OR
/ \
AND AND
/ \ / \
1 2 3 4
You could sort the AND nodes (1, 2) and (3, 4) by the lowest score and then assign that score to each node. Then sort the children of the OR node by the lowest score of their children.
Since AND and OR are commutative, this sorting operation won't change the meaning of your overall expression.
@Greg Hewgill is right, this is easier to perform on the AST than on the Intermediate code. As you want to work on the Intermediate code, the first goal is to transform it into a dependency tree (which will look like the AST /shrug).
Start with the leaves - and it is probably easiest if you use negative-predicates for NOT.
Index Success Failure Description
0 1 2 File Matches C:\Windows\Inf\*
1 S 2 File is a Portable Executable
2 3 F File is file. (Not directory)
3 F S File Matches C:\Windows\System32\Drivers\*
Extract Leaf (anything with both children as S, F, or an extracted Node; insert NOT where required; Replace all references to Leaf with reference to parent node of leaf)
Index Success Failure Description
0 1 2 File Matches C:\Windows\Inf\*
1 S 2 File is a Portable Executable
2 L1 F File is file. (Not directory)
L1=NOT(cost(child))
|
Pred(cost(PATH))
Extract Node (If Success points to Extracted Node use conjunction to join; Failure uses disjunction; Replace all references to Node with reference to resulting root of tree containing Node).
Index Success Failure Description
0 1 L3 File Matches C:\Windows\Inf\*
1 S L3 File is a Portable Executable
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
/ \
L1=NOT(cost(child)) L2=IS(cost(child))
| |
3=Pred(cost(PATH)) 2=Pred(cost(ISFILE))
Extract Node
Index Success Failure Description
0 L5 L3 File Matches C:\Windows\Inf\*
L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
/ \
| L4=IS(cost(child))
| |
| 1=Pred(cost(PORT_EXE))
|
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
/ \
L1=NOT(cost(child)) L2=IS(cost(child))
| |
3=Pred(cost(PATH)) 2=Pred(cost(ISFILE))
Extract Node (In the case where Success and Failure both refer to Nodes, you will have to inject the Node into the tree by pattern matching on the root of the sub-tree defined by the Node)
If root is OR, invert predicate if necessary to ensure reference is Success and inject as conjunction with child not referenced by Failure.
If root is AND, invert predicate if necessary to ensure reference is Failure and inject as disjunction with child root referenced by Success.
Resulting in:
L5=OR L3 L4 (cost(Min(L3,L4) + (1.0 - Selectivity(Min(L3,L4))) * Max(L3,L4)))
/ \
| L4=AND(cost(as for L3))
| / \
| L6=IS(cost(child)) L7=IS(cost(child))
| | |
| 1=Pred(cost(PORT_EXE)) 0=Pred(cost(PATH))
|
L3=AND L1 L2 (cost(Min(L1,L2) + Selectivity(Min(L1,L2)) * Max(L1,L2)))
/ \
L1=NOT(cost(child)) L2=IS(cost(child))
| |
3=Pred(cost(PATH)) 2=Pred(cost(ISFILE))
精彩评论