Optimize parts extraction
I have this specific function to extract parts of a list in the form: Give[list, elem]
returns the part of list that corresponds to the position of elem in a global $Reference
variable (if defined). I use this function heavily throughout my code, so I decided to optimize it. This is where I managed to get so far, but frankly, I have no idea how to advance.
ClearAll[Give, $Reference, set];
Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";
give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];
Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
ref = Reference /. {opts} /. Options@Give;
Which[
Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
True, Fold[give[#1, #2, ref] &, list, {elem}]
]];
In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}
In[109]:= Give[set, "B", "A开发者_JAVA技巧"] (* return entry at specified row & column *)
Out[109]= 4
In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}
I've decided to drop distinct signature function calls, as the list version might call the non-list version, which means that error handling has to be done multiple times (for each element in the list). Sadly, the error handling cannot be discarded. If the improved version is more robust (can e.g. handle more dimensions), that's not a problem, however the examples above will suffice.
In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)
Out[139]= 0.031
In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)
Out[138]= 0.499
I'm sure this is not efficient code, so feel free to improve it. Any help is appreciated, even if it trims off only a few nanoseconds.
The main efficiency problem for large lists seems to come from mapping Pick
. This can be avoided if you replace the corresponding definition for give
with this one:
give[list_, elem_List, ref_] :=
list[[elem /. Dispatch[Thread[ref -> Range[Length[ref]]]]]];
Here is my test code:
In[114]:=
Block[{$Reference = Range[100000],set = Range[100000]^2,rnd,ftiming,stiming},
rnd = RandomSample[$Reference,10000];
ftiming = First@Timing[res1 = Give[set,rnd]];
Block[{give},
give[list_,elem_List,ref_]:=list[[elem/.Dispatch[Thread[ref->Range[Length[ref]]]]]];
give[list_,elem_,ref_]:=First@Pick[list,ref,elem];
stiming = First@Timing[res2 = Give[set,rnd]];];
{ftiming,stiming,res1===res2}
]
Out[114]= {1.703,0.188,True}
You get 10 - fold speed increase here, for this use case. I did not test the 2D one, but would guess it should help there too.
EDIT
You could further improve performance by caching the dispatched table for $Reference
(Dispatch[Thread[ref->Range[Length[$Reference]]]
) once at the start in the body of Give
, and then pass it to give
(either explicitly or by making give
an inner function - through Module
variables - which would refer to it), so that you don't have to recompute it in case when you call give
several times through Fold
. You can also do that conditionally, say of you have large lists of elements in elem
, to justify the time needed to create the dispatch table.
Here is another solution for this problem based on a problem I had for indexing real numbers. It uses lazy evaluation to display an error message if needed (a trick I learned on this site! Thanks to all for your dedication, it's always a pleasure to learn new stuff here!)
ListToIndexFunction[list_List,precision_:0.00001]:=
Module[{numbersToIndexFunction},
numbersToIndexFunction::indexNotFound="Index of `1` not found.";
MapThread[(numbersToIndexFunction[#1]=#2)&,{Round[list,precision],Range[Length@list]}];
numbersToIndexFunction[x_]/;(Message[numbersToIndexFunction::indexNotFound,x];False):=Null;
numbersToIndexFunction[Round[#,precision]]&
];
Test:
f=ListToIndexFunction[{1.23,2.45666666666,3}]
f[2.456666]
f[2.456665]
This is similar to Leonid's answer, but in my own style.
I use the same Dispatch
table, and I recommend making this as external as possible. To this end, I suggest a new symbol $Rules
that is updated whenever $Reference
is changed. For example:
$Reference = RandomSample["A"~CharacterRange~"Z"];
$Rules = Dispatch@Thread[$Reference -> Range@Length@$Reference];
This can be made automatic for convenience, if it is done frequently (ask).
Aside from this, my complete code:
ClearAll[Give, $Reference, Reference, $Rules];
Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";
Options[Give] = {Reference :> $Reference};
Give[list_List, elem___, opts : OptionsPattern[]] :=
Module[{ref, pos, rls},
ref = OptionValue[Reference];
rls = If[{opts} == {}, $Rules, Dispatch@Thread[ref -> Range@Length@ref]];
Which[
ref === {} || Head@ref =!= List,
Message[Give::noref]; {},
Complement[Union@Flatten@{elem}, ref] =!= {},
Message[Give::noelem, elem, ref]; {},
Length@{elem} > Depth@list - 1,
Message[Give::nodepth, {elem}, Depth@list]; {},
True,
list[[##]] & @@ ({elem} /. rls)
]
];
This is what I got after letting this piece of code rest for 2 years. It memoizes the dispatch table for the given reference set, and uses the Part
-type syntax. I got rid of all the error messages and also dropped the global $Reference
symbol. Very un-Mathematica-like and I never liked it.
dispatch[ref_] := dispatch@ref = (Dispatch@Thread[ref -> Range@Length@ref]);
give[list_, elem__, ref_] := list[[Sequence @@ ({elem} /. dispatch@ref)]];
Memoization ensures that the dispatch table for a given ref
is only calculated once. Maintaining multiple dispatch tables in memory is not a problem as these are usually small.
ref = Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
give[set, "B", ref] (* ==> {4, 5, 6} *)
give[set, "B", "A", ref] (* ==> 4 *)
give[set, {"B", "A"}, ref] (* ==> {{4, 5, 6}, {1, 2, 3}} *)
Timing:
n = 20000;
{
First@Timing[give[set, #, ref] & /@ RandomChoice[ref, n]],
First@Timing[give[set, RandomChoice[ref, n], ref]],
First@Timing[Table[give[set, Sequence @@ RandomChoice[ref, 2], ref], {n}]]
}
{0.140401, 0., 0.202801}
Compare this to the timings of the original function:
{
First@Timing[Give[set, #] & /@ RandomChoice[ref, n]],
First@Timing[Give[set, RandomChoice[ref, n]]],
First@Timing[Table[Give[set, Sequence @@ RandomChoice[ref, 2]], {n}]]
}
{0.780005, 0.015600, 1.029607}
精彩评论