Optimize parts extraction

2023-03-02 16:36 问答作者：

I have this specific function to extract parts of a list in the form: Give[list, elem] returns the part of list that corresponds to the position of elem in a global $Reference variable (if defined). I use this function heavily throughout my code, so I decided to optimize it. This is where I managed to get so far, but frankly, I have no idea how to advance.

ClearAll[Give, $Reference, set];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];

Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
   ref = Reference /. {opts} /. Options@Give;
   Which[
      Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
      Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
      Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
      True, Fold[give[#1, #2, ref] &, list, {elem}]
]];



In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}

In[109]:= Give[set, "B", "A开发者_JAVA技巧"] (* return entry at specified row & column *)
Out[109]= 4

In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}

I've decided to drop distinct signature function calls, as the list version might call the non-list version, which means that error handling has to be done multiple times (for each element in the list). Sadly, the error handling cannot be discarded. If the improved version is more robust (can e.g. handle more dimensions), that's not a problem, however the examples above will suffice.

In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)

Out[139]= 0.031

In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)

Out[138]= 0.499

I'm sure this is not efficient code, so feel free to improve it. Any help is appreciated, even if it trims off only a few nanoseconds.

The main efficiency problem for large lists seems to come from mapping Pick. This can be avoided if you replace the corresponding definition for give with this one:

give[list_, elem_List, ref_] := 
    list[[elem /. Dispatch[Thread[ref -> Range[Length[ref]]]]]];

Here is my test code:

In[114]:= 
  Block[{$Reference = Range[100000],set = Range[100000]^2,rnd,ftiming,stiming},
      rnd = RandomSample[$Reference,10000];
      ftiming = First@Timing[res1 = Give[set,rnd]];
      Block[{give},
        give[list_,elem_List,ref_]:=list[[elem/.Dispatch[Thread[ref->Range[Length[ref]]]]]];
        give[list_,elem_,ref_]:=First@Pick[list,ref,elem];
        stiming = First@Timing[res2 = Give[set,rnd]];];
   {ftiming,stiming,res1===res2}
]

Out[114]= {1.703,0.188,True}

You get 10 - fold speed increase here, for this use case. I did not test the 2D one, but would guess it should help there too.

EDIT

You could further improve performance by caching the dispatched table for $Reference (Dispatch[Thread[ref->Range[Length[$Reference]]]) once at the start in the body of Give, and then pass it to give (either explicitly or by making give an inner function - through Module variables - which would refer to it), so that you don't have to recompute it in case when you call give several times through Fold. You can also do that conditionally, say of you have large lists of elements in elem, to justify the time needed to create the dispatch table.

Here is another solution for this problem based on a problem I had for indexing real numbers. It uses lazy evaluation to display an error message if needed (a trick I learned on this site! Thanks to all for your dedication, it's always a pleasure to learn new stuff here!)

ListToIndexFunction[list_List,precision_:0.00001]:=
   Module[{numbersToIndexFunction},

      numbersToIndexFunction::indexNotFound="Index of `1` not found.";

      MapThread[(numbersToIndexFunction[#1]=#2)&,{Round[list,precision],Range[Length@list]}];
      numbersToIndexFunction[x_]/;(Message[numbersToIndexFunction::indexNotFound,x];False):=Null;

      numbersToIndexFunction[Round[#,precision]]&
   ];

Test: 
f=ListToIndexFunction[{1.23,2.45666666666,3}]
f[2.456666]
f[2.456665]

This is similar to Leonid's answer, but in my own style.

I use the same Dispatch table, and I recommend making this as external as possible. To this end, I suggest a new symbol $Rules that is updated whenever $Reference is changed. For example:

$Reference = RandomSample["A"~CharacterRange~"Z"];

$Rules = Dispatch@Thread[$Reference -> Range@Length@$Reference];

This can be made automatic for convenience, if it is done frequently (ask).

Aside from this, my complete code:

ClearAll[Give, $Reference, Reference, $Rules];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

Options[Give] = {Reference :> $Reference};

Give[list_List, elem___, opts : OptionsPattern[]] := 
  Module[{ref, pos, rls},
   ref = OptionValue[Reference];
   rls = If[{opts} == {}, $Rules, Dispatch@Thread[ref -> Range@Length@ref]];
   Which[
    ref === {} || Head@ref =!= List,
        Message[Give::noref]; {},
    Complement[Union@Flatten@{elem}, ref] =!= {},
        Message[Give::noelem, elem, ref]; {},
    Length@{elem} > Depth@list - 1, 
        Message[Give::nodepth, {elem}, Depth@list]; {},
    True,
        list[[##]] & @@ ({elem} /. rls)
   ]
  ];

This is what I got after letting this piece of code rest for 2 years. It memoizes the dispatch table for the given reference set, and uses the Part-type syntax. I got rid of all the error messages and also dropped the global $Reference symbol. Very un-Mathematica-like and I never liked it.

dispatch[ref_] := dispatch@ref = (Dispatch@Thread[ref -> Range@Length@ref]);
give[list_, elem__, ref_] := list[[Sequence @@ ({elem} /. dispatch@ref)]];

Memoization ensures that the dispatch table for a given ref is only calculated once. Maintaining multiple dispatch tables in memory is not a problem as these are usually small.

ref = Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

give[set, "B", ref]          (* ==> {4, 5, 6}              *)
give[set, "B", "A", ref]     (* ==> 4                      *)
give[set, {"B", "A"}, ref]   (* ==> {{4, 5, 6}, {1, 2, 3}} *)

Timing:

n = 20000;
{
First@Timing[give[set, #, ref] & /@ RandomChoice[ref, n]],
First@Timing[give[set, RandomChoice[ref, n], ref]],
First@Timing[Table[give[set, Sequence @@ RandomChoice[ref, 2], ref], {n}]]
}

{0.140401, 0., 0.202801}

Compare this to the timings of the original function:

{
First@Timing[Give[set, #] & /@ RandomChoice[ref, n]],
First@Timing[Give[set, RandomChoice[ref, n]]],
First@Timing[Table[Give[set, Sequence @@ RandomChoice[ref, 2]], {n}]]
}

{0.780005, 0.015600, 1.029607}

继续阅读：optimization wolfram-mathematica

Optimize parts extraction

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？