Best data structure for these operations
I'm trying to find a better approach than my current one for managing the current state vector of a continuous-time Markov chain. The state vector stores pairs of (state, probability), where probability is a float.
The piece of the algorithm that needs to be optimized performs the following operations:
- every iteration starts with a current state vector
- for every state in the vector, it computes the reachable states and stores all of them, together with the probability of reaching them, in a temporary transition list
- for every element in this list, it builds the new state vector by iterating over the possible transitions (note that many transitions, found from different source states, can lead to the same state)
This is currently done with hashtables that have states as keys and probabilities as values.
So, to build up the new vector, for every transition the normalized value is calculated, the state's current probability is retrieved from the table with a get, the probability of the current transition is added, and the result is stored back.
This approach has worked so far, but I'm now trying to cope with systems that lead to very large state vectors (500k–1M states), so although hashtables give constant complexity to store and retrieve, iterations slow down dramatically. For example, starting from a vector with 100k states, we compute the reachable states for each one (there is usually more than one), obtaining a transition list of 100k × (average reachable states) entries. Every transition in that list is then processed to compute the new probability vector.
Unfortunately I need to go through the whole reachable list to find the normalizing value before actually computing the next vector, but in any case, as I saw through profiling, this is not the bottleneck of the algorithm. The bottleneck is the processing of every transition.
This is the function used to compute the transition list from the current vector `pi`:
```ocaml
HTable.fold (fun s p l ->
    if check s f2 then (0., s, p, [s, 1.0]) :: l
    else if not (check s f1) then (0., s, p, [s, 1.0]) :: l
    else
      let ts = P.rnext s in
      if List.length ts = 0 then (0., s, p, [s, 1.0]) :: l
      else
        (* lm: the sum of the outgoing transition probabilities of s *)
        let lm = List.fold_left (fun a (s, f) -> f +. a) 0. ts in
        (lm, s, p, ts) :: l)
  pi []
```
And this is the function that computes the new `pi` by going through the transition list `u` and normalizing all the transitions (`lm` here is the normalizing value found by scanning the list beforehand):
```ocaml
(* update_pi accumulates probability mass per state: fetch the current
   value (if any), add the contribution, and store it back. *)
let update_pi s v =
  try
    let t = HTable.find pi s in
    HTable.replace pi s (v +. t)
  with Not_found -> HTable.add pi s v
in
HTable.clear pi;
List.iter (fun (llm, s, p, ts) ->
    if llm = 0. then
      update_pi s p
    else begin
      List.iter (fun (ss, pp) ->
          update_pi ss (p *. (pp /. lm))) ts;
      if llm < lm then update_pi s (p *. (1. -. (llm /. lm)))
    end)
  u;
```
I should find a data structure better suited to the operations I'm doing. The problem is that with the current approach I have to do a get and a set for every single transition, and doing so many operations on hashtables kills performance, since the constant cost per operation is quite high.
It cannot hurt to replace the linear-time check `if List.length ts = 0` with the constant-time check `if ts = []`, although I doubt this is going to solve your performance problem.
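As a toy illustration of the difference (not taken from the question's code): `List.length` traverses the entire list even when we only need to know whether it is empty, while matching against `[]` inspects just the head constructor.

```ocaml
(* List.length is O(n): it walks the whole list to count elements.
   Comparing against [] (or pattern matching) is O(1). *)
let slow_is_empty l = List.length l = 0
let fast_is_empty l = match l with [] -> true | _ -> false

let () =
  let big = List.init 1_000_000 (fun i -> i) in
  (* Both agree, but slow_is_empty walks a million cells to say so. *)
  Printf.printf "%b %b\n" (slow_is_empty big) (fast_is_empty big)
```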
Your algorithm sounds a little like multiplying a matrix by a vector to get a new vector. This is usually sped up by blocking, but here the hashtable representation may be costing you locality. Would it be possible, once and for all, to number all the states, and then use arrays instead of hashtables? Note that with arbitrary transitions the destination states would still not be local, but it may be an improvement (if only because accessing an array is more direct than accessing a hashtable).
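A minimal sketch of that idea, assuming the states have already been numbered `0..n-1` (the `transitions` function below is a hypothetical stand-in for `P.rnext` after renumbering, and the toy chain is made up for illustration):

```ocaml
(* Accumulate the next probability vector in a float array instead of
   a hashtable. Each transition costs one array read and one write:
   no hashing, no Not_found handling, and better cache locality. *)

let n = 4

(* Toy transition function: from state i, move to (i+1) mod n with
   probability 0.5 or stay in i with probability 0.5. *)
let transitions i = [ ((i + 1) mod n, 0.5); (i, 0.5) ]

let step (pi : float array) : float array =
  let next = Array.make n 0. in
  Array.iteri
    (fun s p ->
      if p > 0. then
        List.iter
          (fun (s', q) -> next.(s') <- next.(s') +. p *. q)
          (transitions s))
    pi;
  next

let () =
  let pi0 = Array.make n 0. in
  pi0.(0) <- 1.0;
  let pi1 = step pi0 in
  Printf.printf "pi1 = [%f; %f; %f; %f]\n" pi1.(0) pi1.(1) pi1.(2) pi1.(3)
```

The normalization step from the question would slot into the inner loop unchanged; the only structural change is that `HTable.find`/`HTable.replace` pairs become `next.(s')` reads and writes.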