How can I quickly run a function over an array of every possible array of length L with given possible elements?
I have a function calc_dG that, for any array corresponding to a short DNA sequence (3 to 15 bases or so), gives me the binding energy of that sequence. Actually, it's just an array lookup: nndG is an array of binding energies for adjacent pairs of bases, so the binding energies can be calculated with nndG[4*S[:-1]+S[1:]] using an a,g,c,t -> 0,1,2,3 numeric encoding of sequences. This means the energies of many sequences can be calculated at once very quickly in numpy.
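For concreteness, here is a minimal sketch of that lookup applied to a whole batch of sequences at once (the nndG values below are placeholders, not real nearest-neighbour energies):

import numpy as np

# Hypothetical 16-entry table: one energy per ordered pair of bases.
nndG = np.arange(16, dtype=float)

def calc_dG(seqs):
    # seqs: (N, L) integer array using a,g,c,t -> 0,1,2,3;
    # each adjacent pair (seqs[:, i], seqs[:, i+1]) indexes into nndG.
    return nndG[4 * seqs[:, :-1] + seqs[:, 1:]].sum(axis=1)

print(calc_dG(np.array([[0, 1, 2, 3], [3, 2, 1, 0]])))  # agct, tcga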
I need to find, for a length L, every sequence that both fits some template and results in a binding energy value in a certain range.
This is very easy to do with iterators: just iterate through every possible array input, calculate the binding energy, and record the arrays that fall in the range. This, however, is far too slow when implemented in Python (for length 15 with 4 possible values per element, there are 4**15 ≈ 10^9 possible arrays). I could use Weave or some other way of dropping down to C, but I'd prefer to find an array-based solution that is simple and fast.
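The naive version being dismissed here would look roughly like this (using the calc_dG sketch above, with hypothetical bounds lo and hi):

import itertools
import numpy as np

def slow_search(L, lo, hi):
    hits = []
    for tup in itertools.product(range(4), repeat=L):  # 4**L iterations in pure Python
        S = np.array(tup)
        dG = nndG[4 * S[:-1] + S[1:]].sum()
        if lo <= dG <= hi:
            hits.append(S)
    return hits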
For example, if every element has the same possible values (e.g., [0,1,2,3]), then an array holding every possible length-L 1D array over those values can be generated with lambda L: indices(repeat([4],L)).reshape((L,-1)).transpose(); then I can just do calc_dG(result), and use result[results that are in the desired range] to get the arrays I want as a final result. This is much faster than using Python iterators, and likely almost as fast, if not faster, than using C iterators. Unfortunately, it doesn't work for arbitrary templates, and for longer sequences it will run out of memory, as it has to store every possible array in memory before calculating any values.
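Spelled out with hypothetical bounds lo and hi, that enumerate-then-filter approach looks like this (nndG as sketched above):

import numpy as np

L = 8  # small enough that all 4**L rows fit in memory
result = np.indices(np.repeat([4], L)).reshape((L, -1)).T  # shape (4**L, L): every sequence
dG = nndG[4 * result[:, :-1] + result[:, 1:]].sum(axis=1)  # vectorized calc_dG over all rows
lo, hi = 10.0, 12.0  # hypothetical target range
hits = result[(dG >= lo) & (dG <= hi)]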
Is there some way to do all of this without resorting to C?
If I understand your problem correctly, you are maximizing a function f(i_1, i_2, ..., i_n) over integers in the set {0, 1, 2, 3}.
You can use a combination of iteration and vectorized indexing.
import numpy as np
import itertools

def cartesian_chunked(n, n_items=4, chunk_dim=3):
    # Yield all n_items**n length-n index arrays in chunks of n_items**chunk_dim rows.
    # The first p = n - chunk_dim positions are iterated in Python; the last
    # q = chunk_dim positions are enumerated in one vectorized np.indices call.
    if n > chunk_dim:
        p = n - chunk_dim
        q = chunk_dim
        outer = itertools.product(*([range(n_items)] * (n - chunk_dim)))
    else:
        p = 0
        q = n
        def outer_iter():
            yield ()
        outer = outer_iter()

    chunk = np.zeros([n_items**q, n], dtype=int)
    chunk[:, p:] = np.indices(np.repeat([n_items], q)).reshape(q, -1).T
    for seq in outer:
        chunk[:, :p] = seq  # overwrite the prefix columns; the buffer is reused
        yield chunk

def compute_energy(indices):
    # Toy stand-in for calc_dG: each element contributes independently.
    base_energies = np.array([-1, 4, 8, 2.4])
    return base_energies[indices].sum(axis=1)
max_energy = -np.inf  # start below any reachable energy
max_config = None

# try out 4**10 ~ 1e6 combinations, in chunks of 4**8
for chunk in cartesian_chunked(n=10, n_items=4, chunk_dim=8):
    energies = compute_energy(chunk)
    j = np.argmax(energies)
    if energies[j] > max_energy:
        max_energy = energies[j]
        max_config = chunk[j].copy()  # copy! the chunk buffer is modified in place

print(max_energy)
print(max_config)
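To get back to the original question (collect every sequence whose energy falls in a range, rather than the single maximizer), one sketch is to filter each chunk and stack the survivors; lo and hi are hypothetical bounds, and compute_energy stands in for the real calc_dG lookup:

lo, hi = 20.0, 30.0  # hypothetical target range
hits = []
for chunk in cartesian_chunked(n=10, n_items=4, chunk_dim=8):
    dG = compute_energy(chunk)
    mask = (dG >= lo) & (dG <= hi)
    if mask.any():
        hits.append(chunk[mask].copy())  # copy: the chunk buffer is reused between yields
hits = np.vstack(hits) if hits else np.empty((0, 10), dtype=int)

A fixed template can be enforced the same way, by masking on the relevant columns of each chunk before (or after) the energy test.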