fastest way to arrange data in python numpy based on list
I have a problem arranging data in numpy. For example, I have a list of row lengths:
numpy.array([1,3,5,4,6])
and I have the data:
numpy.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
and I need the data to be arranged as
numpy.array([
    [1,9999,9999,9999,9999,9999],
    [2,3,4,9999,9999,9999],
    [5,6,7,8,9,9999],
    [10,11,12,13,9999,9999],
    [14,15,16,17,18,19]
])
I thought it was a little similar to the diag / diagonal / trace functionality.
I usually do the job with basic iteration. Does numpy have functionality for this that would perform faster?
Here are some ways to arrange the data:
from numpy import arange, array, ones, r_, zeros
from numpy.random import randint

def gen_tst(m, n):
    a = randint(1, n, m)
    b, c = arange(a.sum()), ones((m, n), dtype=int) * 999
    return a, b, c

def basic_1(a, b, c):
    # the assumed basic-iteration approach
    n = 0
    for k in range(len(a)):
        m = a[k]
        c[k, :m], n = b[n:n + m], n + m

def advanced_1(a, b, c):
    # based on Sven's answer
    cum_a = r_[0, a.cumsum()]
    i = arange(len(a)).repeat(a)
    j = arange(cum_a[-1]) - cum_a[:-1].repeat(a)
    c[i, j] = b

def advanced_2(a, b, c):
    # another loop-free version: a boolean mask selects the first a[k] cells of row k
    c[arange(c.shape[1]) + zeros((len(a), 1), dtype=int) < a[:, None]] = b
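Applied to the concrete data from the question, the boolean-mask trick behind advanced_2 can be sketched like this (the 9999 fill value matches the question; variable names here are illustrative):

```python
import numpy as np

r = np.array([1, 3, 5, 4, 6])   # row lengths
data = np.arange(1, 20)         # the values 1..19 to place row by row

out = np.full((len(r), r.max()), 9999, dtype=data.dtype)
# a cell (k, col) is filled iff col < r[k]; boolean assignment
# consumes `data` in row-major order of the True cells
mask = np.arange(r.max()) < r[:, None]
out[mask] = data
print(out)
```

The mask broadcasts a (6,) column-index row against the (5, 1) lengths column, so no explicit loop over rows is needed.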
And some timings:
In []: m, n= 10, 100
In []: a, b, c= gen_tst(m, n)
In []: 1.* a.sum()/ (m* n)
Out[]: 0.531
In []: %timeit advanced_1(a, b, c)
10000 loops, best of 3: 99.2 us per loop
In []: %timeit advanced_2(a, b, c)
10000 loops, best of 3: 68 us per loop
In []: %timeit basic_1(a, b, c)
10000 loops, best of 3: 47.1 us per loop
In []: m, n= 50, 500
In []: a, b, c= gen_tst(m, n)
In []: 1.* a.sum()/ (m* n)
Out[]: 0.455
In []: %timeit advanced_1(a, b, c)
1000 loops, best of 3: 1.03 ms per loop
In []: %timeit advanced_2(a, b, c)
1000 loops, best of 3: 1.06 ms per loop
In []: %timeit basic_1(a, b, c)
1000 loops, best of 3: 227 us per loop
In []: m, n= 250, 2500
In []: a, b, c= gen_tst(m, n)
In []: 1.* a.sum()/ (m* n)
Out[]: 0.486
In []: %timeit advanced_1(a, b, c)
10 loops, best of 3: 30.4 ms per loop
In []: %timeit advanced_2(a, b, c)
10 loops, best of 3: 32.4 ms per loop
In []: %timeit basic_1(a, b, c)
1000 loops, best of 3: 2 ms per loop
So the basic iteration seems to be quite efficient.
Update:
Surely the performance of the basic-iteration implementation can still be improved further. As a starting point, consider for example this variant (basic iteration with fewer additions):
def basic_2(a, b, c):
    n = 0
    for k, m in enumerate(a):
        nm = n + m
        c[k, :m], n = b[n:nm], nm
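For a quick sanity check, basic_2 can be run on the question's data (a minimal sketch; the output buffer is pre-filled with 9999 as in the question):

```python
import numpy as np

def basic_2(a, b, c):
    n = 0
    for k, m in enumerate(a):
        nm = n + m
        c[k, :m], n = b[n:nm], nm

a = np.array([1, 3, 5, 4, 6])       # row lengths
b = np.arange(1, 20)                # values 1..19
c = np.full((len(a), a.max()), 9999, dtype=b.dtype)
basic_2(a, b, c)
print(c)
```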
Here is how to do this without any Python loop using advanced indexing:
r = numpy.array([1,3,5,4,6])
data = numpy.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
result = numpy.empty((len(r), r.max()), data.dtype)
result.fill(9999)
cum_r = numpy.r_[0, r.cumsum()]
i = numpy.arange(len(r)).repeat(r)
j = numpy.arange(cum_r[-1]) - cum_r[:-1].repeat(r)
result[i, j] = data
print(result)
prints
[[ 1 9999 9999 9999 9999 9999]
[ 2 3 4 9999 9999 9999]
[ 5 6 7 8 9 9999]
[ 10 11 12 13 9999 9999]
[ 14 15 16 17 18 19]]
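To make the advanced indexing above more concrete, here is what the index arrays i and j contain for this input (an illustrative sketch; i repeats each row number once per element in that row, and j counts positions within each row):

```python
import numpy as np

r = np.array([1, 3, 5, 4, 6])
cum_r = np.r_[0, r.cumsum()]              # row start offsets: [0 1 4 9 13 19]
# row index of each flat data element: 0 once, 1 three times, 2 five times, ...
i = np.arange(len(r)).repeat(r)
# column index: flat position minus the start offset of its row
j = np.arange(cum_r[-1]) - cum_r[:-1].repeat(r)
print(i)  # [0 1 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4]
print(j)  # [0 0 1 2 0 1 2 3 4 0 1 2 3 0 1 2 3 4 5]
```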
And once again Sven beats us all :) My humble attempt follows,
from numpy import arange,array,split
from numpy import concatenate as cat
from numpy import repeat as rep
a = arange(1,20)
i = array([1,3,5,4,6])
j = max(i) - i
s = split(a,i.cumsum())
z = array([cat((t,rep(9999,k))) for t,k in zip(s[:-1],j)])
print(z)
delivers,
[[ 1 9999 9999 9999 9999 9999]
[ 2 3 4 9999 9999 9999]
[ 5 6 7 8 9 9999]
[ 10 11 12 13 9999 9999]
[ 14 15 16 17 18 19]]