NumPy: is it possible to use numpy and ndarray to replace a for loop in this code snippet?

2023-03-22 01:22 Q&A author:

I am looking for a smarter and better solution.

I want to apply different scaling factors to a number field based on the label content. Hopefully the following code can illustrate what I am trying to achieve:

PS = [('A', 'LABEL1', 20),
('B', 'LABEL2', 15),
('C', 'LABEL3', 120),
('D', 'LABEL1', 3),]

FACTOR = [('LABEL1', 0.1), ('LABEL2', 0.5), ('LABEL3', 10)]

d_factor = dict(FACTOR)

for p in PS:
    # scale the numeric field by the factor registered for this row's label
    newp = (p[0], p[1], p[2]*d_factor[p[1]])
    print(newp)

It is a very trivial operation, but I need to perform it on a dataset of at least one million rows.

So, of course, the faster the better.

The factors will be known in advance, and there will be no more than 20 to 30 of them.

Is there any matrix or linalg trick we can use?
Can an ndarray accept a text value in a cell?

If you want to mix data types, you will want structured arrays.

If you want the index of matching values in a lookup array, you want searchsorted.

Your example goes like this:

>>> import numpy as np
>>> PS = np.array([
    ('A', 'LABEL1', 20),
    ('B', 'LABEL2', 15),
    ('C', 'LABEL3', 120),
    ('D', 'LABEL1', 3),], dtype=('a1,a6,i4'))
>>> FACTOR = np.array([
    ('LABEL1', 0.1), 
    ('LABEL2', 0.5), 
    ('LABEL3', 10)],dtype=('a6,f4'))

Your structured arrays:

>>> PS
array([('A', 'LABEL1', 20), ('B', 'LABEL2', 15), ('C', 'LABEL3', 120),
       ('D', 'LABEL1', 3)], 
      dtype=[('f0', '|S1'), ('f1', '|S6'), ('f2', '<i4')])
>>> FACTOR
array([('LABEL1', 0.10000000149011612), ('LABEL2', 0.5), ('LABEL3', 10.0)], 
      dtype=[('f0', '|S6'), ('f1', '<f4')])

And you can access individual fields like this (or you can give them names; see the docs):

>>> FACTOR['f0']
array(['LABEL1', 'LABEL2', 'LABEL3'], 
      dtype='|S6')

How to perform the lookup of FACTOR on PS (FACTOR must be sorted):

>>> idx = np.searchsorted(FACTOR['f0'], PS['f1'])
>>> idx
array([0, 1, 2, 0])
>>> FACTOR['f1'][idx]
array([  0.1,   0.5,  10. ,   0.1], dtype=float32)
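The lookup above relies on two things that searchsorted does not check for you: the label column must be sorted, and every queried label must actually be present (searchsorted returns an insertion position, not a match). A minimal sketch of a guarded lookup follows; the helper name lookup_factors and its arguments are hypothetical, not part of NumPy, and it uses plain arrays rather than the structured ones above:

```python
import numpy as np

def lookup_factors(labels, values, query):
    """Hypothetical helper: return values[i] for each query label.

    labels/values describe the factor table (in any order);
    query holds the labels to look up.
    """
    # searchsorted needs a sorted table, so sort labels and carry values along
    order = np.argsort(labels)
    sorted_labels = labels[order]
    sorted_values = values[order]

    idx = np.searchsorted(sorted_labels, query)
    # a label larger than every table entry yields len(table); clip before indexing
    idx = np.clip(idx, 0, len(sorted_labels) - 1)

    # searchsorted gives insertion points, so confirm each one is a real match
    found = sorted_labels[idx] == query
    if not found.all():
        raise KeyError("unknown labels: %s" % list(query[~found]))
    return sorted_values[idx]

factor_labels = np.array(['LABEL3', 'LABEL1', 'LABEL2'])  # deliberately unsorted
factor_values = np.array([10.0, 0.1, 0.5])
row_labels = np.array(['LABEL1', 'LABEL2', 'LABEL3', 'LABEL1'])

row_factors = lookup_factors(factor_labels, factor_values, row_labels)
```

With only 20 to 30 factors, the sort and the extra comparison cost very little next to the million-row lookup itself.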

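Now simply create a new array and multiply. Note that f2 is an integer field, so the products are truncated to integers in the output below; recent NumPy versions raise a casting error for this in-place multiply instead, so declare f2 as a float field if you need fractional results: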

>>> newp = PS.copy()
>>> newp['f2'] *= FACTOR['f1'][idx]
>>> newp
array([('A', 'LABEL1', 2), ('B', 'LABEL2', 7), ('C', 'LABEL3', 1200),
       ('D', 'LABEL1', 0)], 
      dtype=[('f0', '|S1'), ('f1', '|S6'), ('f2', '<i4')])
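If the fractional part matters (15 * 0.5 should be 7.5, not 7), one option is to store the number as a float and to name the fields. The sketch below is a variant of the session above, not the original answer's code: the field names name, label, value and scale are made up for illustration, and it uses unicode ('U') fields and float64 instead of 'a'/'S' and float32.

```python
import numpy as np

# Same data as above, but with named fields and a float64 value column
ps = np.array(
    [('A', 'LABEL1', 20.0),
     ('B', 'LABEL2', 15.0),
     ('C', 'LABEL3', 120.0),
     ('D', 'LABEL1', 3.0)],
    dtype=[('name', 'U1'), ('label', 'U6'), ('value', 'f8')])

# The factor table must stay sorted by label for searchsorted
factor = np.array(
    [('LABEL1', 0.1), ('LABEL2', 0.5), ('LABEL3', 10.0)],
    dtype=[('label', 'U6'), ('scale', 'f8')])

# Position of each row's label in the factor table
idx = np.searchsorted(factor['label'], ps['label'])

scaled = ps.copy()
# float *= float needs no cast, so nothing is truncated
scaled['value'] *= factor['scale'][idx]
```

Here scaled['value'] holds approximately 2.0, 7.5, 1200.0 and 0.3, and ps itself is left unchanged.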

If you compare a numpy array with a value, you get a boolean mask marking the matching positions. You can use that mask to operate on all matching elements at once. This probably isn't the fastest approach, since it makes one pass over the data per label, but it is simple and clear. If PS needs to have the structure you show, you can use a custom dtype and have an Nx3 array.

import numpy as np

col1 = np.array(['a', 'b', 'c', 'd'])
col2 = np.array(['1', '2', '3', '1'])
col3 = np.array([20., 15., 120., 3.])

factors = {'1': 0.1, '2': 0.5, '3': 10, }

# one boolean-mask pass per label; items() works on both Python 2 and 3
for label, fac in factors.items():
    col3[col2 == label] *= fac

print(col3)
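The loop above scans the label column once per factor. A possible alternative, sketched here under the assumption that every label in col2 has an entry in factors, is to let np.unique map each row to an integer code and then index a small per-label factor array, so the data is scaled in a single vectorized step. The names per_label and scaled are illustrative only.

```python
import numpy as np

col2 = np.array(['1', '2', '3', '1'])
col3 = np.array([20., 15., 120., 3.])
factors = {'1': 0.1, '2': 0.5, '3': 10}

# uniq holds the distinct labels (sorted); inverse maps each row to its slot in uniq
uniq, inverse = np.unique(col2, return_inverse=True)

# One dict lookup per distinct label, not per row (raises KeyError on an unknown label)
per_label = np.array([factors[label] for label in uniq], dtype=float)

# Fancy indexing expands the per-label factors to one factor per row
scaled = col3 * per_label[inverse]
```

Unlike the loop, this leaves col3 untouched and returns a new array; np.unique sorts the labels internally, so whether it is actually faster on a given dataset is worth measuring.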

I don't think numpy can help you with that. BTW, it is ndarray, not nparray...

Maybe you could do it with a generator. See http://www.dabeaz.com/generators/index.html
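As a rough sketch of what the generator suggestion could look like for the data in the question (the function name scale_rows is hypothetical): rows are scaled one at a time as they are consumed, so the full result never has to sit in memory. This does not make the per-row arithmetic any faster than the original loop; it only makes it lazy.

```python
def scale_rows(rows, factor_by_label):
    """Hypothetical helper: lazily yield (name, label, scaled_value) tuples."""
    for name, label, value in rows:
        yield (name, label, value * factor_by_label[label])

PS = [('A', 'LABEL1', 20),
      ('B', 'LABEL2', 15),
      ('C', 'LABEL3', 120),
      ('D', 'LABEL1', 3)]

FACTOR = [('LABEL1', 0.1), ('LABEL2', 0.5), ('LABEL3', 10)]

# Nothing is computed until the generator is iterated
scaled_rows = list(scale_rows(PS, dict(FACTOR)))
```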
