Python/Numpy: Convert list of bools to unsigned int

2023-01-23 10:59 问答作者：

What is the fastest (or most "Pythonic") way to convert
```
x = [False, False, True, True]
```
into 12? (If there is such a way.)
What if x were instead a numpy.array of bools? Is there a special command for that?

I have a large m-by-n array of booleans, where each n-element row represents a single low-dimensional hash of a high-dimensional feature vector. (In the example above, n = 4.) I would like to know the answer in order to compress my data as much as possible. Thank you.

Edit: Thank you for the responses! Using the following test code,

t = 0
for iter in range(500):
    B = scipy.signbit(scipy.randn(1000,20))
    for b in B:
        t0 = time.clock()
        # test code here
        t1 = time.clock()
        t += (t1-t0)
print t

...here were the runtimes on my Thinkpad laptop:

My answer: 4.26 sec
Sven Marnach 1: 7.88
Emil H: 8.51
Sven Marnach 2: 8.72
delnan: 10.14
liori: 53.49

Of course, I welcome any independent tests that may confirm or refute my data!

Edit: In my answer below, changing int(j) to simply j still works, but runs six times as slow! Then perhaps the other answers would become faster if the bool was casted using int. But I'm too lazy to test everything again.

Edit: liori posted results of independent tests here.

Taking various ideas from various other answers, here's another way to do it:

sum(1<<i for i, b in enumerate(x) if b)

It is quite fast in my tests - right up with the numpy method for large number of bits even though it overflows like crazy. I used liori's testing module for testing. Steve's method, with the change I suggested, is just barely faster. However, if a lot of these sorts of conversions need to be done at a time (and with not too many bits), I'm betting that numpy will be faster.

Most Pythonic might be this:

sum(2**i*b for i, b in enumerate(x))

It's hard to tell if it is also the fastest.

In numpy I would use

numpy.sum(2**numpy.arange(len(x))*x)

but this won't be faster for small arrays x, and it won't work for big arrays x since machine size integers are used instead of Pythons arbitrary precision ints.

reduce(lambda a,b:2*a+b, reversed(x))

You could get rid of reversed() if you had least significant bit at the end of array. This works with numpy.array too, and doesn't need enumerate(). From my tests seem to be faster too: no need to use exponentiation.

My initial attempt, just for reference:

def bool2int(x):
    y = 0
    for i,j in enumerate(x):
        if j: y += int(j)<<i
    return y

An elegant, pythonic, always-working way is this:

def powers(x):
    """yield powers of x, starting from x**0 forever"""
    power = 1
    while True:
        yield power
        power *= x

def bools_to_int(bools):
    # in Python 2, use itertools.izip!
    return sum(int(place) * place_weight for place_weight, place in 
               zip(powers(2), bools))

Note that you can get rid of powers (by enumerate and squaring in the comprehension, as other answers do) - but maybe it's clearer this way.

Something like this?

>>> x = [False, False, True, True]
>>> sum([int(y[1])*2**y[0] for y in enumerate(x)])
12

You can convert a numpy array to a regular list using a list() cast.

>>> a = numpy.array([1,2,3,4])
>>> a
array([1, 2, 3, 4])
>>> list(a)
[1, 2, 3, 4]

If you have a matrix, you probably want to do it like this:

#precompute powers of two
vals = 2.**np.arange(20)

B = ....
compressed = np.dot(B, vals) # matrix multiplication.

np.dot should be faster than any loop in Python. Much faster.

I was trying ipython %timeit and it seems that doing the following is faster:

y = 0
for i,j in enumerate(x):
    if j: y += 1<<i

In addition, if your boolean vector is a numpy.ndarray, converting it to python array x.tolist() and running the same seems to work faster in this case. It's all marginal, but consistent as well as, at these speeds, marginals add up well.

numpy has the packbits function for this. It also supports operations along axes:

In [3]: B = scipy.signbit(scipy.randn(1000,8)).astype("i1")

In [3]: B[0]
Out[3]: array([0, 1, 0, 0, 0, 1, 0, 0], dtype=int8)

In [4]: np.packbits(B[0])
Out[4]: array([68], dtype=uint8)

In [5]: %timeit np.packbits(B, axis=1)
10000 loops, best of 3: 37 µs per loop

it works for int8 sizes for larger sizes you have to shift and or

In [8]: x # multiple of 8
Out[8]: array([1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1], dtype=int8)

In [9]: r = np.packbits(x).astype(np.int32); r
Out[9]: array([171, 129], dtype=uint8)

In [10]: r[0] << 8 | r[1] 
Out[10]: 33237

In [11]: sum(1<<i for i, b in enumerate(x[::-1]) if b)
Out[11]: 33237

if x is no multiple of 8 you have to pad in zeros

If you're willing to add another extension to the mix, I added pack() and unpack() to the development branch of gmpy. My tests show it may be 2x or 3x faster.

>>> import gmpy2
>>> gmpy2.pack([0,0,1,1],1)
mpz(12)
>>> gmpy2.unpack(12,1)
[mpz(0), mpz(0), mpz(1), mpz(1)]

Disclaimer: The development version is called gmpy2 and can co-exist with the stable version. It is still in alpha phase but will hopefully become beta in a few weeks. You need to have both GMP and MPFR libraries installed. The source is available at http://code.google.com/p/gmpy/source/checkout

继续阅读：list numpy python scipy

Python/Numpy: Convert list of bools to unsigned int

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？