How to dump a boolean matrix in numpy?

2023-02-01 09:33 Q&A author:

I have a graph represented as a numpy boolean array (G.adj.dtype == bool). This is homework in writing my own graph library, so I can't use networkx. I want to dump it to a file so that I can fiddle with it, but for the life of me I can't work out how to make numpy dump it in a recoverable fashion.

I've tried G.adj.tofile, which wrote the graph correctly (ish) as one long line of True/False. But fromfile barfs on reading this, giving a 1x1 array, and loadtxt raises a ValueError: invalid literal for int. np.savetxt works but saves the matrix as a list of 0/1 floats, and loadtxt(..., dtype=bool) fails with the same ValueError.

Finally, I've tried networkx.from_numpy_matrix with networkx.write_dot, but that gave each edge [weight=True] in the dot source, which broke networkx.read_dot.

To save:

numpy.savetxt('arr.txt', G.adj, fmt='%s')

To recover:

G.adj = numpy.genfromtxt('arr.txt', dtype=bool)

HTH!

This is my test case:

obj = numpy.random.rand(100, 100) > 0.5

space efficiency

numpy.savetxt('arr.txt', obj, fmt='%s') creates a 54 kB file.

numpy.savetxt('arr.txt', obj, fmt='%d') creates a much smaller file (20 kB).

cPickle.dump(obj, open('arr.dump', 'w')) creates a 40 kB file.

time efficiency

numpy.savetxt('arr.txt', obj, fmt='%s'): 45 ms

numpy.savetxt('arr.txt', obj, fmt='%d'): 10 ms

cPickle.dump(obj, open('arr.dump', 'w')): 2.3 ms

conclusion

Use savetxt with the text format (%s) if human readability is needed, savetxt with the numeric format (%d) if space is a concern, and cPickle if time is a concern.
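For the compact text route, a round trip might look like the sketch below. It is only an illustration: the cast to uint8 before saving, the temporary file location and the name restored are assumptions, not part of the benchmark above.

```python
import os
import tempfile

import numpy

# Illustrative data: a random 100x100 boolean matrix.
obj = numpy.random.rand(100, 100) > 0.5

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "arr.txt")

# Write 0/1 digits; casting to uint8 first keeps '%d' unambiguous.
numpy.savetxt(path, obj.astype(numpy.uint8), fmt="%d")

# Read the digits back as integers, then convert them to bool.
restored = numpy.loadtxt(path, dtype=int).astype(bool)

print(restored.dtype, restored.shape)
```

Going through an integer dtype on load sidesteps the question of how loadtxt parses boolean literals.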

The easiest way to save your array including metadata (dtype, dimensions) is to use numpy.save() and numpy.load():

a = numpy.array([[False,  True, False],
                 [ True, False,  True],
                 [False,  True, False],
                 [ True, False,  True],
                 [False,  True, False]], dtype=bool)
numpy.save("data.npy", a)
numpy.load("data.npy")
# array([[False,  True, False],
#        [ True, False,  True],
#        [False,  True, False],
#        [ True, False,  True],
#        [False,  True, False]], dtype=bool)

a.tofile() and numpy.fromfile() would work as well, but don't save any metadata. You need to pass dtype=bool to fromfile() and will get a one-dimensional array that must be reshape()d to its original shape.
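A minimal sketch of that raw route, assuming the shape is known from elsewhere (here it is simply taken from the original array). The file location and the names flat and b are illustrative.

```python
import os
import tempfile

import numpy

a = numpy.array([[False, True, False],
                 [True, False, True]], dtype=bool)

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.bin")

# tofile() writes only the raw bytes (one byte per bool): no dtype, no shape.
a.tofile(path)

# fromfile() must be told the dtype and returns a flat array...
flat = numpy.fromfile(path, dtype=bool)

# ...so the original shape has to be supplied separately.
b = flat.reshape(a.shape)

print(flat.shape, b.shape)
```

If the shape is not stored somewhere, it cannot be recovered from the file, which is why numpy.save() is the more convenient choice.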

I know this question is quite old, but I want to add Python 3 benchmarks. They differ somewhat from the previous ones.

First I load a lot of data into memory, convert it to an int8 numpy array with only 0 and 1 as possible values, and then dump it to the HDD using two approaches.

import sys
import pickle

import numpy

# Timer is a timing context manager that exposes .interval
# (it is not part of the standard library).
from timer import Timer

# Load data part of code is omitted.

prime = int(sys.argv[1])

np_table = numpy.array(check_table, dtype=numpy.int8)
filename = "%d.dump" % prime

with Timer() as t:
    # Open in binary mode and close the file once the dump is written.
    with open("dumps/pickle_" + filename, 'wb') as f:
        pickle.dump(np_table, f)

print('pickle took %.03f sec.' % t.interval)

with Timer() as t:
    numpy.savetxt("dumps/np_" + filename, np_table, fmt='%d')

print('savetxt took %.03f sec.' % t.interval)
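The timer module used above is not shown. A hypothetical stand-in that behaves the way the code expects (a context manager that sets .interval on exit) could be sketched as follows; the implementation is an assumption, not the author's actual helper.

```python
import time


class Timer:
    """Hypothetical stand-in for the Timer helper used in the benchmark."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Elapsed wall-clock time in seconds for the with-block.
        self.interval = time.perf_counter() - self.start
        return False  # do not swallow exceptions


with Timer() as t:
    total = sum(range(100000))

print('sum took %.03f sec.' % t.interval)
```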

Time measuring

It took 50.700 sec to load data number 11
pickle took 0.010 sec.
savetxt took 1.930 sec.

It took 1297.970 sec to load data number 29
pickle took 0.070 sec.
savetxt took 242.590 sec.

It took 1583.380 sec to load data number 31
pickle took 0.090 sec.
savetxt took 334.740 sec.

It took 3855.840 sec to load data number 41
pickle took 0.610 sec.
savetxt took 1367.840 sec.

It took 4457.170 sec to load data number 43
pickle took 0.780 sec.
savetxt took 1654.050 sec.

It took 5792.480 sec to load data number 47
pickle took 1.160 sec.
savetxt took 2393.680 sec.

It took 8101.020 sec to load data number 53
pickle took 1.980 sec.
savetxt took 4397.080 sec.

Size measuring

630K np_11.dump
 79M np_29.dump
110M np_31.dump
442M np_41.dump
561M np_43.dump
875M np_47.dump
1,6G np_53.dump

315K pickle_11.dump
 40M pickle_29.dump
 55M pickle_31.dump
221M pickle_41.dump
281M pickle_43.dump
438M pickle_47.dump
798M pickle_53.dump

So in Python 3, pickle is much faster than numpy.savetxt and uses about half the disk space.
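The benchmark only covers writing. For completeness, a small round-trip sketch with pickle in Python 3 follows; the tiny array, the temporary file location and the name loaded are illustrative assumptions.

```python
import os
import pickle
import tempfile

import numpy

# Small illustrative table of 0/1 values, stored as int8 as in the benchmark.
np_table = numpy.array([[0, 1, 1],
                        [1, 0, 0]], dtype=numpy.int8)

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pickle_example.dump")

# Pickle files must be opened in binary mode in Python 3.
with open(path, "wb") as f:
    pickle.dump(np_table, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded.dtype, loaded.shape)
```

Unlike the text formats, the pickle keeps dtype and shape, but pickle files should only be loaded from trusted sources.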
