
How to remove 'None' from an Appended Multidimensional Array using numpy

I need to take a csv file and import this data into a multi-dimensional array in Python, but I am not sure how to strip the 'None' values out of the array after I have appended my data to the empty array.

I first created a structure like this:

storecoeffs = numpy.empty((5,11), dtype='object')

This returns a 5-row by 11-column array populated by 'None'.

Next, I opened my csv file and converted it to an array:

coeffsarray = list(csv.reader(open("file.csv")))

coeffsarray = numpy.array(coeffsarray, dtype='object')

Then, I appended the two arrays:

newmatrix = numpy.append(storecoeffs, coeffsarray, axis=1)

The result is an array populated by 'None' values followed by the data that I want (first two rows shown to give you an idea as to the nature of my data):

array([[None, None, None, None, None, None, None, None, None, None, None,
        workers, constant, hhsize, inc1, inc2, inc3, inc4, age1, age2,
        age3, age4],
       [None, None, None, None, None, None, None, None, None, None, None,
        w0, 7.334, -1.406, 2.823, 2.025, 0.5145, 0, -4.936, -5.054, -2.8, 0],
       ...], dtype=object)

How do I remove those 'None' objects from each row so that I am left with the 5 x 11 multidimensional array containing my data?
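Because numpy.append only concatenates, the None placeholders always occupy the first 11 columns of the result, so they can simply be sliced off. A minimal sketch, assuming a hypothetical two-row stand-in for the file built from the header and the sample row shown above (values kept as strings, which is what csv.reader returns):

```python
import numpy as np

# Hypothetical small-scale version of the question's setup:
# 11 placeholder columns of None, followed by 11 columns of CSV data.
storecoeffs = np.empty((2, 11), dtype=object)  # np.empty with object dtype fills with None
coeffsarray = np.array(
    [["workers", "constant", "hhsize", "inc1", "inc2", "inc3",
      "inc4", "age1", "age2", "age3", "age4"],
     ["w0", "7.334", "-1.406", "2.823", "2.025", "0.5145",
      "0", "-4.936", "-5.054", "-2.8", "0"]],
    dtype=object,
)
newmatrix = np.append(storecoeffs, coeffsarray, axis=1)  # shape (2, 22)

# The placeholders are exactly the first storecoeffs.shape[1] columns,
# so slicing from that index onward keeps only the CSV data.
cleaned = newmatrix[:, storecoeffs.shape[1]:]
print(cleaned.shape)  # (2, 11)
```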


@Gnibbler's answer is technically correct, but there's no reason to create the initial storecoeffs array in the first place. Just load in your values and then create an array from them. As @Mermoz noted, though, your use case looks simple enough for numpy.loadtxt().

Beyond that, why are you using an object array?? It's probably not what you want... Right now, you're storing the numerical values as strings, not floats!
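To make the strings-versus-floats point concrete, here is a small sketch using three values from the sample row (chosen for illustration). With an object array, arithmetic is delegated to the Python objects themselves, so "adding" two string elements concatenates them; casting with astype(float) produces real numbers:

```python
import numpy as np

# csv.reader yields every field as a str, so an object array built
# from its rows holds Python strings rather than numbers.
rows = [["7.334", "-1.406", "2.823"]]
as_objects = np.array(rows, dtype=object)
print(type(as_objects[0, 0]))  # <class 'str'>

# Object-array "addition" calls + on each element: strings concatenate.
doubled = as_objects + as_objects
print(doubled[0, 0])  # 7.3347.334

# Converting to a float array gives values you can actually compute with.
as_floats = as_objects.astype(float)
print(as_floats.dtype)  # float64
```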

You have essentially two ways to handle your data in numpy. If you want easy access to named columns, use a structured array (or a record array). If you want to have a "normal" multidimensional array, just use an array of floats, ints, etc. Object arrays have a specific purpose, but it's probably not what you're doing.

For example, to just load in the data as a normal 2D numpy array (assuming all your data can be represented easily as a float):

import numpy as np
# Note that this ignores your column names, and attempts to 
# convert all values to a float...
data = np.loadtxt('input_filename.txt', delimiter=',', skiprows=1)

# Access the first column 
workers = data[:,0]
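With the question's data, the snippet above would fail on the non-numeric first column ("w0"). One way around that, sketched here as an assumption rather than the answer's own method, is loadtxt's usecols parameter to skip that column. io.StringIO stands in for the real file and holds only the header and the sample row from the question:

```python
import io
import numpy as np

# Stand-in for the CSV file: header plus the one data row from the question.
csv_text = io.StringIO(
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)

# skiprows=1 skips the header; usecols keeps columns 1..10 so the
# non-numeric "workers" column is never parsed; ndmin=2 keeps the
# result 2-D even when the file has only one data row.
data = np.loadtxt(csv_text, delimiter=",", skiprows=1,
                  usecols=list(range(1, 11)), ndmin=2)
print(data.shape)  # (1, 10)

constant = data[:, 0]  # first remaining column is "constant"
```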

To load your data in as a structured array, you might do something like this:

import numpy as np
infile = open('input_filename.txt')

# Read in the names of the columns from the first row...
# (split on commas, since this is a comma-delimited file)
names = next(infile).strip().split(',')

# Make a dtype from these names...
dtype = {'names': names, 'formats': len(names)*[float]}

# Read the data in...
data = np.loadtxt(infile, dtype=dtype, delimiter=',')

# Note that data is now effectively 1-dimensional. To access a column,
# index it by name
workers = data['workers']

# Note that this is now one-dimensional... You can't treat it like a 2D array
data[1:10, 3:5] # <-- Raises an error!

data[1:10][['inc1', 'inc2']] # <-- Effectively the same thing, but works.
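If you later need an ordinary 2D block from a structured array (for positional slicing or matrix math), numpy.lib.recfunctions.structured_to_unstructured (NumPy 1.16 or later) can convert same-typed fields. A minimal sketch, assuming a small hand-built structured array with a few of the question's column names and values from the sample row:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hand-built structured array: one record, four float fields.
dtype = [("constant", float), ("hhsize", float), ("inc1", float), ("inc2", float)]
data = np.array([(7.334, -1.406, 2.823, 2.025)], dtype=dtype)
print(data.shape)  # (1,) one record, not a 2-D grid

# Multi-field indexing selects columns by name.
subset = data[["inc1", "inc2"]]

# Convert the selected float fields into a plain 2-D float array.
grid = rfn.structured_to_unstructured(subset)
print(grid.shape)  # (1, 2)
```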

If you have non-numerical values in your data and want to handle them as strings, you'll need to use a structured array, specify which fields you want to be strings, and set a max length for the strings in the field.

From your sample data, it looks like the first column, "workers", is a non-numerical value that you might want to store as a string, and all the rest look like floats. In that case, you'd do something like this:

import numpy as np
infile = open('input_filename.txt')
names = next(infile).strip().split(',')

# Create the dtype... 'U10' indicates a (Unicode) string field with a maximum length of 10
dtype = {'names': names, 'formats': ['U10'] + (len(names) - 1)*[float]}
data = np.loadtxt(infile, dtype=dtype, delimiter=',')

# The "workers" field is now a string array
print(data['workers'])

# Compare this to the other fields
print(data['constant'])
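An alternative worth knowing (not what this answer uses) is numpy.genfromtxt with names=True and dtype=None, which reads field names from the header and guesses a type per column. A sketch, with io.StringIO standing in for the file and holding the header and sample row from the question; note that with type guessing, a column of whole numbers may come back as int rather than float:

```python
import io
import numpy as np

csv_text = io.StringIO(
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)

# names=True: take field names from the first line.
# dtype=None: infer each column's type; encoding gives str (not bytes) fields.
data = np.genfromtxt(csv_text, delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

# A single data row can come back 0-dimensional; make it at least 1-D.
data = np.atleast_1d(data)

print(data["workers"])   # string field
print(data["constant"])  # float field
```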

If there are cases where you really need the flexibility of the csv module (e.g. text fields with commas), you can use it to read the data, and then convert it to a structured array with the appropriate dtype.
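The csv-then-convert route can be sketched as follows. The file contents here are hypothetical (a quoted text field containing a comma, which a plain delimiter split would break apart), and the 'U20' string length is an arbitrary choice:

```python
import csv
import io
import numpy as np

# Hypothetical file: the first field contains a comma inside quotes.
csv_text = io.StringIO(
    "workers,constant\n"
    '"w0, full-time",7.334\n'
    '"w1, part-time",-1.406\n'
)

reader = csv.reader(csv_text)   # handles the quoting correctly
names = next(reader)            # header row -> field names
rows = [(row[0], float(row[1])) for row in reader]

# A structured array is built from a list of tuples, one tuple per record.
dtype = {"names": names, "formats": ["U20", float]}
data = np.array(rows, dtype=dtype)

print(data["workers"])
print(data["constant"].dtype)  # float64
```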

Hope that makes things a bit clearer...


Start with an empty array?

storecoeffs = numpy.empty((5,0), dtype='object')
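With zero columns there are no placeholders to strip: appending along axis=1 yields just the data. A short sketch, using a hypothetical 5 x 11 array of integers as a stand-in for the CSV contents:

```python
import numpy as np

# Zero-column starting array: nothing to remove afterwards.
storecoeffs = np.empty((5, 0), dtype=object)

# Hypothetical stand-in for the 5 x 11 CSV data.
coeffsarray = np.arange(55).reshape(5, 11).astype(object)

newmatrix = np.append(storecoeffs, coeffsarray, axis=1)
print(newmatrix.shape)  # (5, 11)
```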


Why are you allocating an entire array of Nones and appending to that? Is coeffsarray not the array you want?

Edit

Oh. Use numpy.reshape.

import numpy
coeffsarray = numpy.reshape(coeffsarray, (5, 11))
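numpy.reshape fills the new shape row by row (C order), and it raises a ValueError if the total number of elements does not match. A minimal sketch with a hypothetical flat sequence of 55 values:

```python
import numpy as np

# Hypothetical flat sequence of 55 values, e.g. every field in file order.
flat = np.arange(55)

coeffsarray = np.reshape(flat, (5, 11))
print(coeffsarray.shape)  # (5, 11)
print(coeffsarray[1, 0])  # 11: row 1 starts with the 12th value

# Mismatched sizes are rejected rather than padded or truncated.
try:
    np.reshape(flat, (5, 10))
except ValueError as err:
    print("reshape failed:", err)
```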


Why not simply use numpy.loadtxt()?

newmatrix = numpy.loadtxt("file.csv", delimiter=',', dtype='object')

That should do the job, if I understood your question correctly.
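The delimiter argument matters: loadtxt splits on whitespace by default, so each comma-separated line would otherwise be read as a single field. A sketch using dtype=str (a substitution for the answer's dtype='object', chosen so every field comes back as text, header included), with io.StringIO standing in for file.csv and holding the header and sample row from the question:

```python
import io
import numpy as np

csv_text = io.StringIO(
    "workers,constant,hhsize,inc1,inc2,inc3,inc4,age1,age2,age3,age4\n"
    "w0,7.334,-1.406,2.823,2.025,0.5145,0,-4.936,-5.054,-2.8,0\n"
)

# Every field is kept as a string, so the header row loads too.
table = np.loadtxt(csv_text, delimiter=",", dtype=str)
print(table.shape)  # (2, 11)
print(table[0, 0])  # workers
```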
