
How to detect and mask missing data on imported csv files in Python?

I am very new to Python and I have been trying to detect missing data in lists created from data in imported csv files so that I can plot the series using matplotlib without getting an error.

Here is what I have:

import numpy as np
# import matplotlib.pyplot as plt
import csv
from pylab import *

res = csv.reader(open('cvs_file_with_data.csv'), delimiter=',')
res.next() # do not read header

ColOneData = []
ColTwoData = []
ColThreeData = []

for col in res:
    ColOneData.append(col[0])
    ColTwoData.append(col[1])
    ColThreeData.append(col[2])

print ColOneData # I got here the following ['1', '2', '3', '4', '5'] 

print ColTwoData # I got here the following ['1', '2', '', '', '5']

print ColThreeData # I got here the following ['', '', '3', '4', '']

ColTwoData_M = np.ma.masked_where(ColTwoData == '', ColTwoData) # This does not work

I need to mask the empty values (e.g. '') so that I can plot the series without errors. Any suggestions for solving this problem?

Regards...
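The snippet above is Python 2: `res.next()` and the `print` statements fail under Python 3. A minimal Python 3 sketch of the same reading step follows. It assumes the file has a header row and at least three columns. `read_columns` is a hypothetical helper name, and it accepts any file-like object so it can be tried on an in-memory sample that reproduces the lists printed in the question.

```python
import csv
import io


def read_columns(f, ncols=3):
    """Return one list of strings per column, skipping the header row (hypothetical helper)."""
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header; next() works in Python 2.6+ and Python 3
    columns = [[] for _ in range(ncols)]
    for row in reader:
        for i in range(ncols):
            columns[i].append(row[i])
    return columns


# in-memory stand-in for the CSV file, built to match the question's output
sample = io.StringIO("a,b,c\n1,1,\n2,2,\n3,,3\n4,,4\n5,5,\n")
col_one, col_two, col_three = read_columns(sample)
print(col_one)    # ['1', '2', '3', '4', '5']
print(col_two)    # ['1', '2', '', '', '5']
print(col_three)  # ['', '', '3', '4', '']
```

For the real file, open it with `with open('cvs_file_with_data.csv', newline='') as f:` and pass `f` to the helper.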


What do you mean by mask? Remove? If so, try the following:

masked_data = [point for point in data if point != '']
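Applied to the question's second column, the comprehension simply drops the empty strings. The result is shorter than the other columns, so it can no longer be paired point by point with `ColOneData`. The strings are also converted to numbers here so matplotlib treats them as numeric values rather than as categories.

```python
ColTwoData = ['1', '2', '', '', '5']

# keep only the non-empty entries
masked_data = [point for point in ColTwoData if point != '']
print(masked_data)  # ['1', '2', '5']

# convert the remaining strings to numbers before plotting
values = [float(point) for point in masked_data]
print(values)  # [1.0, 2.0, 5.0]
```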

Edit:

I'm not used to numpy, but maybe this is what you are searching for:

>>> data = numpy.array(['0', '', '1', '', '2'])
>>> numpy.ma.masked_where(data == '', data)
masked_array(data = [0 -- 1 -- 2],
             mask = [False True False True False],
       fill_value = N/A)
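The line in the question fails because `ColTwoData` is a plain list. `ColTwoData == ''` compares the whole list with a string and yields a single `False`, not an element-wise mask. A NumPy array gives the element-wise comparison. A masked array of strings is still not numeric, so the sketch below converts empty cells to NaN and the rest to floats, then masks the NaNs with `np.ma.masked_invalid`. The variable names are illustrative.

```python
import numpy as np

ColTwoData = ['1', '2', '', '', '5']

# a list compared with a string is a single False, not an element-wise mask
print(ColTwoData == '')

# element-wise comparison works once the data is a NumPy array
mask = np.array(ColTwoData) == ''
print(mask)

# empty cells become NaN, everything else becomes a float
values = np.array([float(v) if v != '' else np.nan for v in ColTwoData])

# mask the NaN entries so they are treated as missing
ColTwoData_M = np.ma.masked_invalid(values)
print(ColTwoData_M)
```

With a numeric x column, `plt.plot(x, ColTwoData_M)` should leave gaps where the cells were empty, because matplotlib skips masked points in line plots.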


Jose, if you wish to plot column1 against column2 and not have the empty items cause errors, you will have to remove the empty items in column2 along with the corresponding items in column1. A function like the following should do the trick.

def remove_empty(col1, col2):
    # make copies so our modifications don't clobber the original lists
    col1 = list(col1) 
    col2 = list(col2)
    i = 0
    while i < len(col1):
        # if either the item in col1 or col2 is empty remove both of them
        if col1[i] == '' or col2[i] == '':
            del col1[i]
            del col2[i]
        # otherwise, increment the index
        else:
            i += 1
    return col1, col2
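The same filtering can be written without index bookkeeping by walking both columns in parallel with `zip`. This is an alternative sketch, not the answerer's code, and `remove_empty_zip` is a hypothetical name. It also converts the surviving values to floats, assuming every non-empty cell is numeric. Note that `zip` stops at the shorter column, whereas the `while` loop above would raise an `IndexError` if `col2` were shorter than `col1`.

```python
def remove_empty_zip(col1, col2):
    """Drop every position where either column is empty; return float lists."""
    pairs = [(a, b) for a, b in zip(col1, col2) if a != '' and b != '']
    # convert the remaining strings to numbers for plotting
    xs = [float(a) for a, _ in pairs]
    ys = [float(b) for _, b in pairs]
    return xs, ys


x, y = remove_empty_zip(['1', '2', '3', '4', '5'], ['1', '2', '', '', '5'])
print(x, y)  # [1.0, 2.0, 5.0] [1.0, 2.0, 5.0]
```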


If what you want to do is put a filler value in the empty entries, you could do something like this:

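def defaultIfEmpty(a):
    # substitute '0' for an empty cell, leave everything else unchanged
    if a == '':
        return '0'

    return a

x = ['0', '', '2', '3', '']
x = list(map(defaultIfEmpty, x))  # list() is needed in Python 3, where map returns an iterator

Result: x = ['0', '0', '2', '3', '0']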

If that's the result you're looking for, you could apply map(defaultIfEmpty, ...) to ColOneData, then ColTwoData, and so on.
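A sketch that fills all three columns in one pass, using the column values printed in the question. In Python 3 each `map` call must be wrapped in `list()`. Keep in mind that a filler of `'0'` will be plotted as a real zero, not as a gap.

```python
def defaultIfEmpty(a):
    # substitute '0' for an empty cell, leave everything else unchanged
    if a == '':
        return '0'
    return a


ColOneData = ['1', '2', '3', '4', '5']
ColTwoData = ['1', '2', '', '', '5']
ColThreeData = ['', '', '3', '4', '']

# apply the filler to every column and unpack the results back into the names
filled = [list(map(defaultIfEmpty, col)) for col in (ColOneData, ColTwoData, ColThreeData)]
ColOneData, ColTwoData, ColThreeData = filled

print(ColTwoData)    # ['1', '2', '0', '0', '5']
print(ColThreeData)  # ['0', '0', '3', '4', '0']
```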
