How to find and list a specific value from each column
I was given a dataset by my professor, and one of my questions is: "Find the number of missing values, 99999, in each column and list them." How would I do this in Python? I have multiple columns, all with numerical data. The missing values in the dataset are denoted by 99999 instead of NA as usual.
I don't have much experience in Python and have tried many things to no avail.
Use a lambda function to flag every occurrence of 99999 in a column, then use sum()
to get the total number of occurrences per column:
# import pandas package
import pandas as pd
# load dataset with pandas, for example if you have a csv:
df = pd.read_csv("YOUR_FILEPATH_HERE")
# print out the number of occurrences of 99999 in each column
print(df.apply(lambda x: (x == 99999).sum()))
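To check the approach without the real file, here is a minimal sketch on a small made-up DataFrame. The column names a, b, c and the values are assumptions for illustration only, not the asker's data. It also shows the vectorized form (df == 99999).sum(), which should give the same counts without a lambda.

```python
import pandas as pd

MISSING = 99999  # sentinel value the dataset uses for "missing"

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "a": [1, MISSING, 3],
    "b": [MISSING, MISSING, 6],
    "c": [7, 8, 9],
})

# Lambda version: count matches column by column
counts_apply = df.apply(lambda col: (col == MISSING).sum())

# Vectorized version: compare the whole frame, then sum the True values per column
counts = (df == MISSING).sum()

# Prints one count per column, indexed by column name
print(counts)
```

The result is a Series indexed by column name, so counts["b"] gives the count for a single column and counts[counts > 0] lists only the columns that contain the sentinel.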
A non-pandas answer:
NA = 99999
data = [
    [ 1, NA, 3 ],
    [ NA, NA, 6 ],
]
NAs = [0] * len(data[0])  # create a list of counters, one for each column
for row in data:
    for x, value in enumerate(row):
        if value == NA:
            NAs[x] += 1
print( NAs )
# numpy provides the NaN constant used below
import numpy as np
# Replace the missing-value code 99999 with the default missing-value marker, NaN
df = df.replace(99999, np.nan)
# Flag the missing values in each column of the DataFrame (True where the value is NaN)
missing_values = df.isnull()
# Count the missing values per column
print(missing_values.sum())
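An alternative to replacing the code after loading is to mark 99999 as missing while the file is read. The sketch below assumes the data is a CSV and uses the na_values parameter of pd.read_csv. The inline CSV text and the column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = "a,b,c\n1,99999,3\n99999,99999,6\n"

# na_values tells read_csv to treat 99999 as NaN while parsing
df = pd.read_csv(io.StringIO(csv_text), na_values=[99999])

# isna() flags NaN cells; sum() counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the file path. Note that this also counts any cells that were already empty in the file, since they are read as NaN too.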