
How to find and list a specific value from each column

My professor gave me a dataset, and one of the questions is: "Find the number of missing values, 99999, in each column and list them." How would I do this in Python? I have multiple columns, all with numerical data. The missing values in the dataset are denoted by 99999 instead of the usual NA.

I don't have much experience with Python and have tried many things to no avail.


Use a lambda function to flag every occurrence of 99999 in a column, then use sum() to count the occurrences per column:

# import pandas package
import pandas as pd

# load dataset with pandas, for example if you have a csv:
df = pd.read_csv("YOUR_FILEPATH_HERE")

# print out the number of occurrences of 99999 in each column
print(df.apply(lambda x: (x == 99999).sum()))
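The same count can probably be written without a lambda, since comparing a whole DataFrame to a scalar yields a boolean DataFrame whose column sums are the counts. Below is a minimal sketch under that assumption; the column names and values are made up for illustration and stand in for the real dataset.

```python
import pandas as pd

MISSING_CODE = 99999  # the sentinel the dataset uses for missing values

# Hypothetical stand-in for the professor's dataset
df = pd.DataFrame({
    "a": [1, MISSING_CODE, 3, MISSING_CODE],
    "b": [MISSING_CODE, 5, 6, 7],
    "c": [8, 9, 10, 11],
})

# True where a cell equals the sentinel; summing booleans counts them per column
counts = (df == MISSING_CODE).sum()
counts_dict = counts.to_dict()

print(counts)
```

Printing `counts` lists every column next to its number of 99999 entries, which is what the assignment asks for.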


A non-pandas answer:

NA = 99999
data = [
    [1, NA, 3],
    [NA, NA, 6],
]

# create a list of counters, one for each column
NAs = [0] * len(data[0])

for row in data:
    for col, value in enumerate(row):
        if value == NA:
            NAs[col] += 1

print(NAs)
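A more compact variant of the same idea, offered as a sketch: transpose the rows into columns with zip and count the sentinel in each column. It assumes every row has the same length; the data is the same toy example as above.

```python
NA = 99999  # sentinel for a missing value

data = [
    [1, NA, 3],
    [NA, NA, 6],
]

# zip(*data) yields one tuple per column; tuple.count tallies the sentinel
na_counts = [column.count(NA) for column in zip(*data)]

print(na_counts)
```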


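Alternatively, convert the 99999 codes to NaN and use pandas' built-in missing-value tools:

import numpy as np

# df is the DataFrame loaded with pd.read_csv, as in the first answer

# Replace the missing-value code 99999 with the default missing-value marker, NaN
df = df.replace(99999, np.nan)

# Flag the missing values in each column (True where the value is NaN)
missing_values = df.isnull()

# Count the missing values per column
print(missing_values.sum())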
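If the data comes from a CSV file, another option is to have pandas treat 99999 as missing while loading, via the na_values parameter of read_csv. The sketch below uses a small in-memory CSV with made-up columns in place of the real file, so the contents are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real dataset file
csv_text = "a,b,c\n1,99999,3\n99999,99999,6\n"

# na_values tells read_csv which raw strings to read as NaN
df = pd.read_csv(io.StringIO(csv_text), na_values=["99999"])

# isna() flags NaN cells; summing counts them per column
missing_per_column = df.isna().sum().to_dict()

print(missing_per_column)
```

One caveat with this approach: once 99999 is read as NaN, it can no longer be told apart from cells that were genuinely empty in the file.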
