Calculate Mean of a column in R having non numeric values

2023-04-04 00:20 Q&A author:

I have a column that contains numeric as well as non-numeric values. I want to find the mean of the numeric values so I can use it to replace the non-numeric values. How can this be done in R?

Say your data frame is named df and the column you want to "fix" is called df$x. You could do the following.

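If the column is a factor (the default for string columns in data.frame() and read.csv() before R 4.0.0), you have to convert it to character first and then to numeric. Calling as.numeric() directly on a factor returns the internal level codes, not the values. This gives you NA for every string that cannot be coerced to a number.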

nums <- as.numeric(as.character(df$x))
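To see why the as.character() step matters, here is a small sketch using a made-up factor f (not from the question). Calling as.numeric() directly returns the level codes, while going through as.character() recovers the actual values.

```r
# Made-up example column: a factor mixing numbers and a non-numeric string
f <- factor(c("10", "5", "abc", "10"))

# Levels are sorted as strings ("10", "5", "abc"), so these are just codes
codes <- as.numeric(f)

# Converting to character first gives the real values; "abc" becomes NA
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)   # 1 2 3 1
print(values)  # 10 5 NA 10
```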

As Richie Cotton pointed out, there is a "more efficient, but harder to remember" way to convert factors to numeric (it only works when df$x is a factor):

nums <- as.numeric(levels(df$x))[as.integer(df$x)]
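A quick check, again on a made-up factor f, that both conversions give the same result. The second form parses each distinct level once and then indexes by the integer codes, rather than parsing every element, which is where the saving comes from when a long column has few distinct levels.

```r
f <- factor(c("10", "5", "abc", "10", "5"))

# Parse the label of every element
slow <- suppressWarnings(as.numeric(as.character(f)))

# Parse each distinct level once, then look values up by the integer codes
fast <- suppressWarnings(as.numeric(levels(f)))[as.integer(f)]

identical(slow, fast)  # TRUE
```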

To get the mean, use mean() but pass na.rm = TRUE so the NAs are skipped:

m <- mean(nums, na.rm = TRUE)
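Without na.rm = TRUE, a single NA makes the whole result NA, which is why the argument is needed here. A tiny illustration with made-up numbers:

```r
nums <- c(10, 5, NA, 10)

m_all <- mean(nums)                # NA: any NA propagates
m_ok  <- mean(nums, na.rm = TRUE)  # mean of 10, 5 and 10
print(c(m_all, m_ok))
```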

Assign the mean to all the NA values.

nums[is.na(nums)] <- m

You could then overwrite the old column, but I don't recommend it. Instead, just add a new column:

df$new.x <- nums
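Put together, these steps fit in a small helper. The name fill_non_numeric_with_mean() and the sample data frame are made up for illustration; the coercion warning is expected and suppressed on purpose.

```r
# Hypothetical helper: convert a column to numeric and replace every
# non-numeric entry (now NA) with the mean of the numeric ones
fill_non_numeric_with_mean <- function(x) {
  nums <- suppressWarnings(as.numeric(as.character(x)))
  nums[is.na(nums)] <- mean(nums, na.rm = TRUE)
  nums
}

# Made-up data; stringsAsFactors = TRUE mimics the pre-R 4.0 default
df <- data.frame(x = c("1", "3", "n/a", "5"), stringsAsFactors = TRUE)
df$new.x <- fill_non_numeric_with_mean(df$x)
print(df)
```

Because as.character() leaves a character vector unchanged, the same helper works whether df$x is a factor or a plain character column.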

Here is a function I wrote yesterday to deal with non-numeric types. I have a data.frame with an unpredictable type in each column, and I want to calculate the means of the numeric columns and leave everything else untouched.

colMeans2 <- function(x) {
    # This function tries to guess the column type. Since all columns come as
    # characters, it first checks whether x contains "TRUE" or "FALSE". If
    # not, it tries to coerce the vector to integer. If that doesn't work,
    # it checks whether there's a double quote (") in the vector, meaning a
    # character column, and returns the first value. Finally, if nothing
    # else matches, the column type must be numeric, and it calculates
    # the mean of that.

    # apply() hands each column over as character, and format() may have
    # padded numbers and logicals with spaces, so strip those first
    x <- trimws(as.character(x))

    # try if logical (test x itself; levels() is NULL for a character vector)
    if (any(x %in% c("TRUE", "FALSE"))) return(NA)

    # try if integer: return the first value untouched
    try.int <- strtoi(x, base = 10L)
    if (all(!is.na(try.int))) return(try.int[1])

    # try if character (any value containing a double quote)
    if (any(grepl("\"", x, fixed = TRUE))) return(x[1])

    # what's left is numeric; entries that can't be parsed become NA and
    # are dropped (this line may warn about NAs introduced by coercion)
    mean(as.numeric(x), na.rm = TRUE)
}

You would use it like so:

apply(X = your.dataframe, MARGIN = 2, FUN = colMeans2)
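One caveat: apply() first converts the data frame with as.matrix(). When any column is non-numeric, that matrix is character, so each column reaches the function as a character vector (never a factor), and numeric or logical columns may be padded with spaces by format(). The sketch below is an alternative, not the original function: it lets base R's type.convert() guess each column's type and then averages only the numeric columns. The name numeric_col_means() and the sample data are made up.

```r
# Sketch: guess each column's type with type.convert(), then average only
# the numeric columns; every other column yields NA
numeric_col_means <- function(df) {
  cols <- lapply(df, function(col) type.convert(as.character(col), as.is = TRUE))
  vapply(cols,
         function(col) if (is.numeric(col)) mean(col, na.rm = TRUE) else NA_real_,
         numeric(1))
}

# Made-up data where every column arrives as character
df <- data.frame(ints  = c("1", "2", "3"),
                 mixed = c("1.5", "2.5", "x"),
                 flags = c("TRUE", "FALSE", "TRUE"),
                 stringsAsFactors = FALSE)
res <- numeric_col_means(df)
print(res)  # ints = 2, mixed = NA, flags = NA
```

Unlike the original function, this returns the mean for integer columns too, and a column that mixes numbers with text is treated as character rather than averaged.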

It sort of depends on what your data looks like.

Does it look like this?

data = list(1, 2, 'new jersey')

Then you could

data.numbers = sapply(data, as.numeric)

and get

c(1, 2, NA)

And you can find the mean with

mean(data.numbers, na.rm = TRUE)
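If you want the result type guaranteed (and the expected coercion warning silenced), vapply() is a stricter alternative to sapply() here. A sketch on the same list:

```r
data <- list(1, 2, 'new jersey')

# vapply() insists that each element yields exactly one double;
# suppressWarnings() hides the expected "NAs introduced by coercion"
data.numbers <- suppressWarnings(vapply(data, as.numeric, numeric(1)))

print(data.numbers)               # 1 2 NA
mean(data.numbers, na.rm = TRUE)  # 1.5
```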

A compact conversion:

vec <- c(0:10, "a", "z")
vec2 <- as.numeric(vec)
vec2[is.na(vec2)] <- mean(vec2[!is.na(vec2)])

as.numeric() converts the non-numeric entries to NA and emits a warning like the one below (the "In ..." part names whichever call triggered it, here an enclosing mean() call):

Warning message:
In mean(as.numeric(vec)) : NAs introduced by coercion
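The same replacement can also be written as one expression with base R's replace(). A sketch reusing the vec example, with the expected warning suppressed:

```r
vec <- c(0:10, "a", "z")
nums <- suppressWarnings(as.numeric(vec))

# replace(x, list, values): put the mean of the numeric entries wherever nums is NA
filled <- replace(nums, is.na(nums), mean(nums, na.rm = TRUE))
print(filled)  # 0 through 10, then 5 5
```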
