Subsetting a data frame in a function using another data frame as parameter

2023-02-04 22:41

I would like to submit a data frame to a function and use it to subset another data frame.

This is the basic data frame:

foo <- data.frame(var1 = c(1, 1, 1, 2, 2, 3), var2 = c('A', 'A', 'B', 'B', 'C', 'C'))

I use the following function to find out the frequencies of var2 for specified values of var1.

foobar <- function(x, y, z){
  a <- subset(x, (x$var1 == y))
  b <- subset(a, (a$var2 == z))
  n=nrow(b)
  return(n)
}

Examples:

foobar(foo, 1, "A") # returns 2
foobar(foo, 1, "B") # returns 1
foobar(foo, 3, "C") # returns 1

This works. But now I want to submit a data frame of values to foobar. Instead of the above examples, I would like to submit df to foobar and get the same results as above (2, 1, 1)

df <- data.frame(var1=c(1, 1, 3), var2=c("A", "B", "C"))

When I change foobar to accept two arguments, like foobar(foo, df), and use y[, c(var1)] and y[, c(var2)] instead of the two parameters y and z, it still doesn't work. How can I do this?

edit1: last paragraph clarified

edit2: var1 type corrected

Try this:

library(plyr)

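match_df <- function(x, match) {
  vars <- names(match)

  # Create ids on the stacked rows so both data frames share one id space
  # (ids computed separately for each data frame are not comparable)
  ids <- id(rbind(x[vars], match[vars]), drop = TRUE)
  x_id <- ids[seq_len(nrow(x))]
  match_id <- ids[-seq_len(nrow(x))]

  # Keep the rows of x whose id also occurs in match
  x[x_id %in% match_id, , drop = FALSE]
}

match_df(foo, df)
#   var1 var2
# 1    1    A
# 2    1    A
# 3    1    B
# 6    3    C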

Your function foobar is expecting three arguments, and you only supplied two arguments to it with foobar(foo, df). You can use apply to get what you want:

apply(df, 1, function(x) foobar(foo, x[1], x[2]))

And in use:

> apply(df, 1, function(x) foobar(foo, x[1], x[2]))
[1] 2 1 1
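
One detail worth knowing about the call above: apply first converts the data frame to a matrix, so with mixed column types every value reaches foobar as a character string. The following is a minimal sketch of that behaviour; it redefines foo and df from the question so that it runs on its own.

```r
foo <- data.frame(var1 = c(1, 1, 1, 2, 2, 3),
                  var2 = c("A", "A", "B", "B", "C", "C"))
df <- data.frame(var1 = c(1, 1, 3), var2 = c("A", "B", "C"))

# apply() calls as.matrix() on a data frame; mixed column types become character
m <- as.matrix(df)
is.character(m)                  # TRUE

# The comparison inside foobar still works here because R coerces the
# numeric column to character before comparing it with the string
sum(foo$var1 == m[1, "var1"])    # 3
```

This is fine for the small example, but it is a reason to prefer column-wise iteration when the numeric values need to stay numeric.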

To respond to your edit:

I'm not entirely sure what y[, c(var1)] means, but here's an attempt at trying to figure out what you are trying to do.

What I think you were trying to do was: foobar(foo, y = df[, "var1"], z = df[, "var2"]).

First, note that c() is not needed here: you can reference a column by putting its name in quotes or by its number (as I did above). Second, df[, "var1"] returns all of the rows of the column named var1, which has a length of three:

> length(df[, "var1"])
[1] 3

The function you defined is not set up to deal with vectors of length greater than 1. That is why we need to iterate through each row of your data frame to grab a single value, process it, and then go to the next row. That is what the apply function does. It is equivalent to saying something along the lines of for (i in 1:nrow(df)), but is a more idiomatic way of handling such issues.
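
For comparison, the explicit loop that the paragraph above alludes to could look like the sketch below. It repeats foo, df and foobar from the question so that it is self-contained; the variable name counts is only illustrative.

```r
foo <- data.frame(var1 = c(1, 1, 1, 2, 2, 3),
                  var2 = c("A", "A", "B", "B", "C", "C"))
df <- data.frame(var1 = c(1, 1, 3), var2 = c("A", "B", "C"))

foobar <- function(x, y, z) {
  a <- subset(x, x$var1 == y)
  b <- subset(a, a$var2 == z)
  nrow(b)
}

# Visit each row of df and pass its two values to foobar one at a time
counts <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  counts[i] <- foobar(foo, df$var1[i], df$var2[i])
}
counts   # 2 1 1
```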

Finally, is there a reason you generated var1 as a factor? In my opinion it probably makes more sense to treat these as numeric. Compare:

> str(df)
'data.frame':   3 obs. of  2 variables:
 $ var1: Factor w/ 2 levels "1","3": 1 1 2
 $ var2: Factor w/ 3 levels "A","B","C": 1 2 3

Versus

> df2 <- data.frame(var1=c(1,1,3), var2=c("A", "B", "C"))
> str(df2)
'data.frame':   3 obs. of  2 variables:
 $ var1: num  1 1 3
 $ var2: Factor w/ 3 levels "A","B","C": 1 2 3

In summary - apply is the function you are after here. You may want to spend some time thinking about whether your data should be numeric or a factor, but apply is still what you want.
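
If var1 does end up as a factor, converting it back to numbers needs a little care. A small sketch, using a hypothetical factor f with the same levels as in the str output above:

```r
f <- factor(c("1", "1", "3"))

# as.numeric() on a factor returns the internal level codes, not the labels
as.numeric(f)                  # 1 1 2

# Going through character first recovers the original values
as.numeric(as.character(f))    # 1 1 3
```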

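foobar2 <- function(x, df) {
  # Count the rows of x where var1 equals y and var2 equals z
  .dofun <- function(y, z){
    a <- subset(x, x$var1==y)
    b <- subset(a, a$var2==z)
    n <- nrow(b)
    return (n)
  }
  # Call .dofun once per (var1, var2) pair in df; as.character() also handles factor columns
  ans <- mapply(.dofun, as.character(df$var1), as.character(df$var2))
  names(ans) <- NULL  # drop the names that mapply adds
  return(ans)
}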
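
As an alternative to the subset-based helper, the count for each row of df can also be computed by summing a logical condition. This is only a sketch: count_matches is a hypothetical name, and it assumes var1 is numeric and var2 is character (or a factor with the same levels) in both data frames.

```r
foo <- data.frame(var1 = c(1, 1, 1, 2, 2, 3),
                  var2 = c("A", "A", "B", "B", "C", "C"))
df <- data.frame(var1 = c(1, 1, 3), var2 = c("A", "B", "C"))

count_matches <- function(x, df) {
  # For each row i of df, count the rows of x that agree on both columns
  vapply(seq_len(nrow(df)), function(i) {
    sum(x$var1 == df$var1[i] & x$var2 == df$var2[i])
  }, integer(1))
}

result <- count_matches(foo, df)
result   # 2 1 1
```

Because the values are taken directly from the columns, nothing is converted to character along the way.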
