Working with Data.frames in R (Using SAS code to describe what I want)

2022-12-08 02:12

I've been mostly working in SAS of late, but not wanting to lose what familiarity with R I have, I'd like to replicate something basic I've done. Forgive me if my SAS code isn't perfect; I'm doing this from memory since I don't have SAS at home.

In SAS I have a dataset roughly like the following example (. is the SAS equivalent of NA)

If the dataset above was work.foo then I could do something like the following.

/* create work.bar from dataset work.foo */
data work.bar;
set work.foo;

/* generate a third variable and add it to work.bar */
if a = 0 and b ge 1 then c = 1;
if a = 0 and b = 0  then c = 2;
if a = 1 and b ge 1 then c = 3;
if a = 1 and b = 0  then c = 4;
run;

and I'd get something like

And I could then proc sort by C and then perform various operations using C to create 4 subgroups. For example I could get the means of each group with

proc means noprint data = work.bar;
by c;
var a b;
output out = work.means mean(a b) = a b;
run;

and I'd get a dataset of the variable means by group, called work.means, something like:

I think I may also get a . row, but I don't care about that for my purposes.

Now in R, I have the same data set, read in properly, but I have no idea how to add a variable to the end (like c) or how to do an operation on a subgroup (like the by c command in proc means). Also, I should note that my variables aren't named in any sort of order, but according to what they represent.

I figure if somebody can show me how to do the above, I can generalize it to what I need to do.

Assume your data set is a two-column dataframe called work.foo with variables a and b. Then the following code is one way to do it in R:

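# copy work.foo, then derive the grouping variable c from a and b
work.bar <- work.foo
work.bar$c <- with(work.foo, (a==0 & b>=1) + 2*(a==0 & b==0) +
               3*(a==1 & b>=1) + 4*(a==1 & b==0))
# per-group column means; mean() does not accept a data frame, so use colMeans()
work.mean <- by(work.bar[, c("a", "b")], work.bar$c, colMeans, na.rm = TRUE)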
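To try this without the original data, here is a minimal sketch on a small made-up data frame (the values are purely illustrative and not the asker's data). It fills c with one assignment per condition, which reads much like the SAS if statements, and then uses aggregate() from base R, which returns a data frame roughly comparable to the work.means output of proc means. Treat it as one possible approach, not the only one.

```r
# Hypothetical example data (not from the question); NA plays the role of SAS "."
work.foo <- data.frame(a = c(0, 0, 1, 1, 0, NA),
                       b = c(2, 0, 3, 0, 1, 1))

work.bar <- work.foo
work.bar$c <- NA  # rows that match no condition keep NA
# which() drops NA comparisons, so rows with a missing a or b are never assigned
work.bar$c[which(work.bar$a == 0 & work.bar$b >= 1)] <- 1
work.bar$c[which(work.bar$a == 0 & work.bar$b == 0)] <- 2
work.bar$c[which(work.bar$a == 1 & work.bar$b >= 1)] <- 3
work.bar$c[which(work.bar$a == 1 & work.bar$b == 0)] <- 4

# Means of a and b within each value of c, returned as a data frame;
# rows whose c is NA are left out of the result
work.means <- aggregate(work.bar[c("a", "b")],
                        by = list(c = work.bar$c),
                        FUN = mean, na.rm = TRUE)
print(work.means)
```

The result has one row per group and columns c, a and b, so it can be used directly in later steps.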

An alternative is to use ddply() from the plyr package - you wouldn't even have to create a group variable, necessarily (although that's awfully convenient).

library(plyr)
ddply(work.foo, c("a", "b"), function(x) c(mean(x$a, na.rm = TRUE), mean(x$b, na.rm = TRUE)))

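Of course, if you had the grouping variable, you'd just replace c("a", "b") with "c". Note that grouping on a and b directly gives one group per distinct (a, b) pair, so it matches the four c groups only when b takes no values other than 0 and 1.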

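The main advantage in my mind is that plyr functions will return whatever kind of object you like: ddply takes a data frame and gives you one back, dlply would return a list, etc. by() returns an object of class "by" (a list-like array), and its *apply brethren give you a list, vector or matrix depending on the function and the result, so you often have to reshape the output yourself.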
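If you'd rather stay in base R, the output of by() can be flattened into a data frame by hand. The following is a rough sketch on made-up data (the values and the assumption that c already exists are mine, for illustration only); do.call(rbind, ...) is a common idiom for stacking the per-group results.

```r
# Hypothetical data with the grouping variable c already present
work.bar <- data.frame(a = c(0, 0, 1, 1),
                       b = c(2, 0, 3, 0),
                       c = c(1, 2, 3, 4))

res <- by(work.bar[, c("a", "b")], work.bar$c, colMeans, na.rm = TRUE)
class(res)  # "by"

# Stack the per-group mean vectors into a matrix, then convert to a data frame
work.mean <- as.data.frame(do.call(rbind, res))
# The group labels are stored as names on the by object
work.mean$c <- as.numeric(names(res))
print(work.mean)
```

This gives one row per group with columns a, b and c, which is the same shape ddply would hand back directly.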
