R - Select rows for random sample of column values?

How can I select all of the rows for a random sample of column values?

I have a dataframe that looks like this:

tag  weight

R007     10
R007     11
R007      9
J102     11
J102      9
J102     13
J102     10
M942      3
M054      9
M054     12  
V671     12
V671     13
V671      9
V671     12
Z990     10
Z990     11

You can replicate it using:

weights_df <- structure(list(tag = structure(c(4L, 4L, 4L, 1L, 1L, 1L, 1L, 
3L, 2L, 2L, 5L, 5L, 5L, 5L, 6L, 6L), .Label = c("J102", "M054", 
"M942", "R007", "V671", "Z990"), class = "factor"), value = c(10L, 
11L, 9L, 11L, 9L, 13L, 10L, 3L, 9L, 12L, 12L, 14L, 5L, 12L, 11L, 
15L)), .Names = c("tag", "value"), class = "data.frame", row.names = c(NA, 
-16L))

I need to create a dataframe containing all of the rows from the above dataframe for two randomly sampled tags. Let's say tags R007 and M942 get selected at random; my new dataframe then needs to look like this:

tag  weight

R007     10
R007     11
R007      9
M942      3

How do I do this?

I know I can create a list of two random tags like this:

library(plyr)
tags <- ddply(weights_df, .(tag), summarise, count = length(tag))
set.seed(5464)
tag_sample <- tags[sample(nrow(tags),2),]
tag_sample

Resulting in...

   tag count
4 R007     3
3 M942     1

But I just don't know how to use that to subset my original dataframe.


Is this what you want?

subset(weights_df, tag %in% sample(levels(tag), 2))
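The one-liner above can be split into two steps if you want to keep the sampled tags around or make the draw repeatable. This is only a minimal sketch: it rebuilds the question's data in a more compact form, and `picked` and `result` are illustrative names that do not appear in the original post. It samples from the tags actually present in the column rather than from `levels(tag)`, which matters only when the factor carries unused levels.

```r
# Compact rebuild of the data frame from the question's dput() output
weights_df <- data.frame(
  tag   = factor(rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
                     times = c(3, 4, 1, 2, 4, 2))),
  value = c(10L, 11L, 9L, 11L, 9L, 13L, 10L, 3L,
            9L, 12L, 12L, 14L, 5L, 12L, 11L, 15L)
)

set.seed(5464)  # optional: makes the random draw repeatable

# Step 1: draw two distinct tags from those present in the data
picked <- sample(unique(as.character(weights_df$tag)), 2)

# Step 2: keep every row whose tag is one of the sampled tags
result <- weights_df[weights_df$tag %in% picked, ]

# Drop factor levels that no longer occur in the subset
result <- droplevels(result)
result
```

Which two tags come out depends on the seed and on the R version, since the default sampling algorithm changed in R 3.6.0.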


If your data.frame is named dfrm, then this will select 100 random tags:

dfrm[ sample(NROW(dfrm), 100), "tag" ]   # possibly with repeats

If, on the other hand, you want a dataframe with the same columns (possibly with repeats):

samp <- dfrm[ sample(NROW(dfrm), 100),  ]  # leave the col name entry blank to get all

A third possibility: you want 100 distinct tags at random, with the probability not weighted by how often each tag occurs:

samp.tags <- unique(dfrm$tag)[ sample(length(unique(dfrm$tag)), 100) ]
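One thing to watch with the distinct-tag variant: `sample()` without replacement raises an error when asked for more items than exist, so it needs at least 100 distinct tags. A minimal sketch of capping the request, assuming a data frame `dfrm` with a `tag` column; the toy data and the names `n_wanted`, `distinct_tags` and `n_draw` are hypothetical:

```r
# Toy data with only three distinct tags (hypothetical, for illustration)
dfrm <- data.frame(tag = c("a", "a", "b", "c", "c", "c"), weight = 1:6)

n_wanted <- 100
distinct_tags <- unique(dfrm$tag)

# Cap the request at the number of distinct tags actually available
n_draw <- min(n_wanted, length(distinct_tags))

# Each distinct tag is equally likely, regardless of how many rows it has
samp.tags <- sample(distinct_tags, n_draw)
samp.tags
```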

Edit: With the revised question, use one of these:

subset(dfrm, tag %in% c("R007", "M942") )

Or:

dfrm[dfrm$tag %in% c("R007", "M942"), ]

Or:

dfrm[grep("R007|M942", dfrm$tag), ]
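To connect this back to the question, the hard-coded vector can be replaced by the `tag` column of the sampled summary table. The sketch below is one possible way to do that, not the method from the original post: it builds `tags` with base R's `table()` instead of plyr's `ddply()` so that it runs without extra packages, and `sampled_rows` is an illustrative name.

```r
# Compact rebuild of the data frame from the question's dput() output
weights_df <- data.frame(
  tag   = factor(rep(c("R007", "J102", "M942", "M054", "V671", "Z990"),
                     times = c(3, 4, 1, 2, 4, 2))),
  value = c(10L, 11L, 9L, 11L, 9L, 13L, 10L, 3L,
            9L, 12L, 12L, 14L, 5L, 12L, 11L, 15L)
)

# Per-tag row counts: a base-R stand-in for the ddply() summary
tags <- as.data.frame(table(tag = weights_df$tag), responseName = "count")

set.seed(5464)
tag_sample <- tags[sample(nrow(tags), 2), ]

# Use the sampled tags instead of a hard-coded vector
sampled_rows <- weights_df[weights_df$tag %in% tag_sample$tag, ]
sampled_rows
```

`%in%` compares whole tags exactly, whereas the `grep()` variant matches the pattern anywhere in the string, so it would also pick up a hypothetical tag such as "R0071".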