R - Select rows for random sample of column values?
How can I select all of the rows for a random sample of column values?
I have a dataframe that looks like this:
tag weight
R007 10
R007 11
R007 9
J102 11
J102 9
J102 13
J102 10
M942 3
M054 9
M054 12
V671 12
V671 13
V671 9
V671 12
Z990 10
Z990 11
That you can replicate using...
weights_df <- structure(list(tag = structure(c(4L, 4L, 4L, 1L, 1L, 1L, 1L,
3L, 2L, 2L, 5L, 5L, 5L, 5L, 6L, 6L), .Label = c("J102", "M054开发者_如何转开发",
"M942", "R007", "V671", "Z990"), class = "factor"), value = c(10L,
11L, 9L, 11L, 9L, 13L, 10L, 3L, 9L, 12L, 12L, 14L, 5L, 12L, 11L,
15L)), .Names = c("tag", "value"), class = "data.frame", row.names = c(NA,
-16L))
I need to create a dataframe containing all of the rows from the above dataframe for two randomly sampled tags. Let's say tags R007and M942 get selected at random, my new dataframe needs to look like this:
tag weight
R007 10
R007 11
R007 9
M942 3
How do I do this?
I know I can create a list of two random tags like this:
library(plyr)
tags <- ddply(weights_df, .(tag), summarise, count = length(tag))
set.seed(5464)
tag_sample <- tags[sample(nrow(tags),2),]
tag_sample
Resulting in...
tag count
4 R007 3
3 M942 1
But I just don't know how to use that to subset my original dataframe.
is this what you want?
subset(weights_df, tag%in%sample(levels(tag),2))
If your data.frame is named dfrm
, then this will select 100 random tag
s
dfrm[ sample(NROW(dfrm), 100), "tag" ] # possibly with repeats
If, on the other hand, you want a dataframe with the same columns (possibly with repeats):
samp <- dfrm[ sample(NROW(dfrm), 100), ] # leave the col name entry blank to get all
A third possibility... you want 100 distinct tags at random, but not with the probability at all weighted to the frequency:
samp.tags <- unique(dfrm$tag)[ sample(length(unique(dfrm$tag)), 100]
Edit: With to revised question; one of these:
subset(dfrm, tag %in% c("R007", "M942") )
Or:
dfrm[dfrm$tag %in% c("R007", "M942"), ]
Or:
dfrm[grep("R007|M942", dfrm$tag), ]
精彩评论