How to randomize (or permute) a dataframe rowwise and columnwise?

2023-03-14 13:39 问答作者：

I have a dataframe (df1) like this.

     f1   f2   f3   f4   f5
d1   1    0    1    1    1  
d2   1    0    0    1    0
d3   0    0    0    1    1
d4   0    1    0    0    1

The d1...d4 column is the rowname, the f1...f5 row is the columnname.

To do sample(df1), I get a new dataframe with count of 1 same as df1. So, the count of 1 is conserved for the whole dataframe but not for each row or each column.

Is it possible to do the randomization row-wise or column-wise?

I want to randomize the df1 column-wise for each column, i.e. the number of 1 in each column remains the same. and each column need to be changed by at least once. For example, I may have a randomized df2 like this: (Noted that the count of 1 in each column remains the same but the count of 1 in each row is different.

     f1   f2   f3   f4   f5
d1   1    0    0    0    1  
d2   0    1    0    1    1
d3   1    0    0    1    1
d4   0    0    1    1    0

Likewise, I also want to randomize the df1 row-wise for each row, i.e. the no. of 1 in each row remains the same, and each row need to be changed (but the no of changed entries could be different). For example, a randomized df3 could be something like this:

     f1   f2   f3   f4   f5
d1   0    1    1    1    1  <- two entries are different
d2   0    0    1    0    1  <- four entries are different
d3   1    0    0    0    1  <- two entries are different
d4   0    0    1    0    1  <- two entries are different

PS. Many thanks for the help from Gavin Simpson开发者_高级运维, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.

Given the R data.frame:

Shuffle row-wise:

> df2 <- df1[sample(nrow(df1)),]
> df2
  a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0

By default sample() randomly reorders the elements passed as the first argument. This means that the default size is the size of the passed array. Passing parameter replace=FALSE (the default) to sample(...) ensures that sampling is done without replacement which accomplishes a row wise shuffle.

Shuffle column-wise:

> df3 <- df1[,sample(ncol(df1))]
> df3
  c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0

This is another way to shuffle the data.frame using package dplyr:

row-wise:

df2 <- slice(df1, sample(1:n()))

df2 <- sample_frac(df1, 1L)

column-wise:

df2 <- select(df1, one_of(sample(names(df1))))

Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.

mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

This gives:

R> out$perm[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    1
[2,]    0    1    0    1    0
[3,]    0    0    0    1    1
[4,]    1    0    0    0    1
R> out$perm[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    1    1
[2,]    0    0    0    1    1
[3,]    1    0    0    1    0
[4,]    0    0    1    0    1

To explain the call:

out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

times is the number of randomised matrices you want, here 99
burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
thin says only take a random draw every thin swaps
mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.

A couple of things to note, this doesn't guarantee that any column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard ones that don't match all your requirements.

Your requirement to have different numbers of changes per row, also isn't covered here. Again you could sample more matrices than you want and then discard the ones that don't meet this requirement also.

you can also use the randomizeMatrix function in the R package picante

example:

test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    1    1    0    1
[3,]    0    0    0    0
[4,]    1    0    1    0

randomizeMatrix(test,null.model = "frequency",iterations = 1000)

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    0    0
[3,]    1    0    1    0
[4,]    1    0    1    0

randomizeMatrix(test,null.model = "richness",iterations = 1000)

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    1    1    0    1
[3,]    0    0    0    0
[4,]    1    0    1    0
>

The option null.model="frequency" maintains column sums and richness maintains row sums. Though mainly used for randomizing species presence absence datasets in community ecology it works well here.

This function has other null model options as well, check out following link for more details (page 36) of the picante documentation

Of course you can sample each row:

sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))

will shuffle the rows itself, so the number of 1's in each row doesn't change. Small changes and it also works great with columns, but this is a exercise for the reader :-P

If the goal is to randomly shuffle each column, some of the above answers don't work since the columns are shuffled jointly (this preserves inter-column correlations). Others require installing a package. Yet a one-liner exist:

df2 = lapply(df1, function(x) { sample(x) })

You can also "sample" the same number of items in your data frame with something like this:

nr<-dim(M)[1]
random_M = M[sample.int(nr),]

Random Samples and Permutations ina dataframe If it is in matrix form convert into data.frame use the sample function from the base package indexes = sample(1:nrow(df1), size=1*nrow(df1)) Random Samples and Permutations

Here is a data.table option using .N with sample like this:

library(data.table)
setDT(df)
df[sample(.N)]
#>    a b c
#> 1: 0 1 0
#> 2: 1 1 0
#> 3: 1 0 0
#> 4: 0 0 0

^{Created on 2023-01-28 with reprex v2.0.2}

Data:

df <- read.table(text = "  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0", header = TRUE)

继续阅读：permutation random

How to randomize (or permute) a dataframe rowwise and columnwise?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？