
Select several subsets by taking different row intervals and apply a function to all subsets

How can I select n subsets from a data frame by taking every nth row starting at row 1 for subset 1, every nth row starting at row 2 for subset 2, every nth row starting at row 3 for subset 3, and so on until the starting row equals n?

I have used

subset<-data[seq(nth,length,n),]

But this gives only one subset, so I have to keep changing nth from 1 to n to get the other subsets. For example, using a small data set (106 rows x 742 columns) to get 10 subsets of every 10th row:

 subset1<-data[seq(1,106,10),]
 subset2<-data[seq(2,106,10),]
 subset3<-data[seq(3,106,10),]

Is there any way to do this better?

From going through the FAQ I have tried using loops like

sub<-function(data,nth,length,n){
         sub<-data[seq(nth,length,n),]
         for(n in 1:(sub)){
         sub2<-sub[nth,]+1,sub3<-sub[nth,]+2,sub4<-sub[nth,]+3) }
      su<-(sub,sub2, sub3,sub4)
     return(su)
    } 
sub(data=gag11p,n=1,length=106,10)

This returns a list of 3 elements containing only the last variable in the data frame, and I am not sure where I went wrong. Also, how can I get just the name of each subset instead of a data frame? I want to apply a PLS calibration function to the subsets created.

Please forgive and correct any mistakes, since I am still learning programming and R.


A one-liner using lapply, borrowing the function idea from @Chase.

foo2 = function(data, nSubsets, nSkip){
   lapply(1:nSubsets, function(n) data[seq(n, NROW(data), by = nSkip),])
} 

foo2(mtcars, 5, 15)
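
The question also asks how to refer to each subset by name and apply a function to all of them. A minimal sketch, assuming the list returned by foo2 is named with names() and then passed to lapply. The PLS calibration function is not shown in the question, so lm() is used here purely as a placeholder, and nSkip = 5 is chosen only so that each subset has enough rows for the placeholder fit:

```r
# Same helper as above: subset i starts at row i and takes every nSkip-th row.
foo2 <- function(data, nSubsets, nSkip) {
  lapply(seq_len(nSubsets), function(n) data[seq(n, NROW(data), by = nSkip), ])
}

subsets <- foo2(mtcars, 5, 5)

# Name the list elements so each subset can be referred to by name,
# e.g. subsets$subset2 or subsets[["subset2"]].
names(subsets) <- paste0("subset", seq_along(subsets))

# Placeholder model: lm() stands in for the PLS calibration function,
# which is an assumption here and not part of the original question.
fits <- lapply(subsets, function(d) lm(mpg ~ wt, data = d))

names(fits)                                   # the names carry over to the results
sapply(fits, function(f) coef(f)[["wt"]])     # one slope per subset
```

Because lapply keeps the names of its input, the fitted objects can be looked up by the same names as the subsets they came from.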


I suggest you store all of these different subsets into a single list object. I'm not sure I 100% followed your code above, but I think this does what you want:

FOO <- function(data, nSubsets, nSkip){
  outList <- vector("list", length = nSubsets)
  totRow <- nrow(data)

  for (i in seq_len(nSubsets)) {
    rowsToGrab <- seq(i, totRow, nSkip)
    outList[[i]] <- data[rowsToGrab ,] 
  }
  return(outList)
}

What's happening?

  1. We first preallocate a list object that corresponds to the number of subsets you want to make
  2. Define the total number of rows so you don't have to pass it in as a parameter to the function
  3. Use a for loop similar to the one you used to determine which rows to grab, and assign those rows to the corresponding element of the list defined above
  4. Return the list object.
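
For the case in the question, where the number of subsets equals the row interval (10 subsets of every 10th row), the same grouping can probably also be obtained without an explicit loop. This is an alternative sketch using split(), not the method above; the helper name splitEveryNth is made up for illustration:

```r
# Alternative sketch: when nSubsets equals nSkip, every row belongs to
# exactly one subset, identified by ((row index - 1) %% n) + 1.
splitEveryNth <- function(data, n) {          # hypothetical helper name
  grp <- (seq_len(nrow(data)) - 1) %% n + 1   # 1, 2, ..., n, 1, 2, ...
  split(data, grp)                            # list named "1", "2", ..., "n"
}

parts <- splitEveryNth(mtcars, 5)
sapply(parts, nrow)      # number of rows in each subset
rownames(parts[["2"]])   # same rows as mtcars[seq(2, 32, 5), ]
```

Unlike FOO, this version cannot produce fewer subsets than the interval, so it only applies when the two values are the same.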

Here's an example using the mtcars data. Note that the dataset only has 32 rows. Because seq() stops at the last row, no out-of-bounds subscripts are generated and no warning or error is thrown; the later subsets simply contain fewer rows:

FOO(mtcars, 5, 15)

[[1]]
                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 #row 1
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4 #row 16
Maserati Bora       15.0   8  301 335 3.54 3.570 14.60  0  1    5    8 #row 31

[[2]]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 #row 2
Chrysler Imperial 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4 #row 17
Volvo 142E        21.4   4  121 109 4.11 2.780 18.60  1  1    4    2 #row 32

[[3]]
            mpg cyl  disp hp drat   wt  qsec vs am gear carb
Datsun 710 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1 #row 3
Fiat 128   32.4   4  78.7 66 4.08 2.20 19.47  1  1    4    1 #row 18

[[4]]
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 #row 4
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2 #row 19

[[5]]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 #row 5
Toyota Corolla    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 #row 20
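
To check the function against the dimensions in the question (106 rows, 10 subsets of every 10th row), here is a small sketch. The data frame dummy is made-up stand-in data, not the asker's data set:

```r
# FOO as defined above, repeated so this snippet runs on its own.
FOO <- function(data, nSubsets, nSkip) {
  outList <- vector("list", length = nSubsets)
  totRow <- nrow(data)
  for (i in seq_len(nSubsets)) {
    outList[[i]] <- data[seq(i, totRow, nSkip), ]
  }
  outList
}

# Made-up stand-in for the 106-row data set (assumption, not the real data).
dummy <- data.frame(id = 1:106, x = rnorm(106))

res <- FOO(dummy, nSubsets = 10, nSkip = 10)
sizes <- sapply(res, nrow)
sizes                        # rows per subset
sum(sizes) == nrow(dummy)    # TRUE when every row lands in exactly one subset
```

With 106 rows, the first six subsets get 11 rows each and the last four get 10, which accounts for all 106 rows.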