Creating new data frames from a larger data frame using a list

Posted: 2023-04-11 04:29

I have a data frame that contains multiple data points for a large number of samples. Here is a shortened example with 3 samples each with 3 data points:

Assay       Genotype      Sample 
CCT6-002        G         sam1   
CCT6-007        G         sam1
CCT6-013        C         sam1 
CCT6-002        T         sam2   
CCT6-007        A         sam2
CCT6-013        T         sam2 
CCT6-002        T         sam3   
CCT6-007        A         sam3
CCT6-013        T         sam3

To do my downstream analysis I would like to subset the data for each sample into an individual data frame. Since this is something that I will be doing with many data sets with changing sample names, I'd like to figure out an automated way of doing this so I don't need to edit my script each time with the list of new samples.

I would like my output to be a data frame for each sample with the same name as the sample. So with the example data above, the result should be 3 data frames with the names sam1, sam2, sam3. Each data frame would have 3 rows with the Assay and Genotype data.

I am sorry if this is a very basic question but I'm a newbie and have been working on this for quite a while. Thanks!

The split() function is the easiest way to turn this into a list of data.frame objects, one per value of Sample.

myList <- split(mydf, mydf$Sample)

The items can be accessed in the list by numeric indexing (i.e. myList[[1]]) or by the name of the unique item in the variable Sample (i.e. myList$sam1).
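As a minimal sketch of the two access styles, the block below rebuilds the example data from the question with read.table() (the name mydf follows the answer; reading every column as character is an assumption made here so that the genotype "T" is never mistaken for TRUE). The last lines show list2env(), which is one way to get separately named objects if they are really required; it is not something the answer itself recommends.

```r
# Rebuild the example data from the question; all columns are read as character
mydf <- read.table(header = TRUE, colClasses = "character", text = "
Assay     Genotype  Sample
CCT6-002  G         sam1
CCT6-007  G         sam1
CCT6-013  C         sam1
CCT6-002  T         sam2
CCT6-007  A         sam2
CCT6-013  T         sam2
CCT6-002  T         sam3
CCT6-007  A         sam3
CCT6-013  T         sam3
")

# One data.frame per unique value of Sample
myList <- split(mydf, mydf$Sample)

names(myList)       # the list elements are named after the samples
myList[[1]]         # access by position
myList$sam1         # access by name
myList[["sam2"]]    # access by a name held in a string, handy inside loops

# If separately named objects are really needed, list2env() copies each list
# element into an environment under its own name (a fresh environment here,
# so nothing in the workspace is overwritten).
env <- list2env(myList, envir = new.env())
ls(env)
```

Passing globalenv() instead of new.env() would create sam1, sam2 and sam3 directly in the workspace, at the cost of silently replacing any existing objects with those names.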

The numeric indexing is obviously handy when you're going through a sequence, but you can still use the names for that as well.

 # get the names of the unique items in Sample
 nam <- unique(mydf$Sample)
 # as a test, look at the first few rows of each of my data.frames
 for (i in nam) print(head(myList[[i]]))
 # another way to get at the columns of a data.frame is with()
 for (i in nam) with(myList[[i]], print(Assay[1:2]))

That's not necessarily the most efficient R syntax but hopefully it gets you farther along in actually using your list of data.frame objects.
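For comparison, here is a sketch of the same kind of per-sample work written with lapply() and sapply() instead of explicit loops. It assumes the example data from the question, rebuilt inline so the block runs on its own; first_two and rows_per_sample are names made up for this illustration.

```r
# Example data from the question; all columns are read as character
mydf <- read.table(header = TRUE, colClasses = "character", text = "
Assay     Genotype  Sample
CCT6-002  G         sam1
CCT6-007  G         sam1
CCT6-013  C         sam1
CCT6-002  T         sam2
CCT6-007  A         sam2
CCT6-013  T         sam2
CCT6-002  T         sam3
CCT6-007  A         sam3
CCT6-013  T         sam3
")
myList <- split(mydf, mydf$Sample)

# lapply() calls the function once per list element and returns a list
# that keeps the sample names
first_two <- lapply(myList, function(d) d$Assay[1:2])
first_two$sam2

# sapply() simplifies the result to a named vector when it can
rows_per_sample <- sapply(myList, nrow)
rows_per_sample
```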

Now, that gives you what you asked for, but here's some advice on what you asked for. Don't do it. Just learn to properly access your data.frame object. You could just as easily skip making the list and go through all of the unique values of Sample in your code... including saving them out as separate files. The advantage of that is that you can do lots of nifty vectorized commands on your intact data.frame across Sample that are much harder on the list. Just stick with your nice big data.frame.
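Saving each sample to its own file without ever building the list might look like the sketch below. The output directory (tempdir()) and the file naming scheme (sample name plus ".csv") are assumptions chosen for illustration, and the data are the example rows from the question.

```r
# Example data from the question; all columns are read as character
mydf <- read.table(header = TRUE, colClasses = "character", text = "
Assay     Genotype  Sample
CCT6-002  G         sam1
CCT6-007  G         sam1
CCT6-013  C         sam1
CCT6-002  T         sam2
CCT6-007  A         sam2
CCT6-013  T         sam2
CCT6-002  T         sam3
CCT6-007  A         sam3
CCT6-013  T         sam3
")

out_dir <- tempdir()  # hypothetical output location; use your own folder

for (s in unique(mydf$Sample)) {
  # Rows for this sample only, keeping the two columns the question asks for
  one_sample <- mydf[mydf$Sample == s, c("Assay", "Genotype")]
  write.csv(one_sample, file.path(out_dir, paste0(s, ".csv")), row.names = FALSE)
}

list.files(out_dir, pattern = "^sam[0-9]+\\.csv$")
```

Because the sample names come from unique(mydf$Sample), nothing in the script needs editing when a new data set arrives with different samples.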

Here are a couple of simple examples. Look at what I did above for just getting the first few lines of each of the separate data.frame objects in the list. Here's something similar just run on the big data.frame.

lapply( unique(mydf$Sample), function(x) print(head( mydf[ mydf$Sample == x,] )) )

How about something more meaningful? Let's say I want a count of each individual Genotype separated by Sample.

table( mydf$Genotype, mydf$Sample)

That's much easier than what you'd have to do with the big list. There are lots of functions like that which you'll want to use on your intact data.frame, like tapply and aggregate. Even if you wanted to do something that seems like it might be easier with the data.frame broken up, like sorting within each Sample level, it's easier with the data.frame.

mydf[ order(mydf$Sample, mydf$Assay), ]

That will order by Sample and then by Assay nested within Sample.
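The tapply and aggregate functions mentioned above work on the intact data.frame in the same spirit. The following is only a sketch on the example data from the question; the two summaries (distinct genotypes per sample, and row counts per Sample/Genotype pair) and the names n_genotypes and counts are chosen here for illustration.

```r
# Example data from the question; all columns are read as character
mydf <- read.table(header = TRUE, colClasses = "character", text = "
Assay     Genotype  Sample
CCT6-002  G         sam1
CCT6-007  G         sam1
CCT6-013  C         sam1
CCT6-002  T         sam2
CCT6-007  A         sam2
CCT6-013  T         sam2
CCT6-002  T         sam3
CCT6-007  A         sam3
CCT6-013  T         sam3
")

# tapply(): apply a function to Genotype within each level of Sample.
# Here: how many distinct genotypes each sample shows.
n_genotypes <- tapply(mydf$Genotype, mydf$Sample, function(g) length(unique(g)))
n_genotypes

# aggregate(): the same idea with a formula and a data.frame result.
# Here: number of assays for each Sample/Genotype pair that occurs.
counts <- aggregate(Assay ~ Sample + Genotype, data = mydf, FUN = length)
counts
```

In counts, the Assay column holds the number of rows in each group rather than assay names, since length is the summary function.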

When I started R I thought that splitting up data.frame objects was the way to go and used it a lot. Since I've learned R better I never ever do that. I don't have a single bit of R code written after my first few weeks with R that splits up a data.frame into a list. I'm not saying you should never do it. I'm just saying that it's relatively rare that you need it or that it's the best idea. You might want to post a query on here about your end goal and get some advice on that.
