How to reference columns of a data.frame within a data.frame?
I have a data.frame called series_to_plot.df which I created by combining a number of other data.frames together (shown below). I now want to pull out just the .mm column from each of these, so I can plot them. So I want to pull out the 3rd column of each data.frame (e.g. p3c3.mm, p3c4.mm etc.), but I can't see how to do this for all data.frames in the object without looping through the names. Is this possible?
I can pull out just one set, e.g. series_to_plot.df[[3]], and another with series_to_plot.df[[10]] (so it is just a list of vectors), and I can reference one directly with series_to_plot.df$p3c3.mm, but is there a command to get a vector containing all the mm values from each data.frame? I was expecting an index something like series_to_plot.df[,3[3]] to work, but it returns:
Error in `[.data.frame`(series_to_plot.df, , 3[3]) : undefined columns selected
series_to_plot.df
     p3c3.rd p3c3.day   p3c3.mm    p3c3.sd p3c3.n p3c3.noo p3c3.no_NAs
1 2010-01-04        0 0.1702531 0.04003364      7        1           0
2 2010-01-06        2 0.1790594 0.04696674      7        1           0
3 2010-01-09        5 0.1720404 0.03801756      8        0           0
     p3c4.rd p3c4.day   p3c4.mm     p3c4.sd p3c4.n p3c4.noo p3c4.no_NAs
1 2010-01-04        0 0.1076581 0.006542157      6        2           0
2 2010-01-06        2 0.1393447 0.066758781      7        1           0
3 2010-01-09        5 0.2056846 0.047722862      7        1           0
     p3c5.rd p3c5.day    p3c5.mm     p3c5.sd p3c5.n p3c5.noo p3c5.no_NAs
1 2010-01-04        0 0.07987147 0.006508766      7        1           0
2 2010-01-06        2 0.11496167 0.046478767      8        0           0
3 2010-01-09        5 0.40326471 0.210217097      7        1           0
To get all columns whose names end in "mm", you could do:
names_with_mm <- grep("mm$", names(series_to_plot.df), value=TRUE)
series_to_plot.df[, names_with_mm]
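If a single vector is what you are after, the selected columns can be flattened with unlist(). Here is a minimal sketch of that idea. The toy.df object is a made-up stand-in for series_to_plot.df, trimmed to two series and two columns each, with values copied from the question.

```r
# Toy stand-in for series_to_plot.df (assumption: only two series,
# and only the day and mm columns of each are kept)
toy.df <- data.frame(
  p3c3.day = c(0, 2, 5),
  p3c3.mm  = c(0.1702531, 0.1790594, 0.1720404),
  p3c4.day = c(0, 2, 5),
  p3c4.mm  = c(0.1076581, 0.1393447, 0.2056846)
)

# Names of the columns ending in "mm"
names_with_mm <- grep("mm$", names(toy.df), value = TRUE)

# drop = FALSE keeps a data.frame even if only one column matches
mm_only <- toy.df[, names_with_mm, drop = FALSE]

# Flatten the selected columns into one numeric vector;
# the element names record which column each value came from
all_mm <- unlist(mm_only)
```

The names on all_mm let you trace each value back to its series if you need to split the vector again later.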
But if your base data.frames all have the same structure, then you can rbind them, something like:
series_to_plot.df <- rbind(
    cbind(name="p3c3", p3c3),
    cbind(name="p3c4", p3c4),
    cbind(name="p3c5", p3c5)
)
Then the mm values are all in one column, and it's easier to plot.
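As a rough, self-contained illustration of the stacking approach: the two base data.frames below are hypothetical, cut down to two columns, and assumed to share identical column names, which rbind() needs when combining data.frames.

```r
# Hypothetical base data.frames with identical column names (an assumption)
p3c3 <- data.frame(day = c(0, 2, 5), mm = c(0.1702531, 0.1790594, 0.1720404))
p3c4 <- data.frame(day = c(0, 2, 5), mm = c(0.1076581, 0.1393447, 0.2056846))

# Tag each block with its series name, then stack the blocks row-wise
stacked <- rbind(
  cbind(name = "p3c3", p3c3),
  cbind(name = "p3c4", p3c4)
)

# All mm values now live in a single column
all_mm <- stacked$mm

# The name column can be used to split the values back out per series
mm_by_series <- split(stacked$mm, stacked$name)
```

With the data in this shape, the name column can serve as the grouping variable when plotting.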
To add to the other answers, I don't think it is a good idea to have useful information encoded in variable names. Much better to rearrange your data so that all useful information is in the value of some variable. I don't know enough about your data set to suggest the right format, but it might be something like
p c rd day date mm sd ...
3 3 2010-10-04 ...
Once you have done this, the answer to your question becomes simply df$mm.
If you are getting the data in a less useful form from an external source, you can rearrange it into a more useful form like the above within R, using the reshape function or functions from the reshape package.
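For instance, base R's reshape() can turn a wide layout into a long one. The sketch below is only one possible arrangement: the wide data.frame is a made-up, trimmed version of the data in the question, and the column name series is my own choice. Because names like p3c3.mm put the series before the measure, the varying columns are listed explicitly rather than left for reshape() to guess.

```r
# Made-up wide data: one shared day column plus mm and sd for two series
wide <- data.frame(
  day     = c(0, 2, 5),
  p3c3.mm = c(0.1702531, 0.1790594, 0.1720404),
  p3c3.sd = c(0.04003364, 0.04696674, 0.03801756),
  p3c4.mm = c(0.1076581, 0.1393447, 0.2056846),
  p3c4.sd = c(0.006542157, 0.066758781, 0.047722862)
)

long <- reshape(
  wide,
  direction = "long",
  # one list element per measure, columns given in the same order as 'times'
  varying   = list(c("p3c3.mm", "p3c4.mm"), c("p3c3.sd", "p3c4.sd")),
  v.names   = c("mm", "sd"),
  timevar   = "series",          # hypothetical name for the series label
  times     = c("p3c3", "p3c4"),
  idvar     = "day"
)

# Every mm value is now in one column
long$mm
```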
The R Language Definition has some good info on indexing (sec 3.4.1), which is pretty helpful.
You can then pull out the names matching a pattern with the grep() command, and string it all together like this:
dataWithMM <- series_to_plot.df[, grep("mm$", names(series_to_plot.df))]
To deconstruct it a little, this gets the positions of the columns whose names match the "mm" pattern:
namesThatMatch <- grep("mm$", names(series_to_plot.df))
Then we use those positions to select the columns we want:
dataWithMM <- series_to_plot.df[, namesThatMatch]
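To see what grep() hands back, here is a small sketch run on the column names of the first series from the question. The variable names are only illustrative.

```r
# Column names of the first series, copied from the question
cols <- c("p3c3.rd", "p3c3.day", "p3c3.mm", "p3c3.sd",
          "p3c3.n", "p3c3.noo", "p3c3.no_NAs")

# Integer position(s) of the names ending in "mm"
idx <- grep("mm$", cols)

# The matching name(s) themselves
nm <- grep("mm$", cols, value = TRUE)

# A logical vector with one entry per name
lgl <- grepl("mm$", cols)
```

Any of the three results can go in the column slot of [ , ] to select the same columns.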