How can I count the number of times a value occurs in a column of a dataframe?
Is there a simple way to count how many times each value occurs in a vector or a column of a data frame? I essentially want the numerical values of a histogram, but I do not know how to access them.
# sample vector
a <- c(1,2,1,1,1,3,1,2,3,3)
#hist
hist(a)
Thank you.
UPDATE:
On Dirk's suggestion I am using hist. Is there a better way than specifying the breaks as 1.9, 2.9, etc. when I know that all my values are integers?
hist(a, breaks=c(1,1.9,2.9,3.9,4.9,5.9,6.9,7.9,8.9,9.9), plot=FALSE)$counts
Use the table function.
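A minimal sketch of what that looks like with the sample vector from the question. The data frame df and its column x are hypothetical names added here for illustration only:

```r
# sample vector from the question
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# table() returns one count per distinct value, named by the value
counts <- table(a)
print(counts)

# look up the count for a single value by its name (names are character strings)
n_ones <- counts[["1"]]

# the same call works on a data frame column (df and x are hypothetical)
df <- data.frame(x = a)
col_counts <- table(df$x)

# convert to a data frame if you need the values and counts as columns
counts_df <- as.data.frame(counts)
```

Note that the names of the result are strings, so a lookup uses "1" rather than 1.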
Try this:
R> a <- c(1,2,1,1,1,3,1,2,3,3)
R> b <- hist(a, plot=FALSE)
R> str(b)
List of 7
$ breaks : num [1:5] 1 1.5 2 2.5 3
$ counts : int [1:4] 5 2 0 3
$ intensities: num [1:4] 1 0.4 0 0.6
$ density : num [1:4] 1 0.4 0 0.6
$ mids : num [1:4] 1.25 1.75 2.25 2.75
$ xname : chr "a"
$ equidist : logi TRUE
- attr(*, "class")= chr "histogram"
R>
R is object-oriented and most methods give you meaningful results back. Use them.
If you want to use hist, you don't need to specify the breaks as you did; just use the seq function:
br <- seq(0.9, 9.9, 1)
num <- hist(a, br, plot=FALSE)$counts
Also, if you're looking for a specific value, you can use which. For instance:
num <- length(which(a == 1))
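As a side note, summing the logical comparison directly should give the same count, since TRUE is treated as 1. A small sketch, with n_which and n_sum as illustrative names:

```r
# sample vector from the question
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# count via which(): indices where the condition holds, then their number
n_which <- length(which(a == 1))

# count via sum(): TRUE counts as 1, FALSE as 0
n_sum <- sum(a == 1)

# if the vector may contain NA, which() drops them silently,
# while sum() needs na.rm = TRUE to do the same
n_sum_safe <- sum(a == 1, na.rm = TRUE)
```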
In addition to the performance difference between hist and table in the case of many unique values that Dirk and mbq already pointed out, I would also like to mention another difference in functionality.
hist$counts will also give you zero counts for the bins that do not have any cases. This can be very valuable when you want to be confident about the number of bins (bars on a barplot, for example) that will end up in a following plot.
table, on the other hand, will only give you counts for existing values.
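To illustrate the difference, here is a small sketch that assumes the possible values are the integers 1 to 5 (that range is an assumption made for the example). It also shows that table can be made to report zero counts if you pass it a factor with explicit levels:

```r
# sample vector from the question; values 4 and 5 never occur
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# hist with one bin per integer: empty bins show up as zero counts
h_counts <- hist(a, breaks = seq(0.5, 5.5, 1), plot = FALSE)$counts

# plain table: only the values that actually occur
tab_seen <- table(a)

# table on a factor with explicit levels: unused levels are counted as 0
tab_all <- table(factor(a, levels = 1:5))

# tabulate() is another base R option for positive integers
tab_int <- tabulate(a, nbins = 5)
```

Here h_counts, tab_all and tab_int each have five entries, while tab_seen has only three.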
You might also want to check the right option of hist, which controls whether your breaks (intervals) will be right-closed or not.
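A quick sketch of why this matters for integer data when the breaks fall exactly on the values. The breaks 1:4 are chosen here only to make the effect visible:

```r
# sample vector from the question
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)
br <- 1:4

# right = TRUE (the default): bins are [1,2], (2,3], (3,4]
# so the 1s and 2s land in the same bin
right_closed <- hist(a, breaks = br, right = TRUE, plot = FALSE)$counts

# right = FALSE: bins are [1,2), [2,3), [3,4]
# so each integer gets its own bin
left_closed <- hist(a, breaks = br, right = FALSE, plot = FALSE)$counts
```

With these breaks, right_closed comes out as 7 3 0 and left_closed as 5 2 3, so only the second matches the per-value counts.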