R plot frequency of strings with specific pattern
Given a data frame with a column that contains strings, I would like to plot the frequency of the strings that contain a certain pattern. For example:
strings <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
df <- as.data.frame(strings)
df
strings
1 abcd
2 defd
3 hfjfjcd
4 kgjgcdjrye
5 yryriiir
6 twtettecd
I would like to plot the frequency of the strings that contain the pattern "cd". Anyone with a quick solution?
I presume from your question that you meant to have some entries that appear more than once, so I've added one duplicate string:
x <- c("abcd","abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
To find only those strings that contain a specific pattern, use grep or grepl:
y <- x[grepl("cd", x)]
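The difference between the two functions is easy to see on the example vector: grep returns the positions of the matching elements, while grepl returns one TRUE/FALSE per element. A minimal sketch (the names idx and hit are only for illustration):

```r
x <- c("abcd", "abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")

idx <- grep("cd", x)    # integer positions of the matching elements
hit <- grepl("cd", x)   # logical vector, same length as x

# Both forms select the same subset of x
same_subset <- identical(x[idx], x[hit])

# fixed = TRUE treats the pattern as a literal string rather than a regex,
# which is safer if the pattern could contain characters such as "." or "("
y <- x[grepl("cd", x, fixed = TRUE)]
```

For a plain pattern like "cd" the regex and fixed versions give the same result; fixed = TRUE only matters once the pattern contains regex metacharacters.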
To get a table of frequencies, you can use table:
table(y)
y
abcd hfjfjcd kgjgcdjrye twtettecd
2 1 1 1
And you can plot it using plot or barplot as follows:
barplot(table(y))
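Since the question starts from a data frame rather than a bare vector, the same steps can be sketched end to end as below. This assumes the column is kept as character (stringsAsFactors = FALSE); with a factor column, table would also list the unused levels with a count of zero. The names matches and freq, and the plot labels, are illustrative.

```r
df <- data.frame(
  strings = c("abcd", "abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd"),
  stringsAsFactors = FALSE  # keep the column as character, not factor
)

# Keep only the rows whose string contains "cd"
matches <- df$strings[grepl("cd", df$strings)]

# Count each distinct matching string, most frequent first
freq <- sort(table(matches), decreasing = TRUE)

# las = 2 rotates the axis labels so longer strings stay readable
barplot(freq, las = 2, ylab = "Frequency", main = 'Strings containing "cd"')
```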
Others have already mentioned grepl. Here is an implementation with plot.density, using grepl to flag the matches (the 0+ converts the logical result to 0/1):
plot( density(0+grepl("cd", strings)) )
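Because the indicator only takes the values 0 and 1, a bar chart of the two counts shows the same information without any smoothing. This is a sketch of that alternative, not the method above; the names has_cd and counts and the labels are illustrative.

```r
strings <- c("abcd", "defd", "hfjfjcd", "kgjgcdjrye", "yryriiir", "twtettecd")

# TRUE where the string contains "cd"
has_cd <- grepl("cd", strings)

# Fix both levels so a category with zero strings would still be shown
counts <- table(factor(has_cd,
                       levels = c(FALSE, TRUE),
                       labels = c("no match", "match")))

barplot(counts, ylab = "Number of strings")
```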
If you don't like the way the density plot extends beyond the range of the data, there are other methods in the 'logspline' package that allow one to get a sharp border at the range extremes. Searching with RSiteSearch may turn up more.
Check the "kernlab" package. You can define a kernel (pattern), which could be any kind of string, and count the matches later on.