R plot frequency of strings with specific pattern

I have a data frame with a column that contains strings, and I would like to plot the frequency of the strings that match a certain pattern. For example:

strings  <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
df <- as.data.frame(strings)
df
     strings
1       abcd
2       defd
3    hfjfjcd
4 kgjgcdjrye
5   yryriiir
6  twtettecd

I would like to plot the frequency of the strings that contain the pattern "cd". Anyone with a quick solution?


I presume from your question that you meant to have some entries that appear more than once, so I've added one duplicate string:

x <- c("abcd","abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")

To find only those strings that contain a specific pattern, use grep or grepl:

y <- x[grepl("cd", x)]
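As a small sketch of how grep and grepl differ on the same vector: the names hits, idx and vals are illustrative only, and fixed = TRUE is an added assumption that "cd" should be matched as a literal string rather than as a regular expression.

```r
x <- c("abcd","abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")

# grepl() returns one TRUE/FALSE per element of x
hits <- grepl("cd", x, fixed = TRUE)

# grep() returns the integer positions of the matching elements
idx <- grep("cd", x, fixed = TRUE)

# grep(value = TRUE) returns the matching strings themselves
vals <- grep("cd", x, fixed = TRUE, value = TRUE)

print(hits)
print(idx)
print(vals)
```

Either x[hits] or x[idx] gives the same subset as vals, so the choice between the two functions is mostly a matter of which form is more convenient for the next step.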

To get a table of frequencies, you can use table:

table(y)

y
      abcd    hfjfjcd kgjgcdjrye  twtettecd 
         2          1          1          1 

And you can plot it using plot or barplot as follows:

barplot(table(y))
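The same steps can be applied directly to the data frame column from the question. This is only a sketch: matched and freq are illustrative names, and fixed = TRUE, the sorting, and the axis options are additions that were not part of the original answer.

```r
strings <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")
df <- data.frame(strings = strings, stringsAsFactors = FALSE)

# Keep only the rows whose string contains the literal text "cd"
matched <- df$strings[grepl("cd", df$strings, fixed = TRUE)]

# Count each distinct matching string, most frequent first
freq <- sort(table(matched), decreasing = TRUE)

# One bar per distinct string; las = 2 rotates the labels so long strings fit
barplot(freq, las = 2, ylab = "Frequency")
```

With the question's original data every matching string occurs once, so all bars have the same height; the plot becomes more informative once the column contains repeats.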


Others have already mentioned grepl. Here is an implementation with plot.density that uses grepl to get a 0/1 indicator of which strings match:

plot( density(0+grepl("cd", strings)) )

If you don't like the density plot extending beyond the range of the data, there are other methods in the 'logspline' package that give a sharp border at the range extremes. Searching RSiteSearch
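Because the match indicator takes only two values, a plain bar chart of the two counts may be easier to read than a density curve. A minimal sketch, where has_cd, counts and the label text are illustrative choices and fixed = TRUE is an added assumption:

```r
strings <- c("abcd","defd","hfjfjcd","kgjgcdjrye","yryriiir","twtettecd")

# Turn the TRUE/FALSE match indicator into a labelled two-level factor
has_cd <- factor(grepl("cd", strings, fixed = TRUE),
                 levels = c(FALSE, TRUE),
                 labels = c("no match", "contains cd"))

# Count how many strings fall into each group
counts <- table(has_cd)

barplot(counts, ylab = "Number of strings")
```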


Check the "kernlab" package. You can define a kernel (pattern), which could be any kind of string, and count the matches later on.
