R: generate bi- and trigrams from a column
I have a column containing a word in each row:
word
-----
asdf
wer
asdf
Is there a way to get the most frequent bi- and trigrams over all rows? For instance, for bigrams:
aa: 10%
ab: 9%
.....
I have no experience with this particular sort of problem, but a little Googling turned up the tau
package (text analysis utilities, used by the textcat package for "N-Gram Based Text Categorization").
Using its textcnt function on your sample looks like this:
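library(tau)  # install.packages("tau") if it is not installed yet
x <- c('asdf','wer','asdf')
# count character n-grams of length up to 3
textcnt(x, n = 3)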
and seems to return the sort of information you're looking for.
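If you would rather have the percentages directly, for one n-gram length at a time, here is a minimal base R sketch. Note that ngram_freq is a hypothetical helper written for this answer, not part of tau. It assumes each row holds a single word and that n-grams should not span rows.

```r
# Hypothetical helper (not from tau): percentage frequency of character n-grams.
ngram_freq <- function(words, n) {
  # Drop words too short to contain an n-gram of this length
  words <- words[nchar(words) >= n]
  # Slide a window of width n across each word and collect the pieces
  grams <- unlist(lapply(words, function(w) {
    starts <- seq_len(nchar(w) - n + 1)
    substring(w, starts, starts + n - 1)
  }))
  counts <- table(grams)
  # Convert counts to percentages, most frequent first
  sort(100 * counts / sum(counts), decreasing = TRUE)
}

x <- c("asdf", "wer", "asdf")
bigrams  <- ngram_freq(x, 2)
trigrams <- ngram_freq(x, 3)
print(bigrams)
print(trigrams)
```

On the three sample words, "as", "sd" and "df" each make up 25% of the bigrams, and "asd" and "sdf" each make up 40% of the trigrams. With a data frame, you would pass the column itself, e.g. ngram_freq(df$word, 2), where df is whatever your data frame is called.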