
R: generate bi- and trigrams from a column

I have a column containing a word in each row:

 word
 -----
 asdf
 wer
 asdf

Is there a way to get the most frequent bi- and trigrams over all rows? For instance, for bigrams:

aa: 10%
ab: 9%
.....


I have no experience with this particular sort of problem, but a little Googling turned up the tau package for "N-Gram Based Text Categorization". Using its textcnt function on your sample looks like this:

library(tau)  # provides textcnt()
x <- c('asdf','wer','asdf')
textcnt(x, n = 3)

and seems to return the sort of information you're looking for.
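
If you want the percentages from the question without an extra package, here is a minimal base-R sketch. It assumes n-grams should be taken within each word (never across rows) and that the percentage is each n-gram's share of all n-grams of that length. The helper names char_ngrams and ngram_share are made up for illustration and are not part of tau.

```r
# Split one word into its overlapping character n-grams.
# char_ngrams() is a hypothetical helper, not a function from any package.
char_ngrams <- function(word, n) {
  len <- nchar(word)
  if (len < n) return(character(0))  # word too short to hold an n-gram
  starts <- seq_len(len - n + 1)
  substring(word, starts, starts + n - 1)
}

# Pool the n-grams of all words and return their share in percent,
# most frequent first. ngram_share() is likewise a hypothetical name.
ngram_share <- function(words, n) {
  grams <- as.character(unlist(lapply(words, char_ngrams, n = n)))
  counts <- table(grams)
  sort(100 * counts / sum(counts), decreasing = TRUE)
}

x <- c("asdf", "wer", "asdf")
ngram_share(x, 2)  # bigram percentages
ngram_share(x, 3)  # trigram percentages
```

On the three sample rows, the bigrams as, sd and df each come out at 25% and we and er at 12.5%. For a data frame you would pass the column, for example ngram_share(df$word, 2), where df is whatever your data frame is called.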
