Dynamic text-pattern detection algorithm? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I was wondering whether such an algorithm exists. I have a bunch of text documents and would like to find a pattern among all these documents, if a pattern exists. Please note I'm NOT trying to classify the documents; all I want to do is find a pattern, if one exists, among some documents. Thanks!


The question as it stands is rather vague: you need to know what you are looking for in order to be able to find it.
Some ideas that may be of use:

  1. Get n-gram counts for each document separately for n = 1, 2, 3, 4 and then compare the frequencies of each n-gram across the documents. This should help you find commonly occurring phrases across all documents.
  2. Use a part-of-speech tagger to convert all the documents into streams of POS tags, and then do the same as in 1.
  3. Use a PCFG parser such as the Stanford Parser to get parse trees for all the sentences across all the documents, and then try to figure out how similar the distributions of sentence structures are across the different documents.
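
To make idea 1 a bit more concrete, here is a rough sketch in Python using only the standard library. The function names (`tokenize`, `ngram_counts`, `shared_ngrams`), the `min_docs` threshold, and the sample documents are all made up for illustration, and the regex tokenizer is a simplifying assumption; a real corpus would probably want a proper tokenizer. The same approach should work for idea 2 if you feed it POS tags instead of words.

```python
from collections import Counter
import re


def tokenize(text):
    # Assumption: lowercase, and treat runs of letters, digits and
    # apostrophes as tokens. Swap in a real tokenizer if needed.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(tokens, n):
    # Count every contiguous run of n tokens within one document.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def shared_ngrams(docs, n, min_docs=None):
    # Map each n-gram to the number of documents that contain it, keeping
    # only n-grams found in at least min_docs documents (default: all).
    if min_docs is None:
        min_docs = len(docs)
    doc_freq = Counter()
    for text in docs:
        # set(...) so each document contributes at most once per n-gram.
        doc_freq.update(set(ngram_counts(tokenize(text), n)))
    return {gram: df for gram, df in doc_freq.items() if df >= min_docs}


# Hypothetical sample documents, purely for demonstration.
docs = [
    "Please find the attached report for your review.",
    "Kindly find the attached invoice.",
    "You will find the attached report useful.",
]

for n in (1, 2, 3, 4):
    print(n, sorted(shared_ngrams(docs, n)))
```

Lowering `min_docs` (for example `shared_ngrams(docs, 4, min_docs=2)`) surfaces phrases that recur in only some of the documents, which may be closer to "a pattern among some documents" than requiring a match in every one.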