
Is this the correct definition of a "corpus"? [closed]

Closed. This question is off-topic. It is not currently accepting answers.


Closed 11 years ago.


I have a huge string of raw text that is about 200,000 words long. It's a book.

I want to use these words to analyze the word relationships, so that I can apply those relationships to other applications.

Is this called a "corpus"?


A corpus, in linguistics, is any coherent body of real-life(*) text or speech being studied. So yes, a book is a corpus. The fact that it's in one string doesn't matter, as long as you don't randomly shuffle the characters.

(*) As opposed to a bunch of made up phrases being shown to test subjects to measure their responses, as is commonly done in psycholinguistics.


Yes. See http://en.wikipedia.org/wiki/Text_corpus. Specifically, it counts as a corpus because it's used for statistical analysis.
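To make the "used for statistics" part concrete, here's a rough sketch of one simple way to pull word relationships out of a single string of text: count how often each word occurs and how often each pair of adjacent words (a bigram) occurs. The helper names `tokenize` and `bigram_counts` and the short sample string are made up for illustration; with a real book you'd read the file into `corpus` instead, and you'd probably want a smarter tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_counts(tokens):
    """Count how often each pair of adjacent words occurs."""
    return Counter(zip(tokens, tokens[1:]))

# Stand-in for the book; an assumption for this example only.
corpus = "the cat sat on the mat and the cat slept"

tokens = tokenize(corpus)
word_freq = Counter(tokens)    # single-word frequencies
pairs = bigram_counts(tokens)  # adjacent-word frequencies

print(word_freq.most_common(3))
print(pairs.most_common(3))
```

Adjacent-pair counts are only the crudest kind of "relationship", but they show why the text has to stay in order: shuffle it and the pair counts become meaningless.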


Usually "corpus" is used to refer to a structured collection, but linguists would know what you're talking about.
