
Does anyone know how many sentences there are in the original Penn Treebank?

I can't seem to find that in the documentation anywhere.


The Penn Treebank contains about 4.5 million words of English annotated with part-of-speech (POS) tags, and about half of that has also been annotated with skeletal parses.

Check out page 327 of this document: http://acl.ldc.upenn.edu/J/J93/J93-2004.pdf. It is a little outdated (1993), but I can't think of any new words that English speakers have introduced since then.


In total it's roughly 40,000 sentences, taken from the Wall Street Journal.
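If you have a copy of the parsed files, you could also check the number yourself. Here is a minimal sketch, assuming the usual bracketed format in which each sentence is one top-level parenthesized tree and literal parentheses in the text are written as -LRB- and -RRB-. The function name `count_trees` and the two-sentence sample are made up for illustration and are not taken from the corpus.

```python
def count_trees(text: str) -> int:
    """Count top-level bracketed trees in Penn Treebank style text."""
    depth = 0
    count = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # Returning to depth 0 means one complete tree just closed.
            if depth == 0:
                count += 1
    return count


# Hypothetical sample in the bracketed style; not real corpus data.
sample = """
( (S (NP-SBJ (DT The) (NN cat)) (VP (VBD sat)) (. .)) )
( (S (NP-SBJ (PRP It)) (VP (VBD purred)) (. .)) )
"""

print(count_trees(sample))  # prints 2
```

Running this over the text of every parsed file and summing the results should give a sentence count for whatever portion of the corpus you have.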
