Does anyone know how many sentences there are in the original Penn Treebank?
I can't seem to find that in the documentation anywhere.
The Penn Treebank has about 4.5 million words of English annotated with part-of-speech (POS) tags, and about half of that is also annotated with skeletal parses.
Check out page 327 of this document: http://acl.ldc.upenn.edu/J/J93/J93-2004.pdf. It is a little outdated (it was published in 1993; the "2004" in the file name is just part of the paper ID), but I can't think of any new words that English speakers have introduced since then.
In total it's roughly 40,000 sentences, taken from the Wall Street Journal.
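If you have the parsed files on disk, you can also check the number yourself. Here is a minimal sketch, assuming the files use the usual bracketed format where each sentence is one top-level parenthesized tree and literal parentheses in the text are written as -LRB- and -RRB-. The function name `count_trees` and the two sample trees are made up for illustration; they are not taken from the corpus documentation.

```python
def count_trees(bracketed: str) -> int:
    """Count top-level parenthesized trees in bracketed treebank text.

    Assumes every '(' and ')' is structural, i.e. parentheses inside
    the sentences themselves are escaped as -LRB- / -RRB-.
    """
    depth = 0
    trees = 0
    for ch in bracketed:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
            if depth == 0:
                # Back at nesting level 0: one complete tree has ended.
                trees += 1
    if depth != 0:
        raise ValueError("unbalanced '('")
    return trees


# Two simplified, illustrative trees (not verbatim corpus data).
sample = """
( (S (NP (NNP Pierre) (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)))) (. .)) )
( (S (NP (NNP Mr.) (NNP Vinken)) (VP (VBZ is) (NP (NN chairman))) (. .)) )
"""

print(count_trees(sample))
```

To count a whole directory you would read each file and sum the results. Keep in mind that one tree is not always exactly one sentence in the linguistic sense, so the total is an approximation.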