Should a Chunker find the head of a phrase?
My application requires that I identify the head of each phrase (noun or verb). I have this kind of information in my Portuguese corpus:
Me pron-pers *B-NP
pergunto v-fin B-VP
sempre adv *B-ADVP
quem pron-indp *B-NP
podia v-fin B-VP
ter v-inf I-VP
sido v-pcp I-VP
aquele pron-det B-NP
jovem adj I-NP
alemão n *I-NP
. . O

The syntax is similar to CoNLL-2000, but the * marks the head of the phrase. My question is: should a chunker support head marking? Do you know of any other corpus for training a chunker that also includes heads, or is this a peculiarity of my corpus?
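To make the format concrete, here is a minimal Python sketch of how one line per token could be read, separating the head mark from the plain chunk tag. The names `TaggedToken` and `parse_line` are hypothetical, and the sketch assumes each line has exactly three whitespace-separated fields, as in the sample above.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    token: str
    pos: str
    chunk_tag: str  # plain tag such as "B-NP", without the head mark
    is_head: bool   # True when the original tag started with "*"


def parse_line(line: str) -> TaggedToken:
    """Split 'token pos tag' and strip the optional leading '*' from the tag."""
    token, pos, tag = line.split()
    return TaggedToken(token, pos, tag.lstrip("*"), tag.startswith("*"))


sample = """Me pron-pers *B-NP
pergunto v-fin B-VP
sempre adv *B-ADVP
quem pron-indp *B-NP
podia v-fin B-VP
ter v-inf I-VP
sido v-pcp I-VP
aquele pron-det B-NP
jovem adj I-NP
alemão n *I-NP
. . O"""

tokens = [parse_line(line) for line in sample.splitlines()]
heads = [t.token for t in tokens if t.is_head]
print(heads)
```

Keeping `chunk_tag` free of the mark means the same data can be fed to a standard chunker trainer, while `is_head` is kept as a separate label.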
-- edit --
I tried training the classifier and got good results: the F1 score was 0.94 without the head mark and 0.93 with it, which I think is acceptable. The problem is that the OpenNLP chunker API does not support this mark and gets confused while creating the spans. I changed the OpenNLP code to handle it and was wondering whether it would be a good patch to contribute, but since this kind of annotation is not common I probably should not submit it.
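The idea behind that change can be sketched in Python as follows. This is not the actual patch and does not use the OpenNLP API; the function name `tags_to_spans` is hypothetical, and the sketch assumes the head mark is always a leading `*` on a `B-` or `I-` tag.

```python
def tags_to_spans(tags):
    """Group B-/I- tags into spans, tolerating an optional leading '*'.

    Returns tuples (start, end, chunk_type, head_index) with an exclusive end;
    head_index is None when no token in the chunk carries the head mark.
    """
    spans = []
    current = None  # [start, chunk_type, head_index] of the open chunk
    for i, raw in enumerate(tags):
        is_head = raw.startswith("*")
        tag = raw.lstrip("*")
        # An I- tag of the same type continues the open chunk.
        if tag.startswith("I-") and current is not None and current[1] == tag[2:]:
            if is_head:
                current[2] = i
            continue
        # Anything else closes the open chunk, if there is one.
        if current is not None:
            spans.append((current[0], i, current[1], current[2]))
            current = None
        # B- starts a chunk; a stray I- is leniently treated as a start too.
        if tag.startswith(("B-", "I-")):
            current = [i, tag[2:], i if is_head else None]
    if current is not None:
        spans.append((current[0], len(tags), current[1], current[2]))
    return spans


tags = ["*B-NP", "B-VP", "*B-ADVP", "*B-NP", "B-VP", "I-VP", "I-VP",
        "B-NP", "I-NP", "*I-NP", "O"]
for span in tags_to_spans(tags):
    print(span)
```

Stripping the mark before comparing tag types is what keeps `*I-NP` from being read as a different chunk type than `B-NP`.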
I've never seen a chunker that supports head-finding, so I can't help you with a corpus. What you might do, if you already have a chunker, is formulate a set of rules that pick out the head of each chunk after the chunker has found the chunk, or train a classifier to do so. You can train that classifier on your corpus and apply it to the chunker's output.
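A rough Python sketch of the rule-based option, to show the shape of it. The rules in `HEAD_RULES` are illustrative assumptions, not linguistically validated for Portuguese, and they only use POS tags that appear in the sample from the question; `find_head` and `pos_matches` are hypothetical names.

```python
# Assumed priority lists: for each chunk type, POS classes to try in order.
HEAD_RULES = {
    "NP": ["n", "pron", "adj"],
    "VP": ["v"],
    "ADVP": ["adv"],
}


def pos_matches(pos: str, cls: str) -> bool:
    """'v' matches 'v-fin', 'pron' matches 'pron-det', 'n' matches only 'n'."""
    return pos == cls or pos.startswith(cls + "-")


def find_head(chunk_type: str, pos_tags: list) -> int:
    """Return the index, within the chunk, of the token chosen as head.

    Takes the rightmost token of the highest-priority POS class present;
    falls back to the last token when no rule applies.
    """
    for cls in HEAD_RULES.get(chunk_type, []):
        for i in range(len(pos_tags) - 1, -1, -1):
            if pos_matches(pos_tags[i], cls):
                return i
    return len(pos_tags) - 1


# "aquele jovem alemão" -> index 2 ("alemão"), which agrees with the '*' in the corpus
print(find_head("NP", ["pron-det", "adj", "n"]))
```

A trained classifier would fit the same interface: given a chunk type and the tokens inside it, return the index of the head. The head marks in the corpus can then be used to measure how often either approach agrees with the annotation.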