
Should a Chunker find the head of a phrase?

My application requires that I identify the head of each phrase (noun or verb). My Portuguese corpus carries this kind of information:

Me pron-pers *B-NP
pergunto v-fin B-VP
sempre adv *B-ADVP
quem pron-indp *B-NP
podia v-fin B-VP
ter v-inf I-VP
sido v-pcp I-VP
aquele pron-det B-NP
jovem adj I-NP
alemão n *I-NP
. . O

The syntax is similar to CoNLL-2000, but the * marks the head of the phrase. My question is: should a chunker support head marking? Do you know of any other corpus for training a chunker that also includes heads, or is this a peculiarity of mine?
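To make the format concrete, here is a minimal Python sketch of how such lines could be read so that the head mark is kept apart from the chunk tag. The `Token` type and `parse_line` helper are hypothetical names used only for illustration; the sketch assumes every line has exactly three whitespace-separated fields.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    chunk: str     # plain CoNLL-2000 style tag, e.g. "B-NP", "I-VP", "O"
    is_head: bool  # True when the original tag carried the "*" mark


def parse_line(line: str) -> Token:
    """Split one 'word pos tag' line and peel the head mark off the tag."""
    word, pos, tag = line.split()
    is_head = tag.startswith("*")
    return Token(word, pos, tag.lstrip("*"), is_head)


sample = """Me pron-pers *B-NP
pergunto v-fin B-VP
sempre adv *B-ADVP
aquele pron-det B-NP
jovem adj I-NP
alemão n *I-NP
. . O"""

tokens = [parse_line(line) for line in sample.splitlines()]
```

After parsing, the `chunk` field is an ordinary CoNLL-2000 tag, so the same data could feed a standard chunker while the head information travels in a separate field.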

-- edit --

I tried training the classifier and got good results: the F1 score was 0.94 without the head mark and 0.93 with it, which I think is acceptable. The problem is that the OpenNLP chunker API does not support this mark and gets confused while creating the spans. I changed the OpenNLP code to handle it and was wondering whether it would make a good patch, but since head marking is not common, I suppose I should not submit it.
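The span problem can be illustrated without touching OpenNLP itself. The following Python sketch is not the OpenNLP implementation or the patch described above; it only shows one way span creation could tolerate the `*` mark, by stripping it before the B-/I- logic runs and remembering which token carried it. The function name `chunk_spans` is hypothetical.

```python
def chunk_spans(tags):
    """Return (start, end, type, head_index) tuples; end is exclusive.

    head_index is None when no token in the chunk carries the "*" mark.
    """
    spans = []
    start = ctype = head = None
    for i, raw in enumerate(tags):
        is_head = raw.startswith("*")
        tag = raw.lstrip("*")
        # A chunk is open and this tag does not continue it: close the chunk.
        if start is not None and not (tag.startswith("I-") and tag[2:] == ctype):
            spans.append((start, i, ctype, head))
            start = ctype = head = None
        # Any non-"O" tag opens a new chunk when none is open.
        if tag != "O" and start is None:
            start, ctype = i, tag[2:]
        if is_head and start is not None:
            head = i
    if start is not None:
        spans.append((start, len(tags), ctype, head))
    return spans


tags = ["*B-NP", "B-VP", "*B-ADVP", "*B-NP", "B-VP", "I-VP", "I-VP",
        "B-NP", "I-NP", "*I-NP", "O"]
spans = chunk_spans(tags)
```

On the sentence from the question, the last span covers tokens 7 to 9 ("aquele jovem alemão") with the head at index 9, and the verb chunks come back with no head because none is marked in the sample.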


I've never seen a chunker that supports head-finding, so I can't help you with a corpus. If you already have a chunker, you could write a set of rules that pick the head of each chunk after the chunker has found it, or train a classifier to do so. You can train that classifier on your corpus and apply it to the chunker output.
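As a rough illustration of the rule-based option, here is a short Python sketch. The rules in `HEAD_RULES` are assumptions made up for this example (rightmost noun-like token in an NP, rightmost verb in a VP), not rules taken from any corpus guideline, and `find_head` is a hypothetical helper; real rules would need tuning against the annotated heads.

```python
# Hypothetical head rules: for each chunk type, the POS prefixes that may
# head the chunk. Candidates are scanned from right to left.
HEAD_RULES = {
    "NP": ("n", "prop", "pron"),
    "VP": ("v",),
    "ADVP": ("adv",),
}


def find_head(pos_tags, start, end, chunk_type):
    """Index of the rightmost token in [start, end) whose POS prefix matches
    a rule for chunk_type; falls back to the last token of the chunk."""
    prefixes = HEAD_RULES.get(chunk_type, ())
    for i in range(end - 1, start - 1, -1):
        # "pron-det" -> "pron", "v-pcp" -> "v", "n" -> "n"
        if pos_tags[i].split("-")[0] in prefixes:
            return i
    return end - 1


np_pos = ["pron-det", "adj", "n"]    # aquele jovem alemão
vp_pos = ["v-fin", "v-inf", "v-pcp"]  # podia ter sido
np_head = find_head(np_pos, 0, 3, "NP")
vp_head = find_head(vp_pos, 0, 3, "VP")
```

For the NP in the question this picks "alemão", which agrees with the `*` mark in the corpus. Comparing the rule output against the marked heads across the whole corpus would show quickly whether rules are enough or a classifier is worth training.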
