Is there a way to chunk 2 or more repetitions of a tag in a tagged sentence using nltk?

Posted: 2023-03-19 08:22

I'm trying to use the nltk module in python to chunk together any instances where two to five nouns occur in sequence.

This is the code I am using:

parse_pattern  = "Keyword: {< N>{2,5}}"
keyword_parser = nltk.RegexpParser(parse_pattern)
result = keyword_parser.parse(sentence)

It makes sense to me that this bit should do the trick: Keyword: {< N>{2,5}}

I even found an example in the book Natural Language Processing with Python that uses the above bit completely analogously: NOUNS: {< N.*>{4,}} where the authors explain that that bit of code should chunk 4 or more nouns.

However, I get an error when I run the above code:

ValueError: Illegal chunk pattern: {< N>{2,5}}

Note: I also tried the above using {< N.*>{2,5}} (with the dot star solely because the author of the aforementioned book did) with no luck.

Any help on how to chunk two or more repetitions of a tag would be highly appreciated.

The ValueError is probably triggered by the space between the opening angle bracket and the N.

Try

parse_pattern = "Keyword: {<N>{2,5}}"

rather than

parse_pattern = "Keyword: {< N>{2,5}}"

Also, don't worry about the syntax with the extra dot star: it is only needed if you want to match every tag that starts with, here, N.
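To make the difference concrete, here is a minimal sketch, assuming a recent NLTK 3.x install. The sentence and the helper chunk_leaves are made up for illustration; the point is only that <N> needs the tag to be exactly N, while <N.*> accepts any tag that starts with N.

```python
import nltk

# Hypothetical tagged sentence using Penn Treebank noun tags.
sentence = [("the", "DT"), ("dog", "NN"), ("owners", "NNS"), ("left", "VBD")]


def chunk_leaves(pattern, tagged):
    """Return the leaves of every 'Keyword' chunk the pattern produces."""
    tree = nltk.RegexpParser(pattern).parse(tagged)
    return [st.leaves() for st in tree.subtrees() if st.label() == "Keyword"]


# <N> only matches a tag that is exactly "N", so nothing is chunked here.
exact = chunk_leaves("Keyword: {<N><N>}", sentence)

# <N.*> matches any tag starting with "N" (NN, NNS, NNP, NNPS).
prefix = chunk_leaves("Keyword: {<N.*><N.*>}", sentence)

print(exact)   # []
print(prefix)  # [[('dog', 'NN'), ('owners', 'NNS')]]
```

So if your tagger really emits a bare N tag, the plain <N> form is enough; with Penn Treebank tags you need the dot star.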

If all else fails, you can try an alternative expression that doesn't need the {min,max} syntax for the repetition range:

parse_pattern = "Keyword: {<N><N><N>?<N>?<N>?}"

And if even that fails, try just parse_pattern = "Keyword: {<N>}". That should hopefully get something working, or at least help pinpoint what else may be wrong with your setup.
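One way to run through those fallbacks in order is a small check like the one below. It is only a sketch: the helper name accepted is hypothetical, and it relies on nltk.RegexpParser raising ValueError when it cannot build a grammar, which is the error quoted in the question. Which patterns pass may differ between NLTK versions, so no particular output is promised.

```python
import nltk

# Candidate patterns from this answer, from most to least specific.
candidates = [
    "Keyword: {<N>{2,5}}",
    "Keyword: {<N><N><N>?<N>?<N>?}",
    "Keyword: {<N>}",
]


def accepted(pattern):
    """Return True if the installed nltk can build a parser from the grammar."""
    try:
        nltk.RegexpParser(pattern)
        return True
    except ValueError:
        return False


# Record, per pattern, whether this nltk version accepts it.
results = {pattern: accepted(pattern) for pattern in candidates}
for pattern, ok in results.items():
    print("accepted" if ok else "rejected", pattern)
```

If even the last pattern is rejected, the problem is more likely in the installation than in the chunk grammar.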

nltk's default tagger uses the Penn Treebank tagset, which marks nouns with the following tags:

<NN> for a singular noun
<NNP> for a singular proper noun
<NNS> for a plural noun
<NNPS> for a plural proper noun

Thus if you want to catch any of these between two and five times, you'll want the regex:

<NN.*>{2,5}

With your example, that would be:

parse_pattern  = "Keyword: {<NN.*>{2,5}}"
keyword_parser = nltk.RegexpParser(parse_pattern)
result         = keyword_parser.parse(sentence)

Note that sentence must be tagged, e.g.

sentence = [("dog", "NN"), ("David", "NNP"), ("cats", "NNS")]
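Putting it together, here is a minimal end-to-end sketch, assuming a recent NLTK 3.x release where the {min,max} quantifier is accepted. The two extra tokens ("sleep", "today") are added only for illustration, to show that a lone noun is left unchunked.

```python
import nltk

# Tagged input: the three nouns from above plus a verb and a lone noun.
sentence = [
    ("dog", "NN"), ("David", "NNP"), ("cats", "NNS"),
    ("sleep", "VBP"), ("today", "NN"),
]

parse_pattern = "Keyword: {<NN.*>{2,5}}"
keyword_parser = nltk.RegexpParser(parse_pattern)
result = keyword_parser.parse(sentence)

# Collect the words of every Keyword chunk as a single string.
keywords = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in result.subtrees(filter=lambda t: t.label() == "Keyword")
]

print(keywords)  # ['dog David cats']
```

The run of three nouns becomes one chunk, while "today" stays outside any chunk because a single noun does not satisfy the minimum of two.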

Look at the source of nltk's regexp chunking module (nltk/chunk/regexp.py), in particular the function tag_pattern2re_pattern(), which converts a tag pattern into a proper regular expression. As part of that conversion the pattern is validated against the fixed constant CHUNK_TAG_PATTERN, which constrains where special characters such as '(', ')', '<' and '>' may appear. So the tag pattern CHUNK: {<V.*><TO><V.*>} is correct, but the tag pattern CHUNK: {<V>.*<TO><V.*>{1,}} is incorrect. Whether a {min,max} quantifier passes this check depends on the installed nltk version.
