How to make words into a category. (NLP)

Posted 2022-12-10 08:06

I love to eat chicken.
Today I went running, swimming and played basketball.

My objective is to return FOOD and SPORTS just by analyzing these two sentences. How can you do that?

I am familiar with NLP and WordNet, but is there a more high-level, practical, or modern technology?

Is there anything that automatically categorizes words for you, into "levels"?

More importantly, what is the technical term for this process?

That problem is difficult to solve procedurally, but much progress has been made in the area lately.

Most natural language processing begins with a grammar (which may or may not be context-free). It's a set of construction rules stating how more general things are made out of more specific ones.

Example context-free grammar:

Sentence ::= NounPhrase VerbPhrase
NounPhrase ::= ["The"] [Adjective] Noun
Adjective ::= "big" | "small" | "red" | "green"
Noun ::= "cat" | "man" | "house"
VerbPhrase ::= "fell over"
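
As a rough illustration, the toy grammar above can be turned into a small hand-written recognizer that tags each word with its category. This is only a sketch: the function name parse_sentence and the "Determiner" label for "The" are assumptions made for the example and are not part of the grammar itself.

```python
ADJECTIVES = {"big", "small", "red", "green"}
NOUNS = {"cat", "man", "house"}


def parse_sentence(text):
    """Return (word, category) tags, or None if text does not match the toy grammar."""
    tokens = text.split()
    tags = []
    pos = 0

    # NounPhrase ::= ["The"] [Adjective] Noun
    if pos < len(tokens) and tokens[pos] == "The":
        tags.append((tokens[pos], "Determiner"))  # label is an assumption
        pos += 1
    if pos < len(tokens) and tokens[pos] in ADJECTIVES:
        tags.append((tokens[pos], "Adjective"))
        pos += 1
    if pos < len(tokens) and tokens[pos] in NOUNS:
        tags.append((tokens[pos], "Noun"))
        pos += 1
    else:
        return None  # the Noun is mandatory

    # VerbPhrase ::= "fell over"
    if tokens[pos:] == ["fell", "over"]:
        tags.append(("fell over", "VerbPhrase"))
        return tags
    return None


print(parse_sentence("The big cat fell over"))
print(parse_sentence("The cat ran away"))
```

A real system would use a general parser instead of one function per rule, but the idea of matching rules and emitting tags is the same.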

This is obviously oversimplified, but the task of writing a complete grammar that defines all of English is enormous, and most real systems define only the subset of it that applies to a problem domain.

Once a grammar has been defined (or learned using complicated algorithms known only to the likes of Google), a string, called an "exemplar", is parsed according to the grammar, which tags each word with its part of speech. A very complex grammar would have not just the parts of speech you learned in school, but also categories such as "Websites", "Names of old people", and "Ingredients".

These categories can be laboriously built into the grammar by humans or inferred using things like Analogical Modeling or Support Vector Machines. In each, things like "chicken", "football", "BBQ", and "cricket" would be defined as points in a very high-dimensional space, along with millions of other points, and the clustering algorithms would then define groups based only on the positions of those points relative to each other. One might then try to infer names for the groups from example text.

This Google search lists several techniques used in NLP, and you could learn a whole lot from them.

EDIT: To just solve this problem, one might crawl the web for sentences of the form "_ is a _" to build up a database of item-category relationships. Then you parse a string like the ones above and look for words that are known items in the database.
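
A minimal sketch of that idea, assuming the crawl has already produced a list of sentences. The CRAWLED list, the regular expression, and the function names are all made up for illustration; a real crawl would yield far noisier text and need a more forgiving pattern.

```python
import re

# Hypothetical crawl output; a real system would collect millions of sentences.
CRAWLED = [
    "Chicken is a food.",
    "Basketball is a sport.",
    "Swimming is a sport.",
    "Running is a sport.",
]

# Matches sentences of the form "<item> is a <category>" (or "is an").
IS_A = re.compile(r"^(\w+) is an? (\w+)\.?$", re.IGNORECASE)


def build_database(sentences):
    """Map each item to its category, both lowercased."""
    db = {}
    for sentence in sentences:
        match = IS_A.match(sentence.strip())
        if match:
            db[match.group(1).lower()] = match.group(2).lower()
    return db


def categorize(sentence, db):
    """Return the sorted categories of all known items found in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sorted({db[word] for word in words if word in db})


db = build_database(CRAWLED)
print(categorize("I love to eat chicken.", db))
print(categorize("Today I went running, swimming and played basketball.", db))
```

Exact word lookup like this misses inflected forms ("chickens", "swam"), so stemming or lemmatizing the words first would probably be the next step.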

The question you ask is a whole area of research called topical text categorization. A great overview of techniques is "Machine learning in automated text categorization" by Fabrizio Sebastiani, in ACM Computing Surveys. One of the simplest techniques (though not necessarily the best performing) is to collect numerous (hundreds of) example sentences in each category and then train a Naive Bayes classifier on those sample sentences. NLTK contains a Naive Bayes classifier in the module nltk.classify.naivebayes.
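
To show the shape of the technique, here is a from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing, written against the standard library only so it runs without installing anything. It is not NLTK's implementation, and the six training sentences are invented toy data; a usable classifier would need the hundreds of examples per category mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()            # label -> number of training sentences
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, labeled_sentences):
        for text, label in labeled_sentences:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data.
TRAIN = [
    ("I had pizza and salad for lunch", "FOOD"),
    ("She cooked chicken with rice", "FOOD"),
    ("We eat bread and cheese", "FOOD"),
    ("He played football yesterday", "SPORTS"),
    ("They went swimming and running", "SPORTS"),
    ("She played tennis and basketball", "SPORTS"),
]

classifier = NaiveBayes()
classifier.train(TRAIN)
print(classifier.classify("I love to eat chicken."))
print(classifier.classify("Today I went running, swimming and played basketball."))
```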

Google Sets does some of this, and there is some discussion that mentions supersets. However, I have not really seen any technical details in there, just ideas and discussion.

Maybe this could at least help your research...

You might take a look at the WordNet Domains resource by people from FBK. It is an extension of WordNet designed for text categorization and word sense disambiguation, and it allows varying degrees of granularity.

http://wndomains.fbk.eu/

One possible way to apply it to your task is to get NP chunks out of your sentences, take their head words, and look up the categories of those head words in WordNet Domains.

Tenqyu solved it using Python and machine learning:

1. Have a dataset of text.
2. Apply Tf-idf vectorization. The weight of a term that occurs in a document is simply proportional to the term frequency (the Luhn assumption, 1957). The specificity of a term can be quantified as an inverse function of the number of documents in which it occurs.
3. Build a vector space model.
4. Train a multinomial Naive Bayes classifier.
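
The Tf-idf step can be sketched in plain Python as follows. This uses one common variant (term frequency normalized by document length, times the natural log of N / document frequency); libraries often apply different smoothing and normalization, so the exact numbers will differ. The function names and the three sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term occurs in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


DOCS = [
    "I love to eat chicken",
    "I went running and swimming",
    "I played basketball",
]

for vector in tfidf_vectors(DOCS):
    print(vector)
```

A term such as "I" that appears in every document gets weight zero, while a term found in only one document, such as "chicken", gets a positive weight. These sparse vectors form the vector space model that the classifier is then trained on.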

The process in more detail is here: https://hackernoon.com/how-to-better-classify-coachella-with-machine-learning-part-1-dc84c53d1a9c
