开发者

How to identify ideas and concepts in a given text

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:

Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?

It'd be great to be able to detect that the person has asked for a photograph of Mr Jones. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:

Please, never send me a photo of Mr Jones.

Does anyone know where to start with this? Is it even possible?

I've looked into things like nltk,开发者_JAVA百科 but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great.

Thanks!


The best thing out there that might be useful to you is automatic sentiment analysis. This is used, for example, to judge whether, say, a customer review is positive or negative. I cannot give you direct pointers to available tools, but this is what you are looking for.

I must say, though, that this is a current hot topic in natural language processing and I’ve seen a number of papers at conferences. It’s definitely quite a complex matter and if you’re starting from scratch, it might take quite some time before you get the results that you want.


NLTK is not a bad framework for parsing natural language but beware that this is not a simple matter. Doing stuff like this is really research level programming.

A good thing that makes it much easier is if you have a very limited domain - say your application focuses on information about famous writers, then you can avoid some complexities of natural language like certain types of ambiguities.

Where to start? Good question. I don't know of any tutorials on the topic (and I presume you tried the Google option) but I'd imagine that iTunes U would have a course on the topic. If not I can post a link to a course I've done that mentions the subject and wasn't completely horrible: http://www.inf.ed.ac.uk/teaching/courses/inf2a/lecturematerials/index.html#lecture01


The problem that u tackle is very challenging.

I would start by first identifying the entities in the text (problem referred as Named Entity Recognition, google it), and then a I would try to identify concepts.

If want to roughly identify what is the text about, I suggest that you start by using WordNet and according to the words and their places in the hierarchy to identify the concepts involved. If you want to produce a system which show real intelligence than you should start researching about resources such as CYC (OpenCYC) which will allow you to convert the sentences into FOL sentences.

This hardcore AI, approach to solving your problem. For simple chat bot, it would be easier to rely on simple statistical methods.

good luck

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜