
Find sentences with similar relative meaning from a list of sentences against an example one

I want to be able to find sentences with the same meaning. I have a query sentence and a long list of millions of other sentences. Sentences are made of words, plus a special type of word called a symbol, which simply stands for some object being talked about.

For example, my query sentence is:

Example: add (x) to (y) giving (z)

There may be a list of sentences already existing in my database, such as:

1. the sum of (x) and (y) is (z)
2. (x) plus (y) equals (z)
3. (x) multiplied by (y) does not equal (z)
4. (z) is the sum of (x) and (y)

The example should match sentences 1, 2 and 4 in my database, but not 3. There should also be some weight for each sentence match.

It's not just math sentences; it's any sentence that can be compared to any other sentence based on the meaning of its words. I need some way to compare a sentence against many other sentences to find the ones with the closest relative meaning, i.e. a mapping between sentences based on their meaning.

Thanks! (The tag is language-design because I couldn't create a new tag.)


First off: what you're trying to solve is a very hard problem. Depending on what's in your dataset, it may be AI-complete.

You'll need your program to know or learn that add, plus and sum refer to the same concept, while multiplied refers to a different one. You may be able to do this by measuring the distance between the words' synsets in WordNet/FrameNet, though your distance calculation will have to be quite refined if you don't want multiplied to match as well. Otherwise, you may want to manually establish some word-concept mappings (such as {'add': 'addition', 'plus': 'addition', 'sum': 'addition', 'times': 'multiplication'}).
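A minimal sketch of the manual word-concept mapping idea. The concept labels and table entries below are illustrative, not a fixed vocabulary:

```python
# Hand-built word -> concept table (illustrative entries only).
CONCEPTS = {
    'add': 'addition', 'plus': 'addition', 'sum': 'addition',
    'times': 'multiplication', 'multiplied': 'multiplication',
}

def to_concepts(sentence):
    """Replace every known word with its concept label; keep the rest."""
    return [CONCEPTS.get(w, w) for w in sentence.lower().split()]

# Both phrasings now share the token 'addition':
#   to_concepts("add (x) to (y) giving (z)")
#   to_concepts("the sum of (x) and (y) is (z)")
```

Once both sentences are mapped into concept space, a simple token overlap already separates the addition sentences from the multiplication one.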

If you want full sentence semantics, you will in addition have to parse the sentences and derive the meaning from the parse trees/dependency graphs. The Stanford parser is a popular choice for parsing.

You can also find inspiration for this problem in Question Answering research. There, a common approach is to parse sentences, then store fragments of the parse tree in an index and search for them by common search engines techniques (e.g. tf-idf, as implemented in Lucene). That will also give you a score for each sentence.
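A toy tf-idf plus cosine-similarity scorer, in pure Python, standing in for what a real engine like Lucene does at scale. The corpus is the question's database, and the query is assumed to have already been synonym-normalized ("add ... giving" rewritten to "sum ... equals"):

```python
import math
from collections import Counter

corpus = [
    "the sum of x and y is z",             # 1
    "x plus y equals z",                   # 2
    "x multiplied by y does not equal z",  # 3
    "z is the sum of x and y",             # 4
]

def tfidf(text):
    words = text.split()
    tf = Counter(words)
    n = len(corpus)
    # Smoothed idf so words unseen in the corpus don't divide by zero.
    return {w: (tf[w] / len(words)) *
               math.log((1 + n) / (1 + sum(w in d.split() for d in corpus)))
            for w in tf}

def cosine(a, b):
    dot = sum(wa * b.get(w, 0.0) for w, wa in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = tfidf("the sum of x and y equals z")
scores = [cosine(query, tfidf(d)) for d in corpus]
# Sentence 3 shares only x, y, z with the query, and those occur in every
# document, so their idf (and sentence 3's score) is zero.
```

Note how idf does part of the stopword filter's job for free: x, y and z appear everywhere, so they carry no weight.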


You will need to stem the words in your sentences down to a common synonym, then compare those stems and use the ratio of stem matches in a sentence (e.g. 5 out of 10 words) against some threshold to decide whether the sentence is a match, for example all sentences with a word match of over 80% (or whatever percentage you deem accurate). At least, that's one way to do it.
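A rough sketch of that stem-and-ratio idea. The suffix stripper and synonym table are toy stand-ins (a real system might use a Porter stemmer and WordNet):

```python
# Illustrative synonym table: stems that should count as the same word.
SYNONYMS = {'add': 'sum', 'plus': 'sum'}

def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ('ing', 'ed', 's'):
        if word.endswith(suffix) and len(word) > len(suffix) + 3:
            return word[:-len(suffix)]
    return word

def normalize(sentence):
    stems = [stem(w) for w in sentence.lower().split()]
    return {SYNONYMS.get(s, s) for s in stems}

def match_ratio(query, candidate):
    """Fraction of the query's normalized stems found in the candidate."""
    q = normalize(query)
    return len(q & normalize(candidate)) / len(q)

# match_ratio("add (x) to (y) giving (z)", "(x) plus (y) equals (z)")
# scores higher than the same query against the "multiplied" sentence.
```

You would then keep every database sentence whose ratio clears your chosen threshold, sorted by the ratio itself as the match weight.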


Write a function which creates some kind of hash, or "expression", from a sentence, which can easily be compared with other sentences' hashes.

E.g.:
1. "the sum of (x) and (y) is (z)" => x + y = z
4. "(z) is the sum of (x) and (y)" => z = x + y

Some tips for the transformation: omit "the", convert two-word terms into a single word ("sum of" => "sumof"), find the operator word and replace "and" with it.
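Those tips can be sketched as a small normalizer. The stopword, prefix-operator and infix-operator tables are illustrative and would need to grow with the vocabulary:

```python
PREFIX = {'sumof': '+'}             # "sumof x and y" -> x + y
INFIX = {'plus': '+', 'times': '*'}  # operator sits between operands

def to_expression(sentence):
    s = sentence.lower().replace('sum of', 'sumof')
    out, pending = [], None
    for w in s.split():
        if w == 'the':
            continue                 # omit articles
        if w in PREFIX:
            pending = PREFIX[w]      # remember operator, drop the word
        elif w in INFIX:
            out.append(INFIX[w])
        elif w == 'and' and pending:
            out.append(pending)      # "and" joins the two operands
        elif w in ('is', 'equals'):
            out.append('=')
        else:
            out.append(w)
    return ' '.join(out)

# to_expression("the sum of (x) and (y) is (z)")  -> "(x) + (y) = (z)"
# to_expression("(z) is the sum of (x) and (y)")  -> "(z) = (x) + (y)"
```

Note that sentences 1 and 4 still normalize to different strings, so the comparison step would additionally need to know that = is symmetric.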


Not that easy ^^ You should use a stopword filter first, to get the non-information-bearing words out. Here are some good ones

Then you want to handle synonyms. That's actually a really complex topic, because you need some kind of word sense disambiguation (WSD) to do it properly, and most state-of-the-art methods are only slightly better than the simplest solution: just take the most frequently used sense of each word. You can do that with WordNet. It gives you synsets for a word, each containing all of its synonyms. You can then also generalize the word (to its hypernym) and replace the search term with that.

Just to say it: handling synonyms is pretty hard in NLP. If you only want to handle different word forms, like add and adding for example, a stemmer will do, but no stemmer will get you from add to sum (WSD is the only way there).
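To make the synset idea concrete without pulling in WordNet itself (NLTK's `nltk.corpus.wordnet` needs a corpus download), here is a tiny hand-built stand-in: each word maps to one synset, and every member of a synset normalizes to a single canonical lemma. Real WordNet has many senses per word, which is exactly why WSD is needed:

```python
# Toy synsets, illustrative only; real WordNet is sense-ambiguous.
SYNSETS = [
    {'add', 'sum', 'plus', 'total'},
    {'multiply', 'times'},
]
# Pick one canonical lemma per synset (alphabetically first, arbitrarily).
CANONICAL = {w: sorted(s)[0] for s in SYNSETS for w in s}

def canonical(word):
    """Collapse a word onto its synset's canonical lemma."""
    return CANONICAL.get(word, word)

# canonical('add') == canonical('sum') -- no stemmer could do this.
```

With this normalization in place, add and sum finally compare equal, which is what the stemmer could never achieve.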

And then you have different word orderings in your sentences, which shouldn't be ignored either if you want exact answers (x + y = z is different from x + z = y). So you also need word dependencies, so you can see which words depend on each other. The Stanford Parser is actually the best tool for that task if you're working with English.

Perhaps you should just extract the nouns and verbs from a sentence, do all the preprocessing on them, and query your search index for the dependencies. A dependency would look like

x (sum, y)
y (sum, x)
sum (x, y)

which you could use for your search.
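A sketch of indexing such dependency triples and searching over them. The triples are hand-written here; in practice a parser (e.g. the Stanford Parser) would produce them from the raw sentences:

```python
from collections import defaultdict

# Inverted index: dependency triple -> set of sentence ids.
index = defaultdict(set)

def add_sentence(sent_id, triples):
    for head, deps in triples:
        # Sort dependents so sum(x, y) and sum(y, x) share one key.
        index[(head,) + tuple(sorted(deps))].add(sent_id)

# Hand-written triples for sentences 1, 3 and 4 from the question.
add_sentence(1, [('sum', ('x', 'y')), ('equal', ('sum', 'z'))])
add_sentence(3, [('multiply', ('x', 'y')), ('not-equal', ('multiply', 'z'))])
add_sentence(4, [('sum', ('x', 'y')), ('equal', ('z', 'sum'))])

def search(triples):
    """Score each stored sentence by how many query triples it shares."""
    scores = defaultdict(int)
    for head, deps in triples:
        for sid in index.get((head,) + tuple(sorted(deps)), ()):
            scores[sid] += 1
    return dict(scores)

# search([('sum', ('x', 'y'))]) finds sentences 1 and 4, but not 3.
```

The per-sentence counts double as the match weight the question asks for; sorting the triple's dependents is a design choice that makes = symmetric, which may or may not be what you want for non-commutative relations.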

So you need to tokenize, generalize, get dependencies, and filter unimportant words to get your result. And if you want to do this in German, you'll need a word decompounder as well.
