How to make a small engine like Wolfram|Alpha?
Let's say I have three models/tables: operating_systems, words, and programming_languages:
# operating_systems
name:string   created_by:string   family:string
Windows       Microsoft           MS-DOS
Mac OS X      Apple               UNIX
Linux         Linus Torvalds      UNIX
UNIX          AT&T                UNIX
# words
word:string   definitions:string
window        (serialized hash of definitions)
hello         (serialized hash of definitions)
UNIX          (serialized hash of definitions)
# programming_languages
name:string   created_by:string   example_code:text
C++           Bjarne Stroustrup   #include <iostream> etc...
HelloWorld    Jeff Skeet          h
AnotherOne    Jon Atwood          imports 'SORULEZ.cs' etc...
When a user searches for hello, the system shows the definitions of 'hello'. This is relatively easy to implement. However, when a user searches for UNIX, the engine must choose between word and operating_system. Also, when a user searches for windows (lowercase 'w'), the engine chooses word, but should also show: Assuming 'windows' is a word. Use as an <a href="etc..">operating system</a> instead.
Can anyone point me in the right direction with parsing and choosing the topic of the search query? Thanks.
Note: it doesn't need to be able to perform calculations as WA can do.
Have a new index table called terms that contains a tokenised version of each valid term. That way, you only have to search one table.
# terms
Id   Name      Type               Priority
1    window    word               false
2    Windows   operating_system   true
Then you can see how close a match the user's search term is. For example, "Windows" would be a 100% match with 2, so assume that, but it is also a close match to 1, so suggest that as an alternative. You'd have to write your own rules engine that decides how closely a word matches (i.e. what gets assumed with "windows" vs "Windows"?). The Priority field could be the final decider if the rules engine can't decide, and could in theory be driven by user activity so it learns what users are more likely referring to.
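To make the idea concrete, here is a minimal Python sketch of such a rules engine working over the terms table above. The scoring weights (1.0, 0.75, 0.5), the naive plural-stripping stem() helper, and the function names are all assumptions for illustration, not part of the original suggestion; a real engine would need much better tokenising and tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: int
    name: str
    type: str
    priority: bool


# Rows mirror the example terms table.
TERMS = [
    Term(1, "window", "word", False),
    Term(2, "Windows", "operating_system", True),
]


def stem(text):
    """Hypothetical tokeniser: trim and drop one trailing 's' (case preserved)."""
    token = text.strip()
    if len(token) > 3 and token[-1] in "sS":
        return token[:-1]
    return token


def score(query, term):
    """Hypothetical rules: exact > same stem and case > same stem ignoring case."""
    if query == term.name:
        return 1.0
    q, n = stem(query), stem(term.name)
    if q == n:
        return 0.75
    if q.lower() == n.lower():
        return 0.5
    return 0.0


def resolve(query, terms=TERMS):
    """Return (assumed_term, alternatives); Priority only breaks score ties."""
    scored = [(score(query, t), t) for t in terms]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda pair: (pair[0], pair[1].priority), reverse=True)
    if not scored:
        return None, []
    return scored[0][1], [t for _, t in scored[1:]]


assumed, alternatives = resolve("windows")
print(assumed.type, [t.type for t in alternatives])
```

With these particular rules, "Windows" is assumed to be the operating system and "windows" the word, with the other meaning offered as the alternative in each case. When the rules tie (for example "WINDOWS"), the Priority flag decides.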
What about creating a cache in the form of a database table that holds all the keywords?
The search query would be something like this:
SELECT * FROM keywords WHERE keyword = '<YourKeyWord>' /* mysql */
The keywords table would contain some kind of reference to your modules.
The advantage of this approach is, of course, fast searching.
You may use two queries in order to simulate the behaviour you ask for:
- Exact match (no problem in mysql)
- Case insensitive search
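As a rough illustration of the two-query idea, the sketch below uses Python's built-in sqlite3 module as a stand-in so it can run without a server. The module column and the sample rows are assumptions made for the example. Note that collation behaviour differs between engines: SQLite compares text case-sensitively by default and needs COLLATE NOCASE for the loose query, whereas in MySQL the result depends on the column's collation.

```python
import sqlite3


def build_index():
    """Create an in-memory keywords table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keywords (keyword TEXT, module TEXT)")
    conn.executemany(
        "INSERT INTO keywords VALUES (?, ?)",
        [
            ("window", "words"),
            ("hello", "words"),
            ("UNIX", "words"),
            ("UNIX", "operating_systems"),
            ("Windows", "operating_systems"),
        ],
    )
    return conn


def lookup(conn, query):
    """Return (exact matches, matches that differ from the query only by case)."""
    exact = conn.execute(
        "SELECT keyword, module FROM keywords "
        "WHERE keyword = ? ORDER BY module",
        (query,),
    ).fetchall()
    loose = conn.execute(
        "SELECT keyword, module FROM keywords "
        "WHERE keyword COLLATE NOCASE = ? AND keyword <> ? ORDER BY module",
        (query, query),
    ).fetchall()
    return exact, loose


conn = build_index()
print(lookup(conn, "windows"))
print(lookup(conn, "UNIX"))
```

The exact matches can drive what the engine assumes, while the case-insensitive matches can feed the "Use as ... instead" suggestions.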
Wolfram Alpha is far more complex than your example. I'm not certain of its inner workings (I have done very little reading on it), but I believe it is a very large and complex automated inference system. Such systems are rather trivial to implement (Prolog is basically a general-purpose one that you can load with whatever data you need), but they're very hard to make useful.