How does Remember the Milk's string matching work?
I'm interested in developing a similar solution to RTM's Smart Add Feature.
For those who don't know Remember the Milk, here's how it works: tasks are added through an input box that accepts a string and parses out different parameters such as task name, due date, priority, tags, etc. The parameters are usually preceded by special symbols ( ^, #, &, etc. ). RTM also accepts variations like 'Tennis on Wednesday'.
My basic question is: how would you design a system that is capable of intelligently discerning the different parts of a string? Will I have to look into natural language processing?
Thus far I'm using a simple regex that looks for the special preceding symbols ( ^, #, &, etc. ) and then parses out the different parts of the string. This gets increasingly difficult with more and more unordered parameters. Maybe that stems from my lack of regex expertise.
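To make the prefix-symbol approach concrete, here is a minimal sketch that handles parameters in any order by scanning for every "symbol plus value" token instead of writing one large pattern. The mapping of `^` to due date, `#` to tag and `&` to priority is an assumption made for illustration, not RTM's actual scheme, and `parse_task` is a hypothetical helper name.

```python
import re

# Hypothetical mapping of prefix symbols to field names (an assumption,
# not RTM's real syntax).
PREFIXES = {"^": "due", "#": "tags", "&": "priority"}

# One token = a prefix symbol followed by everything up to the next
# prefix symbol or the end of the string.
TOKEN_RE = re.compile(r"([\^#&])\s*([^\^#&]*)")


def parse_task(text):
    """Split a task string into a name and its prefixed parameters."""
    first = TOKEN_RE.search(text)
    # Everything before the first prefix symbol is treated as the task name.
    name = text[: first.start()].strip() if first else text.strip()
    fields = {"name": name, "due": None, "priority": None, "tags": []}

    for symbol, value in TOKEN_RE.findall(text):
        value = value.strip()
        if not value:
            continue  # a bare symbol with no value is ignored
        key = PREFIXES[symbol]
        if key == "tags":
            fields["tags"].append(value)  # tags may repeat
        else:
            fields[key] = value  # the last occurrence wins
    return fields


print(parse_task("Buy milk #shopping ^tomorrow 5pm &1"))
```

Because each token is matched independently, the order of the parameters no longer matters. The sketch does not cover free-form input such as 'Tennis on Wednesday', and any text that follows a symbol is absorbed into that parameter's value.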
A similar problem arises when trying to convert different formats of due dates ( '27 May 2008 16:00', '27th May 2008', '16th June 16:00', 'June 16th 12:00', 'today 12:00am', etc. ) into datetime objects. I'm currently using Python and regular expressions. My method is basically to run through a long list of possible date and time combinations and convert the matching expression with datetime.strptime. I found this approach hard to maintain: lots of false positives, leftover strings, etc. You can look at my code here: https://gist.github.com/1233786 It's not pretty, you have been warned.
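One way to keep the "list of formats" approach maintainable is to normalise the input first and then try a small table of `strptime` formats against the whole string, so that partial matches and leftover text are rejected rather than silently accepted. The sketch below is only an illustration under stated assumptions: `parse_due`, the format table and the two relative keywords are hypothetical, month names are assumed to be English (`%B` is locale dependent), and a missing year is assumed to mean the current year.

```python
import re
from datetime import datetime, timedelta

# Ordinal suffixes such as "27th" or "1st" are stripped before matching.
ORDINAL_RE = re.compile(r"(?<=\d)(?:st|nd|rd|th)\b", re.IGNORECASE)

# (format, has_year) pairs, most specific first. Illustrative, not exhaustive.
FORMATS = [
    ("%d %B %Y %H:%M", True),
    ("%d %B %Y", True),
    ("%d %B %H:%M", False),
    ("%B %d %H:%M", False),
]

# Relative keywords and their offset in days (an assumed, minimal set).
RELATIVE_DAYS = {"today": 0, "tomorrow": 1}


def parse_due(text, now=None):
    """Return a datetime for the given text, or None if nothing matches."""
    now = now or datetime.now()
    cleaned = ORDINAL_RE.sub("", text.strip())

    # Relative form: "<today|tomorrow> [time]".
    word, _, rest = cleaned.partition(" ")
    if word.lower() in RELATIVE_DAYS:
        day = (now + timedelta(days=RELATIVE_DAYS[word.lower()])).date()
        rest = rest.strip()
        if not rest:
            return datetime.combine(day, datetime.min.time())
        for time_fmt in ("%H:%M", "%I:%M%p"):
            try:
                parsed = datetime.strptime(rest, time_fmt)
                return datetime.combine(day, parsed.time())
            except ValueError:
                continue
        return None

    # Absolute forms: strptime must consume the whole string to succeed.
    for fmt, has_year in FORMATS:
        try:
            if has_year:
                return datetime.strptime(cleaned, fmt)
            # No year in the input: append the current year before parsing.
            return datetime.strptime(f"{cleaned} {now.year}", f"{fmt} %Y")
        except ValueError:
            continue
    return None


print(parse_due("27th May 2008"))
print(parse_due("June 16th 12:00"))
```

Since `strptime` raises `ValueError` when any part of the input is left unconverted, an unknown format yields `None` instead of a false positive. For broader coverage, the third-party python-dateutil package provides `dateutil.parser.parse`, which may handle many of these variations without a hand-written format table.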
I'd appreciate any hint in the right direction on how to approach this topic. Coding a date parser was really fun, but before I hunt down all the bugs in hundreds of different use cases I thought I'd check whether there's a more elegant design pattern.
P.S.: I would love some code samples to sink my teeth into. Preferably Python :)
I assume they have some grammars for parsing the input sentence. Such grammars can express a variety of NLP structures, such as entity extraction. To write these grammars one can use GATE JAPE (http://gate.ac.uk/sale/tao/splitch8.html#chap:jape) or Gexp (http://code.google.com/p/graph-expression/).