Apostrophes Converted to Correct Text?
Goal: I need to be able to convert words with apostrophes to their fully spelled-out forms, at least for the most common such words. Ideally I'd want a list of these words and their implied counterparts (e.g. "don't" and "do not").
Issue: I'm creating a search algorithm based on natural language processing, but when users create content (or search) using an apostrophe, it causes issues for us. This is mostly because if we simply removed the apostrophe we would get (don't -> dont) (doesn't -> doesnt), which are not English words and can't be interpreted by the NLP system.
The ideal solution is simply a one-to-one mapping of what these items should be converted to, but I'm unaware of such a list.
Please let me know if you know of one, and where I might be able to find it.
thx
This looks like a pretty good list: http://www.textfixer.com/resources/english-contractions-list.php
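Once you have a list like that, the one-to-one mapping could be applied with something like the sketch below. The CONTRACTIONS dict here is a tiny hand-picked sample (not taken from the linked page), and the name expand_contractions is just made up for illustration, so treat it as a starting point rather than a finished solution.

```python
import re

# Hypothetical, deliberately tiny sample mapping; a real list would be much longer.
CONTRACTIONS = {
    "don't": "do not",
    "doesn't": "does not",
    "can't": "cannot",
    "won't": "will not",
    "isn't": "is not",
    "i'm": "i am",
}

# Match any known contraction as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace known contractions in text with their expanded forms."""
    # Users often type the typographic apostrophe (U+2019); normalize it first.
    text = text.replace("\u2019", "'")

    def _replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. "Don't" -> "Do not".
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


print(expand_contractions("I don't know why it doesn't work"))
```

Anything that isn't in the dict is left untouched, which is probably what you want for ambiguous forms such as "he's" ("he is" or "he has"): better to leave them alone than to expand them wrongly.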
Depends on how good you want to make your system. Is it going to understand that "gonna" is "going to" and "gotta" is ... well, that's a tough one. It could mean "got to" ("have to", "must"), or "got a" ("have a").
Oh, the things we learn when we try to teach our computers to communicate.
These words are called "contractions" and you can find a list on the web, e.g. http://en.wikipedia.org/wiki/Contraction_(grammar)