I want to tokenize a text, but not by splitting on whitespace alone. Some things, like proper names, should be kept as a single token (e.g. "Renato Dinhani Conceição"). An
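One way to approach the tokenization question above is sketched below, assuming the multi-word names are already known in advance. The function name `tokenize` and the `phrases` list are hypothetical and only serve the illustration; the excerpt does not say how the names would be obtained.

```python
import re


def tokenize(text, phrases):
    """Split text on whitespace, keeping each known phrase as one token.

    `phrases` is an assumed, pre-built list of multi-word names.
    """
    # Try longer phrases first so a long name wins over its own prefix.
    ordered = sorted((p for p in phrases if p), key=len, reverse=True)
    # Fall back to any run of non-whitespace characters.
    alternatives = [re.escape(p) for p in ordered] + [r"\S+"]
    pattern = re.compile("|".join(alternatives))
    return pattern.findall(text)


tokens = tokenize(
    "I met Renato Dinhani Conceição yesterday",
    ["Renato Dinhani Conceição"],
)
print(tokens)
```

A lookup list like this only covers names known beforehand; finding unseen proper names would need a named-entity recognizer, which is outside this sketch.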
I have a list of strings that I need to check against an English dictionary. However, I don't want to start checking every piece of gibberish in the list. First, I want to check if the string could be
OK, I wrote this one, but I have 83,000 words in a MySQL database. When I execute this script, it takes too much time, and sometimes it does not run at all. I think this script matches every title in the MySQL database wat
The goal of this application is to produce a system that can generate quizzes automatically. The user should be able to supply any word or phrase they like (e.g. "Sachin Tendulkar"); the system will th
I've finished gathering the data I plan to use for my corpus, but I'm a bit confused about whether I should normalize the text. I plan to tag & chunk the corpus in the future. Some
Can someone show me how to write a simple web (HTML/XML) interface for a simple sentence alignment task?
Is there an open dictionary database where I can get, at minimum, a table of the form: word | part of speech?
I need to extract posts and tweets from Facebook and Twitter into our database for analysis. My problem is that the system can process only English sentences (phrases). So how can I remove non-Englis
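As a rough first pass at the filtering problem above, the sketch below keeps a text only when most of its letters are ASCII. This is an assumption-laden heuristic, not language detection: it checks the script, so it would still accept French or Spanish text, and the function name `looks_english` and the 0.9 threshold are hypothetical choices.

```python
def looks_english(text, threshold=0.9):
    """Crude script check: True when most letters in `text` are ASCII.

    This does not identify English as such; it only rejects text written
    mostly in non-Latin scripts. The threshold is an assumed value.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all (digits, emoji, punctuation): nothing to analyze.
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= threshold


posts = [
    "This is a plain English sentence.",
    "これは日本語の文です",
    "12345 !!!",
]
english_posts = [p for p in posts if looks_english(p)]
print(english_posts)
```

Separating English from other Latin-script languages would need a real language identifier, for example one trained on character n-grams.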