Sentence normalization
I'm running a site that allows users to make comments, and most of them are highly uneducated and cannot use proper punctuation, capitalization, etc.
I'm looking for some sort of function that will take their garbage text, and make it look nice.
For example:
- before: this is a test. i LIKE PIE
- after: This is a test. I like pie.
- before: CAPS LOCK IS CRUISE CONTROL FOR COOL.
- after: Caps lock is cruise control for cool.
sentenceNormalizer is the only thing I found, but it's too simplistic. It lowercases everything that doesn't follow a . ! ? regardless of whether it's "I", a person's name that was capitalized on purpose, or anything else.
Recognizing a person's name in full caps would surely be a hard task. OPRAH NEEDS SOME CRUISE CONTROL... cruise or Cruise?
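To make the difficulty concrete, here is a minimal heuristic sketch in Python. It is not sentenceNormalizer and not a complete solution: `normalize_comment` and the `KNOWN_WORDS` allow-list are hypothetical names made up for illustration. It lowercases words typed in full caps, leaves mixed-case words such as "John" alone, restores anything found in the allow-list, capitalizes sentence starts, and adds a final period if one is missing.

```python
import re

# Hypothetical allow-list of words whose capitalization is restored.
# A real site would need a far larger list or a named-entity recognizer.
KNOWN_WORDS = {"i": "I", "oprah": "Oprah"}


def _fix_word(match):
    """Decide the casing of a single alphabetic word."""
    word = match.group(0)
    known = KNOWN_WORDS.get(word.lower())
    if known is not None:
        return known          # e.g. "i" -> "I", "OPRAH" -> "Oprah"
    if word.isupper():
        return word.lower()   # full caps: assume shouting, not a name
    return word               # lowercase or mixed case: leave as typed


def normalize_comment(text):
    """Heuristically clean up capitalization and final punctuation."""
    text = " ".join(text.split())                 # collapse whitespace
    text = re.sub(r"[A-Za-z]+", _fix_word, text)  # fix each word
    # Capitalize the first letter of the text and of each new sentence.
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    if text and text[-1] not in ".!?":
        text += "."                               # add missing final period
    return text


print(normalize_comment("this is a test. i LIKE PIE"))
print(normalize_comment("CAPS LOCK IS CRUISE CONTROL FOR COOL."))
print(normalize_comment("OPRAH NEEDS SOME CRUISE CONTROL"))
```

Note that "Oprah" only survives because it is in the allow-list, and "cruise" is always lowercased, so a comment about Tom Cruise typed in full caps would come out wrong. That ambiguity is exactly what a word-level heuristic cannot resolve.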
Depending on the terms of use for your site, you may not own the comments, which prevents you from editing them.
Everybody needs full caps sometimes to MAKE a point, without having to learn another system for inserting styles (italic, bold).
Having their comments edited out from under them will probably infuriate some people, which might drive them away from your site. Do you want that?
Frequent users will pick up a few tricks over time (proper spelling, spacing, not too many caps...).
It's an interesting project, which might involve fully understanding the MEANING of the comments, but I'm not sure it's a good idea.