Splitting a paragraph

I want to split a paragraph on the "." character, but not in certain cases: where the "." belongs to a word such as "Dr.", "Mrs.", "Miss." or a few other words.

I need the logic for this, either in C# or in SQL Server.


I read the question as "How do I split the paragraph into its component sentences?" If that's what you meant, here's how I would approach the problem:

  1. Build a "white list" of acceptable period usage inside sentences
  2. Split your paragraph on "." (call these possible sentences)
  3. Loop through your possible sentences, checking the ending characters against your white list of acceptable period usage inside sentences
  4. If it matches, combine that possible sentence with the next, and check it again
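The four steps above can be sketched roughly as follows. The question asks for C# or SQL Server, so treat this Python version as an illustration of the logic only; it should port directly to C# using `string.Split('.')`. The white list `ABBREVIATIONS` and the function name `split_sentences` are hypothetical, the match is case-sensitive, and only the last whitespace-separated word before each "." is checked.

```python
from typing import FrozenSet, List

# Hypothetical white list: words whose trailing "." should NOT end a sentence.
# Matching is case-sensitive and compares the last word before each ".".
ABBREVIATIONS: FrozenSet[str] = frozenset({"Dr", "Mr", "Mrs", "Ms", "Miss"})


def split_sentences(paragraph: str,
                    abbreviations: FrozenSet[str] = ABBREVIATIONS) -> List[str]:
    # Step 2: split on "." to get the possible sentences.
    pieces = paragraph.split(".")
    sentences: List[str] = []
    current = ""
    for i, piece in enumerate(pieces):
        if i == len(pieces) - 1:
            # The final piece has no "." after it; keep any trailing text.
            current += piece
            break
        # Re-attach the "." that split() removed.
        current += piece + "."
        words = current[:-1].split()
        # Steps 3-4: if the possible sentence ends with a white-listed word,
        # combine it with the next piece instead of closing the sentence.
        if words and words[-1] in abbreviations:
            continue
        if current.strip():
            sentences.append(current.strip())
        current = ""
    if current.strip():
        sentences.append(current.strip())
    return sentences


if __name__ == "__main__":
    text = "Dr. Smith met Mrs. Jones. They talked to Miss. Brown. It was late."
    for sentence in split_sentences(text):
        print(sentence)
```

With the sample text this should print three sentences, keeping "Dr.", "Mrs." and "Miss." inside them. A sentence that genuinely ends with a white-listed word will still be merged with the next one, which is an inherent limit of the white-list approach.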

Without knowing the full scope of your problem, I can't say whether this approach is actually feasible.

If you're looking for a more robust English-language parser, here is a (possibly) related question, though that one was about Java.
