Splitting a paragraph
I want to split a paragraph on the "." character, but not in every case: for example, not where the period belongs to a word like "Dr.", "Mrs.", "Miss.", or a few other words.
I need some logic for this, in either C# or SQL Server.
I read the question as "How do I split the paragraph into its component sentences?" If that's what you meant, here's how I would approach the problem:
- Build a "white list" of acceptable period usage inside sentences
- Split your paragraph on "." (call these possible sentences)
- Loop through your possible sentences, checking the ending characters against your white list of acceptable period usage inside sentences
- If it matches, combine that possible sentence with the next, and check it again
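The four steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it is written in Python rather than C# for brevity, but it only relies on splitting a string on "." and comparing words, so it should port directly to C# (`string.Split('.')`). The names `ABBREVIATIONS` and `split_sentences` are hypothetical, and the whitelist contents are an assumption you would replace with your own list.

```python
# Hypothetical whitelist (step 1): lowercase words that may be followed by
# a period *inside* a sentence. Extend this for your own data.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "miss", "st", "etc"}


def split_sentences(paragraph, abbreviations=ABBREVIATIONS):
    """Split on '.', re-joining pieces that end in a whitelisted word."""
    # Step 2: naive split on '.'; each piece is a "possible sentence".
    pieces = paragraph.split(".")
    sentences = []
    current = ""
    # Every piece except the last was followed by a '.' in the input.
    for piece in pieces[:-1]:
        current += piece + "."  # restore the period that split() removed
        words = piece.split()
        last_word = words[-1].lower() if words else ""
        # Steps 3-4: if the piece ends in a whitelisted word, keep
        # accumulating; the combined text is re-checked on the next piece.
        if last_word in abbreviations:
            continue
        sentences.append(current.strip())
        current = ""
    # Text after the final '.' (often empty) belongs to the last sentence.
    current += pieces[-1]
    if current.strip():
        sentences.append(current.strip())
    return sentences


print(split_sentences("I saw Dr. Smith today. He was with Mrs. Jones. They left."))
# prints ['I saw Dr. Smith today.', 'He was with Mrs. Jones.', 'They left.']
```

Note the limits of this approach: a sentence that genuinely ends with a whitelisted word ("I called the dr.") gets merged with the next one, and decimals such as "3.14" or ellipses are split incorrectly unless you add further rules.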
Not knowing the scope of your true problem set, I can't say whether this approach is actually feasible or not.
If you're looking for a more robust English-language parser, here is a (possibly) related question, though that one was about Java.