Split long text into chunks of a certain length that end on complete sentences, using C#

I would like to split a text into pieces of a certain length, breaking only at a dot followed by whitespace or a dot followed by a newline (\n).

For example, if I have a long text of 3456 characters in total, I want to split it into several texts of 1000 characters or as close to that as possible without going over, and each text should end with a complete, meaningful sentence.

The reason is that I am using an API that accepts only 1000 characters or fewer for data conversion, but some of my texts are longer than that. I want to split them into multiple texts so that none exceeds 1000 characters and each one ends at a complete sentence, that is, at a dot followed by whitespace or a dot followed by a newline (\n).

I'm working with C# on .NET.

Thanks in advance.


Something like this should work. Replace "someText" with your data, and set shardLength to 1000 for your case. The method is meant to raise an error if a single sentence is longer than the block size.

It currently handles newlines by effectively ignoring them: it splits only on ".".

This means that sentences that end in ".\n" will be split after the ".", and the "\n" will be at the start of the next sentence.

The advantage here is that if you pass this to your API, you should be able to concatenate the results and retain the newlines in the appropriate places (assuming the API handles newlines).

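     using System;
     using System.Text.RegularExpressions;

     public static void BlockSplitter()
        {
            string someText = @"This is some text.
The quick brown fox jumps over the lazy dog. Testing 1 2 3.
Sentence with no fullstop";

            // Zero-width split after every '.', so each sentence keeps its full stop.
            string delimiters = @"(?<=\.)";
            string[] sentences = Regex.Split(someText, delimiters);

            string shard = string.Empty;
            // Demo block size; large enough for the longest sample sentence
            // even with a "\r\n" in front of it. Use 1000 for the API in the question.
            int shardLength = 50;

            foreach (string sentence in sentences)
            {
                if (sentence.Length > shardLength)
                {
                    // A sentence longer than a whole block cannot end on a sentence boundary.
                    throw new InvalidOperationException("Sentence is longer than the block size.");
                }

                if (shard.Length + sentence.Length <= shardLength)
                {
                    // The sentence still fits, so append it to the current block.
                    shard += sentence;
                }
                else
                {
                    // The current block is full: emit it and start a new one.
                    Console.WriteLine(shard);
                    shard = sentence;
                }
            }

            // Emit the final block.
            Console.WriteLine(shard);
        }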
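
If you need the blocks as a list to pass to the API rather than printed, the same idea can be wrapped in a reusable method. The following is only a sketch under a few assumptions: the names SentenceChunker and Split are made up for illustration, the split happens only at a "." that is followed by whitespace (so "3.14" is left intact), and a sentence longer than the limit is cut at fixed offsets instead of raising an error. Adjust those choices to whatever your API tolerates.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SentenceChunker
{
    // Splits text into chunks of at most maxLength characters, breaking only
    // after a '.' that is followed by whitespace (space, tab, or newline).
    public static List<string> Split(string text, int maxLength)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxLength));

        var chunks = new List<string>();
        string current = string.Empty;

        // Zero-width split: the '.' stays with its sentence and the following
        // whitespace starts the next piece, so no characters are lost.
        foreach (string sentence in Regex.Split(text, @"(?<=\.)(?=\s)"))
        {
            if (sentence.Length == 0) continue;

            if (current.Length + sentence.Length <= maxLength)
            {
                // The sentence still fits in the current chunk.
                current += sentence;
                continue;
            }

            // The current chunk is full: store it and start over.
            if (current.Length > 0) chunks.Add(current);
            current = string.Empty;

            // Assumption: an over-long sentence is cut at fixed offsets
            // rather than rejected, so no chunk ever exceeds maxLength.
            string rest = sentence;
            while (rest.Length > maxLength)
            {
                chunks.Add(rest.Substring(0, maxLength));
                rest = rest.Substring(maxLength);
            }
            current = rest;
        }

        if (current.Length > 0) chunks.Add(current);
        return chunks;
    }
}
```

Calling SentenceChunker.Split(longText, 1000) gives you the pieces in order. Because nothing is discarded, string.Concat on the result reproduces the input exactly; the trade-off is that a chunk may begin with the space or newline that followed the previous full stop, so call Trim() on each piece if your API does not need that whitespace.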