
Speeding up converting RTF to plain text

I have to convert a large amount of text stored in a database in RTF format into plain text. I am using the method described in this MSDN article; however, I think I found a snag (I don't think it is in my code, but in the .NET Framework itself).

I have the following function

    //convert RTF text to plain text
    public static string RtfTextToPlainText(string FormatObject)
    {
        System.Windows.Forms.RichTextBox rtfBox = new System.Windows.Forms.RichTextBox();
        rtfBox.Rtf = FormatObject;
        FormatObject = rtfBox.Text; //This is line 494 for later reference for the stack traces.
        rtfBox.Dispose();

        return FormatObject;
    }

It should be totally self-contained and not block on anything. The project has several million records that need processing, so I am breaking the work up into batches and using tasks to process them in parallel. It was still running fairly slowly, so I broke into the code in the debugger and found this.

[screenshot not preserved]

Here is the call stack for the waiting task

    [In a sleep, wait, or join]
    System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) + 0x242 bytes
    System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() + 0x2b2 bytes
    System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() + 0x54 bytes
    System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) + 0x68 bytes
    >CvtCore.dll!CvtCore.StandardFunctions.Str.RtfTextToPlainText(object Expression) Line 494   C#

And here is the call stack of thread 816

    [Managed to Native Transition]
    System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DefWndProc(ref System.Windows.Forms.Message m) + 0x9e bytes
    System.Windows.Forms.dll!System.Windows.Forms.Control.WmWindowPosChanged(ref System.Windows.Forms.Message m) + 0x39 bytes
    System.Windows.Forms.dll!System.Windows.Forms.Control.WndProc(ref System.Windows.Forms.Message m) + 0x51b bytes
    System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.WndProc(ref System.Windows.Forms.Message m) + 0x5c bytes
    System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x15e bytes
    [Native to Managed Transition]
    [Managed to Native Transition]
    System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DefWndProc(ref System.Windows.Forms.Message m) + 0x9e bytes
    System.Windows.Forms.dll!System.Windows.Forms.Control.WmCreate(ref System.Windows.Forms.Message m) + 0x1c bytes
    System.Windows.Forms.dll!System.Windows.Forms.Control.WndProc(ref System.Windows.Forms.Message m) + 0x50b bytes
    System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.WndProc(ref System.Windows.Forms.Message m) + 0x5c bytes
    System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(System.IntPtr hWnd, int msg, System.IntPtr wparam, System.IntPtr lparam) + 0x15e bytes
    [Native to Managed Transition]
    [Managed to Native Transition]
    System.Windows.Forms.dll!System.Windows.Forms.NativeWindow.CreateHandle(System.Windows.Forms.CreateParams cp) + 0x44c bytes
    System.Windows.Forms.dll!System.Windows.Forms.Control.CreateHandle() + 0x2b2 bytes
    System.Windows.Forms.dll!System.Windows.Forms.TextBoxBase.CreateHandle() + 0x54 bytes
    System.Windows.Forms.dll!System.Windows.Forms.RichTextBox.Rtf.set(string value) + 0x68 bytes
    >CvtCore.dll!CvtCore.StandardFunctions.Str.RtfTextToPlainText(object Expression) Line 494   C#

Why is task 2 blocking on task 4 at line 494? Shouldn't they be totally independent of each other?


NOTE

I grabbed these stack traces and screenshots in release mode; I cannot seem to hit pause at the right time to make the same thing happen in debug mode. Also, could this be the cause of my slowness? The profiler says my program spends 83.2% of its time in `System.Windows.Forms.RichTextBox.set_Rtf(string)` (which is a sub-function called by line 494).

Any suggestions on how to speed up this process of stripping the formatting out of the RTF would be greatly appreciated.


P.S.

I am currently rewriting it so that each thread has its own text box that is never disposed, instead of creating a new one every time the function is called. I expect that to speed it up a lot; I will update with details after I do it.


UPDATE

I solved my own problem (see answer below), but here is how I started the tasks:

    //create and start the initial consumer tasks
    for (int i = 0; i < ThreadsPreProducer; i++)
    {
        //create worker and task
        WorkerObject NewWorkerObject = new WorkerObject(colSource, FormatObjectEvent, UpdateModule);
        Task WorkerTask = new Task(NewWorkerObject.DoWork);
        WorkerTasks.Add(WorkerTask);
        WorkerTask.Start();
    }

    //create and start the producer task
    ProducerObject NewProducerObject = new ProducerObject(colSource, SourceQuery, ConnectionString, PreProcessor, UpdateModule, RowNameIndex);
    Task ProducerTask = new Task(NewProducerObject.DoWork);
    WorkerTasks.Add(ProducerTask);
    ProducerTask.Start();

    //block while the producer runs
    ProducerTask.Wait();

    //create and start the post-producer consumer tasks
    for (int i = 0; i < ThreadsPostProducer; i++)
    {
        //create worker and task
        WorkerObject NewWorkerObject = new WorkerObject(colSource, FormatObjectEvent, UpdateModule);
        Task WorkerTask = new Task(NewWorkerObject.DoWork);
        WorkerTasks.Add(WorkerTask);
        WorkerTask.Start();
    }

    //block until all tasks are done
    Task.WaitAll(WorkerTasks.ToArray());

It uses a producer/consumer model with, in my case, 1 producer and 4 consumers: 2 start at the beginning, and 2 start after the producer is done, to speed up the work once the system resources used by the producer are freed.
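For readers who want to try the same shape of pipeline, here is a minimal sketch of a producer with consumers started before and after it. It is not the code from this project: `ProducerConsumerSketch`, `Run`, and `Process` are hypothetical names, and it assumes the shared collection is a `BlockingCollection<string>`, which may differ from whatever `colSource` actually is.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Hypothetical stand-in for the real per-record work (e.g. the RTF conversion).
    private static string Process(string record)
    {
        return record.Trim();
    }

    // Returns the number of records processed by all consumers.
    public static int Run(IEnumerable<string> records, int consumersBefore, int consumersAfter)
    {
        // Assumption: an unbounded queue, so the producer never blocks on a full queue.
        var queue = new BlockingCollection<string>();
        var tasks = new List<Task>();
        int processed = 0;

        Action consume = () =>
        {
            // Blocks while the queue is empty; the loop ends once
            // CompleteAdding() has been called and the queue is drained.
            foreach (string record in queue.GetConsumingEnumerable())
            {
                Process(record);
                Interlocked.Increment(ref processed);
            }
        };

        // Consumers that run alongside the producer.
        for (int i = 0; i < consumersBefore; i++)
            tasks.Add(Task.Factory.StartNew(consume, TaskCreationOptions.LongRunning));

        Task producer = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (string record in records)
                    queue.Add(record);
            }
            finally
            {
                // Always signal completion so consumers cannot wait forever.
                queue.CompleteAdding();
            }
        }, TaskCreationOptions.LongRunning);
        tasks.Add(producer);

        // Block while the producer runs, then add the extra consumers.
        producer.Wait();
        for (int i = 0; i < consumersAfter; i++)
            tasks.Add(Task.Factory.StartNew(consume, TaskCreationOptions.LongRunning));

        Task.WaitAll(tasks.ToArray());
        return processed;
    }
}
```

Calling `ProducerConsumerSketch.Run(records, 2, 2)` would mirror the 1 producer and 4 consumers split described above.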


I changed the function to

    //one RichTextBox per thread, created on first use and then reused
    static ThreadLocal<RichTextBox> rtfBox = new ThreadLocal<RichTextBox>(() => new RichTextBox());

    //convert RTF text to plain text
    public static string RtfTextToPlainText(string FormatObject)
    {
        rtfBox.Value.Rtf = FormatObject;
        FormatObject = rtfBox.Value.Text;
        rtfBox.Value.Clear();

        return FormatObject;
    }

This changed my run time from several minutes to several seconds.

I do not dispose of the objects because they are used for the entire life of the program.
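If the per-thread objects did need to be cleaned up before the program exits, one possible approach is sketched below. It assumes .NET 4.5 or later, where `ThreadLocal<T>` has a constructor taking `trackAllValues` and exposes a `Values` property. `ExpensiveConverter` and `PerThreadReuseSketch` are hypothetical stand-ins, not part of the code above; the sketch also does not address whether a real `RichTextBox` can safely be disposed from a thread other than the one that created it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an expensive per-thread resource such as RichTextBox.
public sealed class ExpensiveConverter : IDisposable
{
    public static int Created;
    public static int Disposed;

    public ExpensiveConverter() { Interlocked.Increment(ref Created); }

    public string Convert(string input) { return input.ToUpperInvariant(); }

    public void Dispose() { Interlocked.Increment(ref Disposed); }
}

public static class PerThreadReuseSketch
{
    public static void Run(int calls)
    {
        // trackAllValues: true makes the Values property available,
        // so every per-thread instance can be reached and disposed later.
        using (var perThread = new ThreadLocal<ExpensiveConverter>(
            () => new ExpensiveConverter(), trackAllValues: true))
        {
            // Each worker thread creates its converter on first use and then reuses it.
            Parallel.For(0, calls, i => perThread.Value.Convert("record " + i));

            // Dispose every instance that was created, one per thread that did work.
            foreach (ExpensiveConverter converter in perThread.Values)
                converter.Dispose();
        }
    }
}
```

After `PerThreadReuseSketch.Run(1000)`, `ExpensiveConverter.Created` reflects the number of threads that did work rather than the number of calls, which is the same saving the `ThreadLocal<RichTextBox>` version relies on.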
