When and Why Should I Use TStringBuilder?
I converted my program from Delphi 4 to Delphi 2009 a year ago, mainly to make the jump to Unicode, but also to gain the benefits of all those years of Delphi improvements.
My code, of course, is therefore all legacy code. It uses short strings that have now conveniently all become long Unicode strings, and I've changed all the old ANSI functions to their new equivalents.
But with Delphi 2009, they introduced the TStringBuilder class, presumably modelled after the StringBuilder class of .NET.
My program does a lot of string handling and manipulation, and can load hundreds of megabytes of large strings into memory at once to work with.
I don't know a lot about Delphi's implementation of TStringBuilder, but I heard that some of its operations are faster than using the default string operations.
My question is whether or not it is worthwhile for me to go through the effort and convert my standard strings to use the TStringBuilder class. What would I gain and lose from doing that?
Thank you for your answers and leading me to my conclusion, which is not to bother unless .NET compatibility is required.
On his blog on Delphi 2009 String Performance, Jolyon Smith states:
But it looks to me as if TStringBuilder is there primarily as a .NET compatibility fixture, rather than to provide any real benefit to developers of Win32 applications, with the possible exception of developers wishing or needing to single-source a Win32/.NET codebase where string handling performance isn’t a concern.
To the best of my knowledge, TStringBuilder was introduced just for some parity with .NET and Java; it seems to be more of a tick-the-box feature than any major advance.
Consensus seems to be that TStringBuilder is faster in some operations but slower in others.
Your program sounds like an interesting one to do a before/after TStringBuilder comparison with but I wouldn't do it other than as an academic exercise.
Basically, I use these idioms for building strings. The most important differences are:
- TStringBuilder.Create and Append pattern which adds new characters to the TStringBuilder instance.
- TStringList.Create and Add pattern which adds new lines to the Text of the TStringList instance.
- The Format function to assemble strings based on format patterns.
- Simple concatenation of string types for expressions with 3 or fewer values.
For complex build patterns, the first makes my code a lot cleaner; the second does so only when I am adding lines, and it often involves many Format calls.
The third makes my code cleaner when format patterns are important.
I use the last one only when the expression is very simple.
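As a rough illustration of the four idioms listed above, here is a minimal sketch that builds a small piece of text each way. The function names and the sample data are hypothetical, and the `{$IFDEF FPC}` line is only there on the assumption that someone might try it with Free Pascal instead of Delphi 2009 or later.

```pascal
program StringIdioms;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

{ Hypothetical helpers, one per idiom }

function WithBuilder(const Name: string; Age: Integer): string;
var
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    SB.Append(Name);     { string overload }
    SB.Append(': ');
    SB.Append(Age);      { Integer overload, no IntToStr needed }
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

function WithStringList(const Name: string; Age: Integer): string;
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    { Each Add becomes one line of Text }
    SL.Add(Format('Name: %s', [Name]));
    SL.Add(Format('Age: %d', [Age]));
    Result := SL.Text;
  finally
    SL.Free;
  end;
end;

function WithFormat(const Name: string; Age: Integer): string;
begin
  Result := Format('%s: %d', [Name, Age]);
end;

function WithConcat(const Name: string; Age: Integer): string;
begin
  { Three values, so plain concatenation stays readable }
  Result := Name + ': ' + IntToStr(Age);
end;

begin
  WriteLn(WithBuilder('Ann', 30));
  Write(WithStringList('Ann', 30));
  WriteLn(WithFormat('Ann', 30));
  WriteLn(WithConcat('Ann', 30));
end.
```

The builder, Format and concatenation versions produce the same single-line text here; the TStringList version is line-oriented, which is why it only pays off when the output really consists of lines.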
A few more differences between the first two idioms:
- TStringBuilder has many overloads for Append, and also has AppendLine (with only two overloads) if you want to add lines like TStringList.Add can.
- TStringBuilder reallocates the underlying buffer with an over capacity scheme, which means that with large buffers and frequent appends, it can be a lot faster than TStringList.
- To get the TStringBuilder content, you have to call the ToString method which can slow things down.
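The last two points can be sketched as follows: reserve capacity once if the final size is roughly known, append in the loop, and call ToString a single time at the end. The function names are hypothetical, and how far Capacity runs ahead of Length depends on the RTL version, so the sketch only prints the values rather than assuming them.

```pascal
program BuilderCapacity;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Hypothetical helper: builds Count copies of Ch with a pre-sized builder }
function RepeatChar(Ch: Char; Count: Integer): string;
var
  SB: TStringBuilder;
  i: Integer;
begin
  SB := TStringBuilder.Create(Count);  { initial capacity, in characters }
  try
    for i := 1 to Count do
      SB.Append(Ch);
    Result := SB.ToString;             { called once, after the loop }
  finally
    SB.Free;
  end;
end;

{ Hypothetical helper: how much unused room a default-sized builder
  holds after Count appends (Capacity minus Length) }
function SpareCapacityAfter(Count: Integer): Integer;
var
  SB: TStringBuilder;
  i: Integer;
  Ch: Char;
begin
  Ch := 'x';
  SB := TStringBuilder.Create;
  try
    for i := 1 to Count do
      SB.Append(Ch);
    Result := SB.Capacity - SB.Length;
  finally
    SB.Free;
  end;
end;

begin
  WriteLn('Length of result: ', Length(RepeatChar('x', 1000)));
  WriteLn('Spare capacity after 1000 appends: ', SpareCapacityAfter(1000));
end.
```

Calling ToString inside the loop would copy the whole buffer on every pass, which is where the slowdown mentioned above tends to come from.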
So: speed is not the most important factor when choosing a string appending idiom. Readable code is.
I tried to improve an old routine that was parsing a text file (1.5GB). The routine was pretty dumb, and it was building a string like this: s:= s+ buff[i];
So, I thought that TStringBuilder would bring significant speed improvements. It turned out that it was actually 114% slower.
So, I built my own StringBuilder, which is 184.82 times (yes 184!!!!!!) faster than the classic s:= s+ chr (experiment on a 4MB string) and even faster than TStringBuilder.
Tests:
Classic s:= s + c
Time: 8502 ms
procedure TfrmTester.btnClassicClick(Sender: TObject);
VAR
  s: string;
  FileBody: string;
  c: Cardinal;
  i: Integer;
begin
  FileBody:= ReadFile(File4MB);   { ReadFile and File4MB are the author's own helpers (not shown) }
  c:= GetTickCount;
  for i:= 1 to Length(FileBody) DO
    s:= s+ FileBody[i];           { grows s by one character per iteration }
  Log.Lines.Add('Time: '+ IntToStr(GetTickCount-c) + 'ms'); // 8502 ms
end;
Prebuffered
Time:
BuffSize= 10000; // 10k buffer = 406ms
BuffSize= 100000; // 100k buffer = 140ms
BuffSize= 1000000; // 1M buffer = 46ms
Code:
procedure TfrmTester.btnBufferedClick(Sender: TObject);
VAR
  s: string;
  FileBody: string;
  c: Cardinal;
  CurBuffLen, marker, i: Integer;
begin
  FileBody:= ReadFile(File4MB);
  c:= GetTickCount;
  marker:= 1;                     { next write position in s }
  CurBuffLen:= 0;                 { currently allocated length of s }
  for i:= 1 to Length(FileBody) DO
  begin
    if i > CurBuffLen then
    begin
      { Buffer is full: grow it by one BuffSize chunk }
      SetLength(s, CurBuffLen+ BuffSize);
      CurBuffLen:= Length(s)
    end;
    s[marker]:= FileBody[i];
    Inc(marker);
  end;
  SetLength(s, marker-1); { Cut down the preallocated buffer that we haven't used }
  Log.Lines.Add('Time: '+ IntToStr(GetTickCount-c) + 'ms');
  if s <> FileBody
  then Log.Lines.Add('FAILED!');
end;
Prebuffered, as class
Time:
BuffSize= 10000; // 10k buffer = 437ms
BuffSize= 100000; // 100k buffer = 187ms
BuffSize= 1000000; // 1M buffer = 78ms
Code:
procedure TfrmTester.btnBuffClassClick(Sender: TObject);
VAR
  StringBuff: TCStringBuff;
  s: string;
  FileBody: string;
  c: Cardinal;
  i: Integer;
begin
  FileBody:= ReadFile(File4MB);
  c:= GetTickCount;
  StringBuff:= TCStringBuff.Create(BuffSize);
  TRY
    for i:= 1 to Length(FileBody) DO
      StringBuff.AddChar(FileBody[i]);
    s:= StringBuff.GetResult;
  FINALLY
    FreeAndNil(StringBuff);
  END;
  Log.Lines.Add('Time: '+ IntToStr(GetTickCount-c) + 'ms');
  if s <> FileBody
  then Log.Lines.Add('FAILED!');
end;
And this is the class:
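{ TCStringBuff }

constructor TCStringBuff.Create(aBuffSize: Integer= 10000);
begin
  BuffSize:= aBuffSize;   { growth step, in characters }
  marker:= 1;             { next write position in s }
  CurBuffLen:= 0;         { currently allocated length of s }
  inp:= 1;                { number of the character being added }
end;

function TCStringBuff.GetResult: string;
begin
  SetLength(s, marker-1); { Cut down the preallocated buffer that we haven't used }
  Result:= s;
  s:= ''; { Free memory }
  { Reset the counters so the instance can be reused after GetResult }
  marker:= 1;
  CurBuffLen:= 0;
  inp:= 1;
end;

procedure TCStringBuff.AddChar(Ch: Char);
begin
  if inp > CurBuffLen then
  begin
    { Buffer is full: grow it by one BuffSize chunk }
    SetLength(s, CurBuffLen+ BuffSize);
    CurBuffLen:= Length(s)
  end;
  s[marker]:= Ch;
  Inc(marker);
  Inc(inp);
end;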
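The class declaration itself is not shown above. A declaration consistent with those method bodies might look like the sketch below. The field types and visibility are assumptions inferred from how the methods use them, `inp` is folded into `marker` because the two always hold the same value, the counter reset in GetResult is an addition so the object can be reused, and `CopyViaBuff` is a hypothetical helper added only to exercise the class.

```pascal
program StringBuffDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  { Assumed declaration, inferred from the method bodies }
  TCStringBuff = class
  private
    s: string;            { the growing buffer }
    BuffSize: Integer;    { growth step, in characters }
    CurBuffLen: Integer;  { currently allocated length of s }
    marker: Integer;      { next write position in s }
  public
    constructor Create(aBuffSize: Integer = 10000);
    procedure AddChar(Ch: Char);
    function GetResult: string;
  end;

constructor TCStringBuff.Create(aBuffSize: Integer);
begin
  inherited Create;
  BuffSize := aBuffSize;
  marker := 1;
  CurBuffLen := 0;
end;

procedure TCStringBuff.AddChar(Ch: Char);
begin
  if marker > CurBuffLen then
  begin
    { Buffer is full: grow it by one chunk }
    SetLength(s, CurBuffLen + BuffSize);
    CurBuffLen := Length(s);
  end;
  s[marker] := Ch;
  Inc(marker);
end;

function TCStringBuff.GetResult: string;
begin
  SetLength(s, marker - 1);  { trim the unused tail }
  Result := s;
  s := '';
  marker := 1;               { reset so the instance can be reused }
  CurBuffLen := 0;
end;

{ Hypothetical helper: copies Src one character at a time through the buffer }
function CopyViaBuff(const Src: string; aBuffSize: Integer): string;
var
  Buff: TCStringBuff;
  i: Integer;
begin
  Buff := TCStringBuff.Create(aBuffSize);
  try
    for i := 1 to Length(Src) do
      Buff.AddChar(Src[i]);
    Result := Buff.GetResult;
  finally
    Buff.Free;
  end;
end;

begin
  WriteLn(CopyViaBuff('hello world', 4));
end.
```

A deliberately tiny chunk size such as 4 forces several regrowths on a short string, which is a quick way to check that the grow-and-trim logic round-trips the input.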
Conclusion:
Stop using s:= s + c if you have large (over 10K) strings. It might be true even if you have small strings but you do it often (for example, you have a function that is doing some string processing on a small string, but you call it often).
PS: You may also want to see this: https://www.delphitools.info/2013/10/30/efficient-string-building-in-delphi/2/
TStringBuilder was introduced solely to provide a source-code-compatible mechanism for applications to perform string handling in both Delphi and Delphi.NET. You sacrifice some speed in Delphi for some potentially significant benefits in Delphi.NET.
The StringBuilder concept in .NET addresses performance issues with the string implementation on that platform, issues that the Delphi (native code) platform simply does not have.
If you are not writing code that needs to be compiled for both native code and Delphi.NET then there is simply no reason to use TStringBuilder.
According to Marco Cantu, not for speed, but you might get cleaner code and better code compatibility with .NET. Here (and some corrections here) is another speed test in which TStringBuilder is not faster.
TStringBuilder is basically just a me-too feature, like LachlanG said. It's needed in .NET because CLR strings are immutable, but Delphi doesn't have that problem so it doesn't really require a string builder as a workaround.