Can I Eliminate Extra Unicode String Calls (Delphi)
I'm using Delphi 2009. In my program, I have been working very hard to optimize all my Delphi code for speed and memory use, especially my Unicode string handling.
I have the following statement:
Result := Result + GetFirstLastName(IndiID, 1);
When I debug that line, upon return from the GetFirstLastName function, it traces into the routine _UStrArrayClr in the System unit:
procedure _UStrArrayClr(var StrArray; Count: Integer);
asm
JMP _LStrArrayClr
end;
This calls _LStrArrayClr:
procedure _LStrArrayClr(var StrArray; cnt: longint);
{$IFDEF PUREPASCAL}
var
P: Pointer;
begin
P := @StrArray;
while cnt > 0 do
begin
_LStrClr(P^);
Dec(cnt);
Inc(Integer(P), sizeof(Pointer));
end;
end;
{$ELSE}
asm
{ -> EAX pointer to str }
{ EDX cnt }
PUSH EBX
PUSH ESI
MOV EBX,EAX
MOV ESI,EDX
@@loop:
MOV EDX,[EBX] { fetch str }
TEST EDX,EDX { if nil, nothing to do }
JE @@doneEntry
MOV dword ptr [EBX],0 { clear str }
MOV ECX,[EDX-skew].StrRec.refCnt { fetch refCnt }
DEC ECX { if < 0: literal str }
JL @@doneEntry
LOCK DEC [EDX-skew].StrRec.refCnt { threadsafe dec refCount }
JNE @@doneEntry
LEA EAX,[EDX-skew].StrRec.codePage { if refCnt now zero, deallocate}
CALL _FreeMem
@@doneEntry:
ADD EBX,4
DEC ESI
JNE @@loop
POP ESI
POP EBX
end;
{$ENDIF}
and runs through the loop once for each character, and on exit from there it calls _UStrCat:
procedure _UStrCat(var Dest: UnicodeString; const Source: UnicodeString);
asm
{ -> EAX pointer to dest }
{ EDX source }
TEST EDX,EDX // Source empty, nop.
JE @@exit
MOV ECX,[EAX] // ECX := Dest
TEST ECX,ECX // Nil source => assignment
JE _UStrAsg
PUSH EBX
PUSH ESI
PUSH EDI
MOV EBX,EAX // EBX := @Dest
MOV ESI,EDX // ESI := Source
CMP ESI,ECX
JE @@appendSelf
CMP [ECX-skew].StrRec.elemSize,2
JE @@destIsUnicode
CALL _EnsureUnicodeString
MOV EDI,EAX
MOV ECX,EAX
@@destIsUnicode:
PUSH 0
CMP [ESI-skew].StrRec.elemSize,2
JE @@sourceIsUnicode
MOV EDI,ECX
MOV EAX,ESI
MOV [ESP],ESI
CALL _UStrAddRef
MOV EAX,ESP
CALL _EnsureUnicodeString
MOV ESI,[ESP]
MOV ECX,EDI
@@sourceIsUnicode:
MOV EDI,[ECX-skew].StrRec.length // EDI := Length(Dest)
MOV EDX,[ESI-skew].StrRec.length // EDX := Length(Source)
ADD EDX,EDI // EDX := (Length(Source) + Length(Dest)) * 2
TEST EDX,$C0000000
JNZ @@lengthOverflow
MOV EAX,EBX
CALL _UStrSetLength // Set length of Dest
MOV EAX,ESI // EAX := Source
MOV ECX,[ESI-skew].StrRec.length // ECX := Length(Source)
@@noTemp:
MOV EDX,[EBX] // EDX := Dest
SHL EDI,1 // EDI to bytes (Length(Dest) * 2)
ADD EDX,EDI // Offset EDX for destination of move
SHL ECX,1 // convert Length(Source) to bytes
CALL Move // Move(Source, Dest + Length(Dest)*2, Length(Source)*2)
MOV EAX,ESP // Need to clear out the temp we may have created above
MOV EDX,[EAX]
TEST EDX,EDX
JE @@tempEmpty
CALL _LStrClr
@@tempEmpty:
POP EAX
POP EDI
POP ESI
POP EBX
RET
@@appendSelf:
CMP [ECX-skew].StrRec.elemSize,2
JE @@selfIsUnicode
MOV EAX,EBX
XOR EDX,EDX
CALL _EnsureUnicodeString
MOV ECX,EAX
MOV EAX,EBX
@@selfIsUnicode:
MOV EDI,[ECX-skew].StrRec.length
MOV EDX,EDI
SHL EDX,1
TEST EDX,$C0000000
JNZ @@lengthOverflow
CALL _UStrSetLength
MOV EAX,[EBX]
MOV ECX,EDI
PUSH 0
JMP @@noTemp
@@lengthOverflow:
JMP _IntOver
@@exit:
end;
and runs through the whole of that routine.
My "Result" is a string and is thus Unicode. And my GetFirstLastName returns a string which is Unicode. No conversion of character set should be needed.
I can't really tell what these System procedures are doing, but they are adding a lot of overhead to my routine.
What are they doing? Are they necessary? If they aren't necessary, how can I prevent the compiler from calling those routines?
_LStrArrayClr isn't looping once per character. It loops once per string in the array, decrementing each string's reference count and freeing the string if the count reaches 0. The compiler inserts this call to clean up string local variables and any temporary strings it creates for intermediate results, such as the value returned by GetFirstLastName before it is appended to Result, or the result of concatenating two strings.
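To make the hidden temporaries visible, here is a rough, hand-written approximation of what the compiler arranges for a line like the one in the question. This is a sketch, not actual compiler output: the real code uses unnamed stack slots and a single cleanup call in the routine's epilogue, while this version uses a named array and an explicit loop. GetFirstLastName is a hypothetical stub, and because Temps is itself a managed local, the compiler will also finalize it; the explicit loop only exists to show the cleanup.

```pascal
program HiddenTempsSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for the question's GetFirstLastName.
function GetFirstLastName(IndiID, Which: Integer): string;
begin
  Result := 'Name' + IntToStr(IndiID);
end;

// Source line being illustrated:
//   Names := Names + GetFirstLastName(IndiID, 1);
// Hand-written approximation of what the compiler arranges for it.
procedure AppendNameExpanded(var Names: string; IndiID: Integer);
var
  Temps: array[0..0] of string; // stands in for the hidden temp slots (start as nil)
  I: Integer;
begin
  try
    Temps[0] := GetFirstLastName(IndiID, 1); // the call's result lands in a temp
    Names := Names + Temps[0];               // compiled as a _UStrCat call
  finally
    // Epilogue cleanup in the spirit of _UStrArrayClr: one pass per string
    // slot; nil slots are skipped, others drop a reference (freed at zero).
    for I := Low(Temps) to High(Temps) do
      Temps[I] := '';
  end;
end;

var
  Names: string;
begin
  Names := 'List: ';
  AppendNameExpanded(Names, 42);
  Assert(Names = 'List: Name42');
  WriteLn(Names);
end.
```

The try/finally around the whole body is the important part: the cleanup runs when the routine exits, whether or not every temporary was actually used.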
_UStrCat is the string concatenation routine; it's what string1 + string2 translates to under the hood. Because the result is supposed to be a Unicode string, it checks both input strings to see whether they are already Unicode and converts any that aren't (yours are, so the conversion is skipped), then sets the length of the result and copies the data.
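For the case in the question, where both operands are already UnicodeString, the work _UStrCat does boils down to roughly the following. This is a simplified pure-Pascal sketch, not the RTL code: it omits the code-page conversion and overflow checks. AppendSketch is a hypothetical name. Source is deliberately a value parameter so it holds its own reference, which keeps appending a string to itself safe even when SetLength has to move Dest (the RTL handles that case separately in its @@appendSelf branch).

```pascal
program ConcatSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Simplified sketch of appending one UnicodeString to another
// when no character-set conversion is needed.
procedure AppendSketch(var Dest: UnicodeString; Source: UnicodeString);
var
  OldLen: Integer;
begin
  if Source = '' then
    Exit;                       // empty source: nothing to do
  if Dest = '' then
  begin
    Dest := Source;             // empty dest: plain assignment, no character copy
    Exit;
  end;
  OldLen := Length(Dest);
  // Grow Dest; this may reallocate, or copy if Dest is shared.
  SetLength(Dest, OldLen + Length(Source));
  // Copy Source's characters after the old ones (2 bytes per WideChar).
  Move(Pointer(Source)^, Dest[OldLen + 1], Length(Source) * SizeOf(WideChar));
end;

var
  S: UnicodeString;
begin
  S := 'Smith, ';
  AppendSketch(S, 'John');
  Assert(S = 'Smith, John');
  AppendSketch(S, S);           // appending a string to itself
  Assert(S = 'Smith, JohnSmith, John');
  WriteLn(S);
end.
```

The SetLength plus Move pair is the unavoidable part of any concatenation, which is why _UStrCat itself can't be eliminated.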
_UStrCat is necessary, and there's not much you can do about it. _UStrArrayClr is where things get a bit fuzzier. When you write a routine that works with strings, the compiler has to allocate enough temporary strings to handle every string expression in it, whether or not that code actually executes, and then it has to clear them all when the routine exits. So cutting down on unnecessary string manipulation by moving uncommon tasks to other functions can help, especially in a tight loop.
For example, how often do you see something like this?
if SomethingIsVeryWrong then
raise ETimeToPanic.Create('Everybody panic! File ' + filename + ' is corrupt at address ' + intToStr(FailureAddress) + '!!!');
This error message is built from 5 different substrings. Even if the compiler manages to reuse some of its temporaries, it still needs at least two temporary strings to make this work. Suppose this code sits inside a tight loop and you don't expect the error to happen often, if at all. You can eliminate the temporary strings by offloading the concatenation into a Format call. That's such a convenient optimization, in fact, that it's built into Exception as the CreateFmt constructor:
if SomethingIsVeryWrong then
raise ETimeToPanic.CreateFmt('Everybody panic! File %s is corrupt at address %d!!!', [filename, FailureAddress]);
Yes, a call to Format will run significantly slower than straight concatenation, but if something goes wrong, it only runs once and performance is the least of your worries anyway.
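The advice about moving uncommon tasks into other functions can also be applied literally: put the whole error-message construction in its own procedure, so its string temporaries and their cleanup belong to that procedure instead of the hot routine. This is a sketch; ETimeToPanic comes from the example above, while RaiseCorrupt, SumBytes and the $FF "corruption marker" are hypothetical.

```pascal
program ColdPathSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  ETimeToPanic = class(Exception);

// Rare path: all the string building lives here, so the hidden string
// temporaries (and their cleanup) belong to this procedure.
procedure RaiseCorrupt(const FileName: string; FailureAddress: Integer);
begin
  raise ETimeToPanic.Create('Everybody panic! File ' + FileName +
    ' is corrupt at address ' + IntToStr(FailureAddress) + '!!!');
end;

// Hot routine: it contains no string expressions of its own, so the
// compiler should have no string temporaries to finalize here.
function SumBytes(const Data: array of Byte; const FileName: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(Data) do
  begin
    if Data[I] = $FF then       // hypothetical corruption marker
      RaiseCorrupt(FileName, I);
    Inc(Result, Data[I]);
  end;
end;

var
  Raised: Boolean;
begin
  Assert(SumBytes([1, 2, 3], 'data.bin') = 6);
  Raised := False;
  try
    SumBytes([1, $FF], 'data.bin');
  except
    on E: ETimeToPanic do
    begin
      Raised := True;
      WriteLn(E.Message);
    end;
  end;
  Assert(Raised);
end.
```

Unlike CreateFmt, this keeps plain concatenation for the message; the cost is simply moved out of the routine that runs in the loop.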
The compiler will often create temporaries in which to hold the intermediate values of expressions. These temporaries need to be "finalized" or cleaned up. Since the compiler doesn't know whether or not a certain temp has actually been used (it will skip the finalization if it sees that the variable is still nil), it will always attempt a cleanup pass.
You may also be interested in these:
- Performance tip: Length(String) and Str[Index] in Delphi 2009
- Why I hate C++Builder 2009…
- Requiem for the {$STRINGCHECKS xx} directive…
Take a look at the TStringBuilder class.
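A minimal sketch of building a list of names with TStringBuilder (in SysUtils since Delphi 2009) instead of repeated Result := Result + ... statements. GetFirstLastName is again a hypothetical stub, and BuildNameList is an illustrative name. Each GetFirstLastName call still returns a temporary string; what the builder can save is repeatedly resizing Result. Whether it is actually faster than plain concatenation depends on the workload and memory manager, so it is worth measuring in your own code.

```pascal
program StringBuilderSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for the question's GetFirstLastName.
function GetFirstLastName(IndiID, Which: Integer): string;
begin
  Result := 'Person' + IntToStr(IndiID);
end;

// Builds a comma-separated list with one TStringBuilder instead of
// repeated Result := Result + ... concatenations.
function BuildNameList(const IDs: array of Integer): string;
var
  SB: TStringBuilder;
  I: Integer;
begin
  SB := TStringBuilder.Create;
  try
    for I := 0 to High(IDs) do
    begin
      if I > 0 then
        SB.Append(', ');
      SB.Append(GetFirstLastName(IDs[I], 1));
    end;
    Result := SB.ToString;      // one final string for the caller
  finally
    SB.Free;
  end;
end;

begin
  Assert(BuildNameList([1, 2, 3]) = 'Person1, Person2, Person3');
  WriteLn(BuildNameList([1, 2, 3]));
end.
```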