
Can I Eliminate Extra Unicode String Calls (Delphi)

I'm using Delphi 2009. In my program, I have been working very hard to optimize all my Delphi code for speed and memory use, especially my Unicode string handling.

I have the following statement:

    Result := Result + GetFirstLastName(IndiID, 1);

When I debug that line, upon return from the GetFirstLastName function, it traces into the routine _UStrArrayClr in the System unit:

procedure _UStrArrayClr(var StrArray; Count: Integer);
asm
        JMP     _LStrArrayClr
end;

This calls _LStrArrayClr:

procedure       _LStrArrayClr(var StrArray; cnt: longint);
{$IFDEF PUREPASCAL}
var
  P: Pointer;
begin
  P := @StrArray;
  while cnt > 0 do
  begin
    _LStrClr(P^);
    Dec(cnt);
    Inc(Integer(P), sizeof(Pointer));
  end;
end;
{$ELSE}
asm
        { ->    EAX pointer to str      }
        {       EDX cnt         }

        PUSH    EBX
        PUSH    ESI
        MOV     EBX,EAX
        MOV     ESI,EDX

@@loop:
        MOV     EDX,[EBX]                       { fetch str                     }
        TEST    EDX,EDX                         { if nil, nothing to do         }
        JE      @@doneEntry
        MOV     dword ptr [EBX],0               { clear str                     }
        MOV     ECX,[EDX-skew].StrRec.refCnt    { fetch refCnt                  }
        DEC     ECX                             { if < 0: literal str           }
        JL      @@doneEntry
   LOCK DEC     [EDX-skew].StrRec.refCnt        { threadsafe dec refCount       }
        JNE     @@doneEntry
        LEA     EAX,[EDX-skew].StrRec.codePage  { if refCnt now zero, deallocate}
        CALL    _FreeMem
@@doneEntry:
        ADD     EBX,4
        DEC     ESI
        JNE     @@loop

        POP     ESI
        POP     EBX
end;
{$ENDIF}

and runs through the loop once for each character, and on exit from there it calls _UStrCat:

procedure _UStrCat(var Dest: UnicodeString; const Source: UnicodeString);
asm
        { ->    EAX     pointer to dest }
        {       EDX source              }

        TEST    EDX,EDX       // Source empty, nop.
        JE      @@exit

        MOV     ECX,[EAX]     // ECX := Dest
        TEST    ECX,ECX       // Nil source => assignment
        JE      _UStrAsg

        PUSH    EBX
        PUSH    ESI
        PUSH    EDI
        MOV     EBX,EAX         // EBX := @Dest
        MOV     ESI,EDX         // ESI := Source
        CMP     ESI,ECX
        JE      @@appendSelf

        CMP     [ECX-skew].StrRec.elemSize,2
        JE      @@destIsUnicode
        CALL    _EnsureUnicodeString
        MOV     EDI,EAX
        MOV     ECX,EAX

@@destIsUnicode:
        PUSH    0
        CMP     [ESI-skew].StrRec.elemSize,2
        JE      @@sourceIsUnicode

        MOV     EDI,ECX
        MOV     EAX,ESI
        MOV     [ESP],ESI
        CALL    _UStrAddRef
        MOV     EAX,ESP
        CALL    _EnsureUnicodeString
        MOV     ESI,[ESP]
        MOV     ECX,EDI

@@sourceIsUnicode:
        MOV     EDI,[ECX-skew].StrRec.length  // EDI := Length(Dest)
        MOV     EDX,[ESI-skew].StrRec.length  // EDX := Length(Source)
        ADD     EDX,EDI         // EDX := (Length(Source) + Length(Dest)) * 2
        TEST    EDX,$C0000000
        JNZ     @@lengthOverflow

        MOV     EAX,EBX
        CALL    _UStrSetLength  // Set length of Dest
        MOV     EAX,ESI         // EAX := Source
        MOV     ECX,[ESI-skew].StrRec.length // ECX := Length(Source)

@@noTemp:
        MOV     EDX,[EBX]       // EDX := Dest
        SHL     EDI,1           // EDI to bytes (Length(Dest) * 2)
        ADD     EDX,EDI         // Offset EDX for destination of move
        SHL     ECX,1           // convert Length(Source) to bytes
        CALL    Move            // Move(Source, Dest + Length(Dest)*2, Length(Source)*2)
        MOV     EAX,ESP         // Need to clear out the temp we may have created above
        MOV     EDX,[EAX]
        TEST    EDX,EDX
        JE      @@tempEmpty

        CALL    _LStrClr

@@tempEmpty:
        POP     EAX
        POP     EDI
        POP     ESI
        POP     EBX
        RET

@@appendSelf:
        CMP     [ECX-skew].StrRec.elemSize,2
        JE      @@selfIsUnicode
        MOV     EAX,EBX
        XOR     EDX,EDX
        CALL    _EnsureUnicodeString
        MOV     ECX,EAX
        MOV     EAX,EBX

@@selfIsUnicode:
        MOV     EDI,[ECX-skew].StrRec.length
        MOV     EDX,EDI
        SHL     EDX,1
        TEST    EDX,$C0000000
        JNZ     @@lengthOverflow
        CALL    _UStrSetLength
        MOV     EAX,[EBX]
        MOV     ECX,EDI
        PUSH    0
        JMP     @@noTemp

@@lengthOverflow:
        JMP     _IntOver

@@exit:
end;

and runs through the whole of that routine.

My "Result" is a string and is thus Unicode. And my GetFirstLastName returns a string which is Unicode. No conversion of character set should be needed.

I can't really tell what these System procedures are doing, but they are adding a lot of overhead to my routine.

What are they doing? Are they necessary? If they aren't necessary, how can I prevent the compiler from calling those routines?


_LStrArrayClr isn't looping once per character; it loops once per string in the array, decrementing each string's reference count and freeing the string if the count reaches 0. The compiler inserts this call to clean up any strings declared as local variables, as well as any temporary strings it creates to hold intermediate results, such as the result of concatenating two strings.

_UStrCat is the string concatenation routine. It's what string1 + string2 translates to under the hood. The compiler determines that the result is supposed to be a Unicode string, so the routine takes the two input strings, tests each of them to see whether it is already Unicode, converts any that are not (yours are, so the conversion is skipped), then sets the length of the result and copies the data.

_UStrCat is necessary, and there's not much you can do about it. _LStrArrayClr is where things get a bit fuzzier. When you write a routine that works with strings, the compiler has to allocate enough temporary strings to handle everything the routine could do, whether or not a given call ever does it, and then it has to clear them afterwards. So cutting down on unnecessary string manipulation by moving uncommon tasks into other functions can help, especially in a tight loop.

For example, how often do you see something like this?

if SomethingIsVeryWrong then
   raise ETimeToPanic.Create('Everybody panic! File ' + filename + ' is corrupt at address ' + intToStr(FailureAddress) + '!!!');

This error message is built from 5 different substrings. Even if the compiler manages to optimize things by reusing temporaries, it still needs to allocate at least two temporary strings to make this work. Let's say this takes place inside a tight loop and you don't expect the error to happen frequently, if at all. You can eliminate the temporary strings by offloading the string concatenation into a Format call. That's such a convenient optimization, in fact, that it's built into Exception.

if SomethingIsVeryWrong then
   raise ETimeToPanic.CreateFmt('Everybody panic! File %s is corrupt at address %d!!!', [filename, FailureAddress]);

Yes, a call to Format will run significantly slower than straight concatenation, but if something goes wrong, it only runs once and performance is the least of your worries anyway.
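The other option mentioned above, moving the uncommon task into its own function, can be sketched as follows. This is only an illustration: RaiseCorruptFile and SumChecked are hypothetical names, not part of the original code, and whether the change is measurable depends on the routine and should be checked in the debugger or a profiler. The idea is that the temporaries needed to build the message belong to the helper, so the hot routine no longer needs them.

```pascal
program ColdPathDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Exception class named after the one used in the example above.
  ETimeToPanic = class(Exception);

// Hypothetical helper: every string temporary needed to build the message
// is owned (and cleaned up) by this routine, not by its caller.
procedure RaiseCorruptFile(const FileName: string; FailureAddress: Integer);
begin
  raise ETimeToPanic.Create('Everybody panic! File ' + FileName +
    ' is corrupt at address ' + IntToStr(FailureAddress) + '!!!');
end;

// Hypothetical hot routine: the rare error path is reduced to a plain call,
// so no string expression is evaluated here.
function SumChecked(const Values: array of Integer;
  const FileName: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := Low(Values) to High(Values) do
  begin
    if Values[I] < 0 then
      RaiseCorruptFile(FileName, I);
    Inc(Result, Values[I]);
  end;
end;

begin
  Writeln(SumChecked([1, 2, 3], 'data.bin'));
  try
    SumChecked([1, -1], 'data.bin');
  except
    on E: ETimeToPanic do
      Writeln(E.Message);
  end;
end.
```

Stepping through SumChecked in the CPU view is a quick way to confirm whether the string cleanup call has actually disappeared from the hot routine.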


The compiler will often create temporaries in which to hold the intermediate values of expressions. These temporaries need to be "finalized" or cleaned up. Since the compiler doesn't know whether or not a certain temp has actually been used (it will skip the finalization if it sees that the variable is still nil), it will always attempt a cleanup pass.
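As a rough mental model only (not the code the compiler actually emits), the hidden temporary and its cleanup can be written out by hand. In this sketch GetFirstLastName is a hypothetical stand-in for the function in the question, and the explicit Temp variable plays the role of the compiler's hidden temporary.

```pascal
program TempCleanupDemo;

{$APPTYPE CONSOLE}

// Hypothetical stand-in for the question's GetFirstLastName.
function GetFirstLastName(IndiID, Order: Integer): string;
begin
  Result := 'Jane Doe';
end;

// The statement as it appears in the source.
function BuildAsWritten(IndiID: Integer): string;
begin
  Result := 'Name: ';
  Result := Result + GetFirstLastName(IndiID, 1);
end;

// Approximately what happens behind the scenes: the function result lands
// in a temporary, the temporary is appended to Result (the _UStrCat step),
// and the temporary is released on the way out (the cleanup step).
function BuildSpelledOut(IndiID: Integer): string;
var
  Temp: string; // stands in for the compiler's hidden temporary
begin
  Result := 'Name: ';
  try
    Temp := GetFirstLastName(IndiID, 1);
    Result := Result + Temp;
  finally
    Temp := ''; // drops the reference; frees the string if the count hits 0
  end;
end;

begin
  Writeln(BuildAsWritten(1));
  Writeln(BuildSpelledOut(1));
end.
```

Both functions return the same string; the second just makes the implicit work visible.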


You may also be interested in these:

  • Performance tip: Length(String) and Str[Index] in Delphi 2009
  • Why I hate C++Builder 2009…
  • Requiem for the {$STRINGCHECKS xx} directive…


Take a look at the TStringBuilder class.
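A minimal sketch of what that might look like, assuming the TStringBuilder declared in SysUtils; JoinNames is a hypothetical helper used only for illustration. Whether this beats plain + concatenation for a given workload is something to measure rather than assume.

```pascal
program StringBuilderDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: joins the names with ', ' using one builder
// instead of repeated Result := Result + ... statements.
function JoinNames(const Names: array of string): string;
var
  SB: TStringBuilder;
  I: Integer;
begin
  SB := TStringBuilder.Create;
  try
    for I := Low(Names) to High(Names) do
    begin
      if I > Low(Names) then
        SB.Append(', ');
      SB.Append(Names[I]);
    end;
    Result := SB.ToString;
  finally
    SB.Free; // the builder is a regular object and must be freed
  end;
end;

begin
  Writeln(JoinNames(['Ada Lovelace', 'Alan Turing']));
end.
```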
