Pointer(s)^ versus s[1]

In a function that reads data (data meaning exclusively strings) from disk, which should I prefer? Which is better?

A) DiskStream.Read(Pointer(s)^, Count)
or
B) DiskStream.Read(s[1], Count)

Note:

I know both produce the same result.

I know that I have to call SetLength on S before calling Read.


UPDATE

S is AnsiString.

Here is the full function:

{ Reads a bunch of chars from the file. Why 'ReadChars' and not 'ReadString'? This function reads C++ strings, whose length was not written to disk, so I have to pass the number of chars to read as a parameter. }

function TMyStream.ReadChars(out s: AnsiString; CONST Count: Longint): Boolean; 
begin
 SetLength(s, Count);
 Result:= Read(s[1], Count)= Count;
end;

Speed test

In my speed test the first approach was a tiny bit faster than the second one. I used a 400MB file from which I read strings about 200000 times. The process was set to High priority.

The best read time ever was:

1.35 for variant B and 1.37 for variant A.

Average:

On average, B also scored 20 ms better than A.

The test was repeated 15 times for each variant.

The difference is really small and could be within measurement error. It would probably become significant if I read strings more often and from a bigger file. But for the moment, let's say that both lines of code perform the same.

ANSWER

Variant A - might be a tiny, tiny bit faster.
Variant B - is (obviously) much easier to read and it is more Delphi-ish. My preferred.

Note:

I have seen Embarcadero using variant A in the TStreamReadBuffer example, but with a TBytes instead of a String.


Definitely the array notation. Part of Delphi style is to make your code easy to read, and it's easier to tell what's going on when you spell out exactly what you're doing. Casting a string to a pointer and then dereferencing it looks confusing; why are you doing that? It doesn't make sense unless the reader knows a lot about string internals.


Be aware that when running

1. DiskStream.Read(Pointer(s)^, Count)
2. DiskStream.Read(s[1], Count)

The first version will be faster.

But you must be sure that the s variable is strictly local, or that you have called UniqueString(s) yourself before the loop.

Since Pointer(s)^ does not trigger the hidden low-level UniqueString() RTL call, it will be faster than s[1], but you may overwrite existing data if the s string variable is shared between the current context and another one (e.g. if the last content of s was retrieved from a function or a property value, or if s is passed as a parameter to another method).
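The sharing hazard described above can be sketched in a few lines. This is only an illustration: the program and variable names are invented, and Move stands in for the stream's Read because both take an untyped var destination buffer. The strings are built at run time so that they are ordinary reference-counted heap strings rather than constants.

```pascal
program SharedStringSketch;
{$APPTYPE CONSOLE}

var
  Original, Alias: AnsiString;
  First, Second: array[0..2] of AnsiChar;
begin
  First[0] := 'X';  First[1] := 'Y';  First[2] := 'Z';
  Second[0] := 'Q'; Second[1] := 'R'; Second[2] := 'S';

  // Build the string at run time so it lives on the heap with a reference count.
  SetLength(Original, 3);
  Original[1] := 'a';
  Original[2] := 'b';
  Original[3] := 'c';

  Alias := Original;                  // both variables now share one buffer

  // Raw pointer access: no UniqueString call, so the shared buffer is written.
  Move(First, Pointer(Alias)^, 3);
  Writeln(Original);                  // Original has changed too: XYZ

  // Indexed access passed as var: the compiler makes Alias unique first.
  Move(Second, Alias[1], 3);
  Writeln(Original);                  // still XYZ
  Writeln(Alias);                     // QRS
end.
```

The first Move silently changes Original because nothing made Alias unique beforehand, which is exactly the case the warning is about.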

In fact, the fastest correct way of reading an AnsiString from a stream is:

  s := '';
  SetLength(s,Count);
  DiskStream.Read(pointer(s)^,Count);

or

  SetString(s,nil,Count);
  DiskStream.Read(pointer(s)^,Count);

The 2nd version being equal to the 1st, but with one line less.

Setting s to '' will call FreeMem()+AllocMem() instead of ReallocMem() in SetLength(), which avoids a call to move() and is therefore a bit faster.

In fact, the UniqueString() RTL call generated by s[1] will be very fast, since you have already called SetLength() before it: s is therefore already unique, and the UniqueString() RTL call will return almost immediately. After profiling, there is not much speed difference between the two versions: almost all of the time is spent in string allocation and in moving content from disk. Perhaps s[1] is found to be more "pascalish".
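Put together, the SetString variant could be wrapped in a helper along these lines. This is a minimal sketch, not the asker's actual class: the standalone ReadChars function taking a TStream, the TMemoryStream used as a stand-in for the disk stream, and the sample payload are all assumptions made for illustration.

```pascal
program ReadCharsSketch;
{$APPTYPE CONSOLE}

uses
  Classes;  // may need to be System.Classes on newer Delphi versions

// Hypothetical helper: reads Count chars from Stream into s.
function ReadChars(Stream: TStream; out s: AnsiString; Count: Longint): Boolean;
begin
  SetString(s, nil, Count);   // fresh, uniquely owned buffer of Count chars
  // With Count = 0, s is empty and Pointer(s) is nil, but Read copies nothing.
  Result := Stream.Read(Pointer(s)^, Count) = Count;
end;

var
  Mem: TMemoryStream;
  Payload, Chunk: AnsiString;
begin
  Payload := 'hello world';
  Mem := TMemoryStream.Create;
  try
    // Only reading from Payload here, so the raw pointer is harmless.
    Mem.WriteBuffer(Pointer(Payload)^, Length(Payload));
    Mem.Position := 0;
    if ReadChars(Mem, Chunk, 5) then
      Writeln(Chunk);         // prints: hello
  finally
    Mem.Free;
  end;
end.
```

Because SetString always allocates a new buffer, s cannot be shared with any other variable at the point of the Read, so skipping UniqueString is safe here.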


If you care about optimization, you should prefer the first variant. Just look at the code generated by the compiler:

Unit7.pas.98: Stream.Read(Pointer(S)^, 10);
00470EA9 8B55FC           mov edx,[ebp-$04]
00470EAC B90A000000       mov ecx,$0000000a
00470EB1 8BC6             mov eax,esi
00470EB3 8B18             mov ebx,[eax]
00470EB5 FF530C           call dword ptr [ebx+$0c]

Unit7.pas.99: Stream.Read(s[1], 10);
00470EB8 8B5DFC           mov ebx,[ebp-$04]
00470EBB 85DB             test ebx,ebx
00470EBD 7418             jz $00470ed7
00470EBF 8BC3             mov eax,ebx
00470EC1 83E80A           sub eax,$0a
00470EC4 66833802         cmp word ptr [eax],$02
00470EC8 740D             jz $00470ed7
00470ECA 8D45FC           lea eax,[ebp-$04]
00470ECD 8B55FC           mov edx,[ebp-$04]
00470ED0 E8CB3FF9FF       call @InternalUStrFromLStr
00470ED5 8BD8             mov ebx,eax
00470ED7 8D45FC           lea eax,[ebp-$04]
00470EDA E89950F9FF       call @UniqueStringU
00470EDF 8BD0             mov edx,eax
00470EE1 B90A000000       mov ecx,$0000000a
00470EE6 8BC6             mov eax,esi
00470EE8 8B18             mov ebx,[eax]
00470EEA FF530C           call dword ptr [ebx+$0c]

UPDATE

The above code was generated by the Delphi 2009 compiler. You can improve the code by using the {$STRINGCHECKS OFF} directive, but you still have the overhead of the UniqueStringU function call:

Unit7.pas.100: Stream.Read(s[1], 10);
00470EB8 8D45FC           lea eax,[ebp-$04]
00470EBB E8B850F9FF       call @UniqueStringU
00470EC0 8BD0             mov edx,eax
00470EC2 B90A000000       mov ecx,$0000000a
00470EC7 8BC3             mov eax,ebx
00470EC9 8B18             mov ebx,[eax]
00470ECB FF530C           call dword ptr [ebx+$0c]


The second option is definitely more "Delphi style" (if you look at the Delphi versions of the Windows API headers, you will see that most pointer parameters have been converted to var parameters).

In addition to that, the second option does not need a cast and is much more readable IMHO.


I'd always use the second one which maintains type safety. I don't really buy the performance argument since you are about to hit the disk at worst, or file cache, or main memory, all of which are going to make a handful of CPU operations look somewhat trivial. Correctness should be given higher priority than performance.

However, I would add that this is not something that should be bothering you too much since you should write this particular piece of code once and once only. Put it in a helper class and wrap it up well. Feel free to care about optimisation, re-write it as assembler, whatever takes your fancy. But don't repeat yourself.


If there is any chance that your function will be called with a Count of 0, then A) will work, with Pointer(s) simply evaluating to nil, while B) will fail with a range check exception when range checking is enabled.

If you want to use B) and still handle counts of 0 gracefully, you should use:

function TMyStream.ReadChars(out s: AnsiString; const Count: Integer): Boolean; 
begin
 SetLength(s, Count);
 Result := (Count = 0)  or (Read(s[1], Count) = Count);
end;


The second one (DiskStream.Read(s[1], Count)). Whenever you encounter an untyped var parameter it reads like "take the address of what is passed as a parameter". So in this case you are passing the address of the first character of the string s, which is what you intended to do.
