Junk Generator Speed Problem
I am looking into generating a file ( 750 MB ) full of random byt开发者_开发技巧es. The code I am using in a separate thread looks like this:
I allocated a buffer of that size since writing on the disk consumes more time:
function Generate(buf:Pointer):DWORD;stdcall;
var
i:DWORD;
begin
for i := 0 to keysize -1 do
PByte(DWORD(buf) + i)^ := Random(256);
Result:=0;
end;
The problem is that it takes ages until the process completes. Any ideas for a faster method? I will try to implement it in assembly if there isn't any alternative.
This sounded like a nice practice problem so I went ahead and implemented a parallel solution. It uses slightly over 3 seconds to generate 750 MB file and uses over 90% CPU during its work. (SSD disk helps, too. 3,5 seconds were needed to generate the file on a RAID0 disk pair and 4 seconds to generate a file on a slower 512 GB disk.)
All reused code is available with the OpenBSD license (which is almost "use however you wish"): DSiWin32, GpStuff, GpRandomGen, Otl*.
uses
DSiWin32,
GpStuff,
GpRandomGen,
OtlCommon,
OtlCollections,
OtlParallel;
{$R *.dfm}
procedure FillBuffer(buf: pointer; bufSize: integer; randomGen: TGpRandom);
var
buf64: PInt64;
buf8 : PByte;
i : integer;
rnd : int64;
begin
buf64 := buf;
for i := 1 to bufSize div SizeOf(int64) do begin
buf64^ := randomGen.Rnd64;
Inc(buf64);
end;
rnd := randomGen.Rnd64;
buf8 := PByte(buf64);
for i := 1 to bufSize mod SizeOf(int64) do begin
buf8^ := rnd AND $FF;
rnd := rnd SHR 8;
Inc(buf8);
end;
end; { FillBuffer }
procedure CreateRandomFile(fileSize: integer; output: TStream);
const
CBlockSize = 1 * 1024 * 1024 {1 MB};
var
buffer : TOmniValue;
lastBufferSize: integer;
memStr : TMemoryStream;
numBuffers : integer;
outQueue : IOmniBlockingCollection;
begin
outQueue := TOmniBlockingCollection.Create;
numBuffers := (fileSize - 1) div CBlockSize + 1;
lastBufferSize := (fileSize - 1) mod CBlockSize + 1;
Parallel.ForEach(1, numBuffers).NoWait
.NumTasks(Environment.Process.Affinity.Count)
.OnStop(
procedure
begin
outQueue.CompleteAdding;
end)
.Initialize(
procedure(var taskState: TOmniValue)
begin
taskState := TGpRandom.Create;
end)
.Finalize(
procedure(const taskState: TOmniValue)
begin
taskState.AsObject.Free;
end)
.Execute(
procedure(const value: integer; var taskState: TOmniValue)
var
buffer : TMemoryStream;
bytesToWrite: integer;
begin
if value = numBuffers then
bytesToWrite := lastBufferSize
else
bytesToWrite := CBlockSize;
buffer := TMemoryStream.Create;
buffer.Size := bytesToWrite;
FillBuffer(buffer.Memory, bytesToWrite, taskState.AsObject as TGpRandom);
outQueue.Add(buffer);
end);
for buffer in outQueue do begin
memStr := buffer.AsObject as TMemoryStream;
output.CopyFrom(memStr, 0);
FreeAndNil(memStr);
end;
end;
procedure TForm43.btnRandomClick(Sender: TObject);
var
fileStr: TFileStream;
time : int64;
begin
time := DSiTimeGetTime64;
try
fileStr := TFileStream.Create('e:\0\random.dat', fmCreate);
try
CreateRandomFile(750*1024*1024, fileStr);
finally FreeAndNil(fileStr); end;
finally Caption := Format('Completed in %d ms', [DSiElapsedTime64(time)]); end;
end;
EDIT: Using ForEach in this case was not really elegant solution so I enhanced OmniThreadLibrary with Parallel.ParallelTask and with better IOmniCounter. Using release 993 (or newer) from the SVN you can solve this multiple-producer-single-consumer problem as follows.
procedure CreateRandomFile(fileSize: integer; output: TStream);
const
CBlockSize = 1 * 1024 * 1024 {1 MB};
var
buffer : TOmniValue;
memStr : TMemoryStream;
outQueue : IOmniBlockingCollection;
unwritten: IOmniCounter;
begin
outQueue := TOmniBlockingCollection.Create;
unwritten := CreateCounter(fileSize);
Parallel.ParallelTask.NoWait
.NumTasks(Environment.Process.Affinity.Count)
.OnStop(Parallel.CompleteQueue(outQueue))
.Execute(
procedure
var
buffer : TMemoryStream;
bytesToWrite: integer;
randomGen : TGpRandom;
begin
randomGen := TGpRandom.Create;
try
while unwritten.Take(CBlockSize, bytesToWrite) do begin
buffer := TMemoryStream.Create;
buffer.Size := bytesToWrite;
FillBuffer(buffer.Memory, bytesToWrite, randomGen);
outQueue.Add(buffer);
end;
finally FreeAndNil(randomGen); end;
end
);
for buffer in outQueue do begin
memStr := buffer.AsObject as TMemoryStream;
output.CopyFrom(memStr, 0);
FreeAndNil(memStr);
end;
end;
EDIT2: A longer blog post about this problem: Life after 2.1: Parallel data production (Introducing Parallel.Task)
I don't know about Delphi, but it might be wasting time on the Random(256)
call. Why don't you handcode something pseudorandom to the effect of
n = (n * 1103515245 + 12345) & 0xff;
Let n
start somewhere and use the recursion, such as this one, to generate next n
. It's not really that random, but it should do for creating random files.
EDIT
Some food for thought. If you are creating this file in hopes that it will not be easily compressible, then the method outlined above isn't that good, because of the & 0xff
part. It's better then to do
n = (n * 1103515245 + 12345) & 0x7fffffff;
as 0x7fffffff = 2147483647
is a prime number. And store the exact larger value of n
, and do a n % 256
on assignment. I've had some good runs with this choice of constants, and much prefer it as entropy source to the in-built .NET alternative, as it's many times faster, and you seldom need truly random or better pseudorandom numbers anyway.
The problem is that Random()
has limited entropy. And if you generate 750MiB of data, you will get only one of the 2^31
possible different strings (since that is the period of the RNG), not 2^(750*1024*1024*8)
, which would be the case if generator was perfect. This is a huge disparity.
In short, if you use Random(), your data is not random at all. Anyone could guess all 750MiB of data from a 4MB sample / piece of the file.
You have to do it different way. If you have linux machine, execute this command from your program:
dd if=/dev/urandom of=file.img bs=1M count=750
It finishes in under a half a minute on my old laptop.
As the Random function has no good distribution anyway, you may reduce your code by nearly a factor of four with the following:
function Generate(buf: Pointer): DWORD; stdcall;
var
i: DWORD;
p: PInteger;
begin
p := buf;
for i := 0 to (keysize div 4) - 1 do begin
p^ := Random(MaxInt);
Inc(p);
end;
Result := 0;
end;
Update: The above code needs about 650ms on my system, while the original code needs about 3s.
You could try RandomRange(Low(Integer), High(Integer))
and see if it works. This will generate 4 bytes of random data at a time (be aware that it's signed, and that I'm supposing the Integer is 4 bytes, but The Integer type is an Integer whose size is not guaranteed
(http://www.delphibasics.co.uk/RTL.asp?Name=Integer).
var
F: TFileStream;
I: Cardinal;
index: integer;
a: array[1..10240] of Cardinal;
IndexA: integer;
T1: TDateTime;
begin
T1 := Now;
F := TFileStream.Create( 'D:\filler.fil', fmCreate);
try
for index := 1 to (650 * MByte) div (sizeof( A)) do begin
for indexA := 1 to 10240 do begin
a[ IndexA] := Random( 4294967295 );
end;
F.WriteBuffer( A, SizeOf( A));
end;
finally
F.Free;
end;
ShowMessage( SecondsBetween( T1, Now));
end;
Works in 3~4 seconds on an SSD drive. Way easier.
Aside from do your own Random() function and/or use aditional CPU's, for loops a fast approach is:
procedure Generate(p: pointer; size: integer);
type
TCardinalArray = array[0..0] of cardinal;
PCardinalArray = ^TCardinalArray;
var
i: integer;
begin
i := (size div 4) - 1;
while i >= 0 do
begin
PCardinalArray(p)[i] := Random(MaxInt) * 2;
Dec(i);
end;
end;
Since there is no need to increment the pointer and the loop index is compared with a TEST op.
Unit6.pas.46: i := (size div 4) - 1;
0045209C 8BD9 mov ebx,ecx
0045209E 85DB test ebx,ebx
004520A0 7903 jns $004520a5
004520A2 83C303 add ebx,$03
004520A5 C1FB02 sar ebx,$02
004520A8 4B dec ebx
Unit6.pas.47: while i >= 0 do
004520A9 85DB test ebx,ebx
004520AB 7C14 jl $004520c1
Unit6.pas.49: PCardinalArray(p)[i] := Random(MaxInt) * 2;
004520AD B8FFFFFF7F mov eax,$7fffffff
004520B2 E8C50EFBFF call Random
004520B7 03C0 add eax,eax
004520B9 89049E mov [esi+ebx*4],eax
Unit6.pas.50: Dec(i);
004520BC 4B dec ebx
Unit6.pas.47: while i >= 0 do
004520BD 85DB test ebx,ebx
004520BF 7DEC jnl $004520ad
Of course there is no big difference, but it is something...
Excepting other factors, the major speed issues I see with the code in the original post are:
1) running Random for every byte. This function counts for the majority of the processing. Processing every four bytes will be advantageous. 2) minimize the calculations within the loop. I would establish the pointer bounds and then run a while loop (inc or dec by 4) until the difference between the upper bound and the lower bound is less than 4, then inc or dec by 1 the rest of the way. I probably wouldn't consider a for loop at any point in this. 3) I wouldn't run this against a huge amount of data - I wouldn't do 750MB all at once because the speed degradation for handling that amount of data tends to outweigh any performance enhancements through code.
Very lightly tested, and probably much to be improved upon, but the basic idea I had is here:
function Generate(buf: Pointer): DWord; stdcall;
var
inbuf, uplimit: Cardinal;
begin
inbuf := Cardinal(buf);
uplimit := inbuf + keysize - 1;
while (uplimit - inbuf) >= 4 do
begin
PDWord(inbuf)^ := Random(MAXINT);
inc(inbuf, 4);
end;
while inbuf <= uplimit do
begin
PByte(inbuf)^ := Random(256);
inc(inbuf, 1);
end;
Result := 0;
end;
精彩评论