Is System.Pos in Delphi Flawed?
Problem
If the text in TRichEdit is something like this;
'hello, world'#$D#$A
Then the following routine displays TRUE. However when the RichEdit has
'test'#$D#$A#$D#$A'test'#$D#$A#$D#$A'test'#$D#$A
Then the routine displays FALSE. It seems to me to be flawed as it is finding the comma's but not the newlines/linefeeds. I created a workaround to walk the string instead and find what I'm looking for but am still really curious why the Delphi function doesn't work. Any ideas?
procedure TForm1.Button1Click(Sender: TObject);
var
sTmp : String;
begin
sTmp := RichEdit1.Lines.GetText;
if ( ( Pos( ',', sTmp ) <> 0 ) or
( Pos( '"', sTmp ) <> 0 ) or
( Pos( '\n', sTmp ) <> 0 ) or
( Pos( '\r', sTmp ) <> 0 ) ) then
Label1.Caption := 'TRUE'
else
Label1.Caption := 'FALSE';
end;
Workaround - Andreas' Version (Faster Depending on Input)
function CheckChars( const sData: String ): Boolean;
var
pCur : PChar;
begin
pCur := PChar( sData );
// Exit at NULL terminator
while ( pCur^ <> #0 ) do
begin
case pCur^ of
#13, #10, #34, #44 : Exit(true);
end;
Inc( pCur );
end;
end;
Correct Usage
function CheckChars( const sData: String ): Boolean
begin
if ( ( Pos( #44, sData ) <> 0 ) or
( Pos( #34, sData ) <> 0 ) or
( Pos( #13, sData ) <> 0 ) or
( Pos( #10, sData ) <> 0 ) ) then
Result := true
else
Result := false;
end;
Works for all characters tested, I decided not to mix quoted chars and decimal chars for readability. The only question now is which is quicker? I think my workaround would be quicker since I'm checking each char against all the ones I'm looking for, whereas when I use the System.Pos function I am running the same parsing routine 4 times.
Solution
After some testing, it depends on what kind of characters you are looking for. Testing this with a comma(#44), located 294k characters into a 589k length string. The function using System.Pos has a performance of ~390 microseconds, and the case statement runs ~700 microseconds.
HOWEVER!
If you change the chara开发者_StackOverflow社区cter in the string to a Linefeed(#10) then it takes much much longer for the System.Pos(~2700 microseconds) due to the repeated calls. The case statement still runs ~700 microseconds.
So I guess if your looking for a particular character then System.Pos is definitely the way to go, however if you are looking for multiple(which my app does) then a repeated call isn't necessary when you could just scan it and use the case statement.
I don't think Delphi recognises \n as a new line, Pos thinks you are actually searching for the characters "\" and "n".
Try searching for #13
and #10
(Carriage Return and Line Feed) instead (Alternatively you could use #$D
and #$A
which would be the hex equivilent.)
e.g.
if ( ( Pos( ',', sTmp ) <> 0 ) or
( Pos( '"', sTmp ) <> 0 ) or
( Pos( #10, sTmp ) <> 0 ) or
( Pos( #13, sTmp ) <> 0 ) ) then
Also Delphi Strings are counted and while they always end in #0 There is no guarantee that the string doesn't contain a null character, meaning that your while loop may terminate early.
so alternatively you could loop through for i := 1 to Length(sTmp) (Starting at 1 as sTmp[0] is the counter).
or you could construct your while loop as
Index := 1;
While Index < Length(sTmp) do
begin
case sTmp[Index] of
etc...
(This is actually a comment, but it would look horrible as one.)
Please notice that your entire block
case pCur^ of
#13 : // CR
begin
Result := true;
break;
end;
#10 : // LF
begin
Result := true;
break;
end;
#34 : // Quote
begin
Result := true;
break;
end;
#44 : // Comma
begin
Result := true;
break;
end;
end;
can be written more compactly by noting that
Result := true; break;
in this case amounts to the same thing asResult := true; Exit;
which always can be writtenExit(true)
.Several cases can be combined to a single case, if the actions are identical.
Hence you can write
case pCur^ of
#13, #10, #34, #44: Exit(true);
end;
But even better, the entire function can be written
function CheckChars(const Str: string): boolean;
const
INTERESTING_CHARS = [#13, #10, #34, #44];
var
i: integer;
begin
result := false;
for i := 1 to length(Str) do
if Str[i] in INTERESTING_CHARS then
Exit(true);
end;
精彩评论