开发者

Is System.Pos in Delphi Flawed?

Problem

If the text in TRichEdit is something like this;

'hello, world'#$D#$A

Then the following routine displays TRUE. However when the RichEdit has

'test'#$D#$A#$D#$A'test'#$D#$A#$D#$A'test'#$D#$A

Then the routine displays FALSE. It seems to me to be flawed as it is finding the comma's but not the newlines/linefeeds. I created a workaround to walk the string instead and find what I'm looking for but am still really curious why the Delphi function doesn't work. Any ideas?

procedure TForm1.Button1Click(Sender: TObject);
var
   sTmp : String;
begin
   sTmp := RichEdit1.Lines.GetText;
   if ( ( Pos( ',', sTmp )  <> 0 ) or
        ( Pos( '"', sTmp )  <> 0 ) or
        ( Pos( '\n', sTmp ) <> 0 ) or
        ( Pos( '\r', sTmp ) <> 0 ) ) then
      Label1.Caption := 'TRUE'
   else
      Label1.Caption := 'FALSE';
end;

Workaround - Andreas' Version (Faster Depending on Input)

function CheckChars( const sData: String ): Boolean;
var
   pCur : PChar;
begin
   pCur := PChar( sData );

   // Exit at NULL terminator
   while ( pCur^ <> #0 ) do
   begin
      case pCur^ of
         #13, #10, #34, #44 : Exit(true);
      end;
      Inc( pCur );
   end;
end;

Correct Usage

function CheckChars( const sData: String ): Boolean
begin
   if ( ( Pos( #44, sData ) <> 0 ) or
        ( Pos( #34, sData ) <> 0 ) or
        ( Pos( #13, sData ) <> 0 ) or
        ( Pos( #10, sData ) <> 0 ) ) then
      Result := true
   else
      Result := false;
end;

Works for all characters tested, I decided not to mix quoted chars and decimal chars for readability. The only question now is which is quicker? I think my workaround would be quicker since I'm checking each char against all the ones I'm looking for, whereas when I use the System.Pos function I am running the same parsing routine 4 times.

Solution

After some testing, it depends on what kind of characters you are looking for. Testing this with a comma(#44), located 294k characters into a 589k length string. The function using System.Pos has a performance of ~390 microseconds, and the case statement runs ~700 microseconds.

HOWEVER!

If you change the chara开发者_StackOverflow社区cter in the string to a Linefeed(#10) then it takes much much longer for the System.Pos(~2700 microseconds) due to the repeated calls. The case statement still runs ~700 microseconds.

So I guess if your looking for a particular character then System.Pos is definitely the way to go, however if you are looking for multiple(which my app does) then a repeated call isn't necessary when you could just scan it and use the case statement.


I don't think Delphi recognises \n as a new line, Pos thinks you are actually searching for the characters "\" and "n".

Try searching for #13 and #10 (Carriage Return and Line Feed) instead (Alternatively you could use #$D and #$A which would be the hex equivilent.)

e.g.

if ( ( Pos( ',', sTmp )  <> 0 ) or
     ( Pos( '"', sTmp )  <> 0 ) or
     ( Pos( #10, sTmp )  <> 0 ) or
     ( Pos( #13, sTmp )  <> 0 ) ) then

Also Delphi Strings are counted and while they always end in #0 There is no guarantee that the string doesn't contain a null character, meaning that your while loop may terminate early.

so alternatively you could loop through for i := 1 to Length(sTmp) (Starting at 1 as sTmp[0] is the counter).

or you could construct your while loop as

Index := 1;

While Index < Length(sTmp) do
begin
    case sTmp[Index] of
    etc...


(This is actually a comment, but it would look horrible as one.)

Please notice that your entire block

case pCur^ of
 #13 :   // CR
    begin
       Result := true;
       break;
    end;
 #10 :   // LF
    begin
       Result := true;
       break;
   end;
  #34 :   // Quote 
   begin
        Result := true;
       break;
    end;
 #44 :   // Comma
    begin
       Result := true;
       break;
    end;
 end;

can be written more compactly by noting that

  • Result := true; break; in this case amounts to the same thing as Result := true; Exit; which always can be written Exit(true).

  • Several cases can be combined to a single case, if the actions are identical.

Hence you can write

case pCur^ of
  #13, #10, #34, #44: Exit(true);
end;

But even better, the entire function can be written

function CheckChars(const Str: string): boolean;
const
  INTERESTING_CHARS = [#13, #10, #34, #44];
var
  i: integer;
begin
  result := false;
  for i := 1 to length(Str) do
    if Str[i] in INTERESTING_CHARS then
      Exit(true);
end;
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜