Word statistics in a text with Delphi

I'm using Delphi 7. I would like to count the number of repetitions of every word in a large text (500 words). How could I do it?


Here is a kind of brute-force way of doing it. It uses a string list and stores the count of each word in the list item's Objects entry, cast to TObject.

var
  i : integer;
  iCount : integer;
  idxFound : integer;
  someText : string;
  s : TStringList;
  oneWord : string;

begin
  someText := 'this that theother and again this that theother this is not that';
  oneWord := '';

  s := TStringList.Create;
  for i := 1 to length(someText) do begin
    if someText[i] = ' ' then begin
      // Skip the empty "word" produced by consecutive spaces.
      if oneWord <> '' then begin
        idxFound := s.indexof(oneWord);
        if idxFound >= 0 then begin
          iCount := integer(s.objects[idxFound]);
          s.Objects[idxFound] := TObject(iCount + 1);
        end
        else begin
          s.AddObject(oneWord, TObject(1));
        end;
      end;
      oneWord := '';
    end
    else begin
      oneWord := oneWord + someText[i];
    end;
  end;

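  // Handle the last word, which has no trailing space.
  // It has to be looked up again; idxFound still refers to the previous word.
  if oneWord <> '' then begin
    idxFound := s.IndexOf(oneWord);
    if idxFound >= 0 then begin
      iCount := integer(s.objects[idxFound]);
      s.Objects[idxFound] := TObject(iCount + 1);
    end
    else begin
      s.AddObject(oneWord, TObject(1));
    end;
  end;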

  // Put the results on the screen in a memo.
  memo1.Text := '';
  for i := 0 to s.Count - 1 do
    memo1.Lines.Add(intToStr(integer(s.Objects[i])) + ' ' + s[i]);

  s.Free; // release the string list once the results are displayed
end;


I don't recall any built-in Delphi function that does this directly. But a simple O(n log n) method would be to sort the words and then scan the sorted list, counting each run of equal words.
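
As a rough sketch of the sort-then-scan idea, something like the following console program should work in Delphi 7. The names SplitWords and FrequencyReport are made up for this illustration, and it assumes words are separated by plain spaces only. TStringList.Sort ignores case by default, so the scan compares neighbours with AnsiSameText to stay consistent with the sort order.

```pascal
program SortAndCount;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: appends every non-empty, space-separated word to Words.
procedure SplitWords(const Text: string; Words: TStrings);
var
  i: Integer;
  Current: string;
begin
  Current := '';
  for i := 1 to Length(Text) do
    if Text[i] = ' ' then
    begin
      if Current <> '' then
        Words.Add(Current);
      Current := '';
    end
    else
      Current := Current + Text[i];
  if Current <> '' then
    Words.Add(Current);
end;

// Hypothetical helper: returns comma-separated 'word=count' pairs,
// in sorted word order.
function FrequencyReport(const Text: string): string;
var
  Words, Report: TStringList;
  i, RunLength: Integer;
begin
  Words := TStringList.Create;
  Report := TStringList.Create;
  try
    SplitWords(Text, Words);
    Words.Sort; // equal words are now adjacent
    RunLength := 0;
    for i := 0 to Words.Count - 1 do
    begin
      Inc(RunLength);
      // A run ends at the last item or when the next word differs.
      if (i = Words.Count - 1) or not AnsiSameText(Words[i], Words[i + 1]) then
      begin
        Report.Add(Words[i] + '=' + IntToStr(RunLength));
        RunLength := 0;
      end;
    end;
    Result := Report.CommaText;
  finally
    Report.Free;
    Words.Free;
  end;
end;

begin
  WriteLn(FrequencyReport('this that theother and again this that theother this is not that'));
end.
```

The sort dominates the cost; the scan afterwards is a single pass over the list.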


If we are talking about the number of occurrences of each word in a text string, you could parse the string and identify the words. Add the words to a map, where the key is the word itself and the value is a number. This number is increased each time a word you find in the string already exists in the map.

map<string, int>
foreach word in string
    if word is in map
        map[word] = map[word] + 1
    else
        map[word] = 1
    end if
end for

Since I don't know Delphi that well, I have tried to provide you with a pseudocode example.
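
For what it's worth, Delphi 7 has no generic map type, so one possible translation of the pseudocode is to let a sorted TStringList stand in for the map: Find does a binary search for the key, and the count sits in Objects. This is only a sketch. AddWord, WordCounts and the Delims set are names invented here, and the Integer/TObject casts assume a 32-bit compiler such as Delphi 7, where both have the same size.

```pascal
program MapCount;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Assumed delimiter set; adjust it to the text at hand.
  Delims = [' ', #9, #10, #13, ',', '.', '!', '?'];

// Adds one occurrence of Word to Counts. Counts must have Sorted = True so
// that Find can use binary search. The lookup ignores case by default, so
// 'This' and 'this' share one entry.
procedure AddWord(Counts: TStringList; const Word: string);
var
  Index: Integer;
begin
  if Counts.Find(Word, Index) then
    Counts.Objects[Index] := TObject(Integer(Counts.Objects[Index]) + 1)
  else
    Counts.AddObject(Word, TObject(1));
end;

// Builds the word -> count "map". The caller owns (and must free) the result.
function WordCounts(const Text: string): TStringList;
var
  i: Integer;
  Current: string;
begin
  Result := TStringList.Create;
  Result.Sorted := True;
  Current := '';
  for i := 1 to Length(Text) do
    if Text[i] in Delims then
    begin
      if Current <> '' then
        AddWord(Result, Current);
      Current := '';
    end
    else
      Current := Current + Text[i];
  if Current <> '' then
    AddWord(Result, Current);
end;

var
  Counts: TStringList;
  i: Integer;
begin
  Counts := WordCounts('this that theother and again this that theother this is not that');
  try
    for i := 0 to Counts.Count - 1 do
      WriteLn(Integer(Counts.Objects[i]), ' ', Counts[i]);
  finally
    Counts.Free;
  end;
end.
```

Each lookup is O(log n), although inserting a new word into the sorted list still shifts the items after it, so this is not a true hash map.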


A TStringList can also be used for the "list of words". Run through all of your words and add each one to the TStringList as a new item. When you're done, its Count gives you the TOTAL word count. To determine the number of unique words, sort the list and, in a loop, check whether the current word is different from the previous one; if so, increment your unique word count.


From the FPC strutils library:

function WordCount(const S: string; const WordDelims: TSysCharSet): Integer;

var
 P,PE : PChar;

begin
  Result:=0;
  P:=Pchar(pointer(S));
  PE:=P+Length(S);
  while (P<PE) do
    begin
    while (P<PE) and (P^ in WordDelims) do
      Inc(P);
    if (P<PE) then
      inc(Result);
    while (P<PE) and not (P^ in WordDelims) do
      inc(P);
    end;
end;

WordCount(test, [',', '.', ' ', '!', '?', #10, #13]); would be a good first attempt. It's meant for rough order-of-magnitude estimates, since it doesn't, for example, handle abbreviations correctly.

Of course if you hand this in as homework, you'll probably be asked to explain its workings.
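
A small usage sketch, assuming Free Pascal (where WordCount and its companion ExtractWord both live in the StrUtils unit); under Delphi 7 you would have to paste in the function above instead. The sample sentence is made up to show the abbreviation problem: "e.g." is split into two words.

```pascal
program FpcWordCountDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

const
  Delims = [',', '.', ' ', '!', '?', #10, #13];
  Sample = 'Hello, world! Is this e.g. one word?';

var
  i, Total: Integer;
begin
  // Total number of delimiter-separated words in the sample.
  Total := WordCount(Sample, Delims);
  WriteLn('total words: ', Total);

  // ExtractWord is 1-based and rescans the string on every call,
  // so this loop is only sensible for short texts.
  for i := 1 to Total do
    WriteLn(i, ': ', ExtractWord(i, Sample, Delims));
end.
```

Here the total comes out as 8 rather than 7, because "e" and "g" are counted separately.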
