Read a whole text file into a MATLAB variable at once

I would like to read a (fairly big) log file into a MATLAB string cell in one step. I have used the usual:

s={};
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
   s=[s;tline];
   tline = fgetl(fid);
end

but this is just slow. I have found that

fid = fopen('test.txt');
x=fread(fid,'*char');

is far faster, but I get an n-by-1 char matrix, x. I could try to convert x to a string cell, but then I get into char encoding hell; the line delimiter seems to be \n\r, or 10 and 13 in ASCII (I've looked at the end of the first line), but those two characters often don't follow each other and sometimes even show up on their own.

Is there an easy fast way to read an ASCII file into a string cell in one step, or convert x to a string cell?

Reading via fgetl:

Code                           Calls        Total Time      % Time
tline = lower(fgetl(fid));     903113       14.907 s        61.2%

Reading via fread:

>> tic;for i=1:length(files), fid = fopen(files(i).name);x=fread(fid,'*char*1');fclose(fid); end; toc

Elapsed time is 0.208614 seconds.

I have tested preallocation, and it does not help :(

files=dir('.');
tic
for i=1:length(files),   
    if files(i).isdir || isempty(strfind(files(i).name,'.log')), continue; end
    %# preassign s to some large cell array
    sizS = 50000;
    s=cell(sizS,1);

    lineCt = 1;
    fid = fopen(files(i).name);
    tline = fgetl(fid);
    while ischar(tline)
       s{lineCt} = tline;
       lineCt = lineCt + 1;
       %# grow s if necessary
       if lineCt > sizS
           s = [s;cell(sizS,1)];
           sizS = sizS + sizS;
       end
       tline = fgetl(fid);
    end
    %# remove empty entries in s
    s(lineCt:end) = [];
    fclose(fid);
end
toc

Elapsed time is 12.741492 seconds.

Roughly 10 times faster than the original:

s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes);

I had to set 'whitespace' to '' in order to keep the leading spaces (which I need for parsing), and 'bufsize' to the size of the file (the default 4000 threw a buffer overflow error).


The main reason your first example is slow is that s grows in every iteration. Each time, MATLAB has to allocate a new array, copy the old lines into it, and add the new one, which is unnecessary overhead.

To speed things up, you can preallocate s:

%# preassign s to some large cell array
s=cell(10000,1);
sizS = 10000;
lineCt = 1;
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
   s{lineCt} = tline;
   lineCt = lineCt + 1;
   %# grow s if necessary
   if lineCt > sizS
       s = [s;cell(10000,1)];
       sizS = sizS + 10000;
   end
   tline = fgetl(fid);
end
%# remove empty entries in s
s(lineCt:end) = [];

Here's a little example of what preallocation can do for you

>> tic,for i=1:100000,c{i}=i;end,toc
Elapsed time is 10.513190 seconds.

>> d = cell(100000,1);
>> tic,for i=1:100000,d{i}=i;end,toc
Elapsed time is 0.046177 seconds.
>> 

EDIT

As an alternative to fgetl, you could use TEXTSCAN

fid = fopen('test.txt');
s = textscan(fid,'%s','Delimiter','\n');
s = s{1};

This reads the lines of test.txt as strings into the cell array s in one go.
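
For a self-contained illustration of the TEXTSCAN approach, here is a minimal sketch. The temporary file and its contents are made up for the example, and the 'Whitespace' setting is borrowed from the question's own edit (where it was needed to keep leading spaces); it is not part of the call above.

```matlab
% Create a small sample file (hypothetical content) so the example runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'first line\n  indented line\nlast line\n');
fclose(fid);

% Read every line in one call. Setting 'Whitespace' to '' is what the
% question reports as necessary to preserve leading spaces.
fid = fopen(fname, 'r');
c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);

lines = c{1};          % textscan returns a 1x1 cell; unwrap it to get the lines
disp(numel(lines));    % number of lines read

delete(fname);         % clean up the sample file
```

Note that textscan wraps its result in an outer cell, one element per format specifier, which is why the `c{1}` step is needed.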


I tend to use urlread for this, e.g.:

filename = 'test.txt';
urlname = ['file:///' fullfile(pwd,filename)];
try
    str = urlread(urlname);
catch err
    disp(err.message)
end

The variable str then contains the whole file as one big char array (ready for regexp to operate on).


Use the fgetl function instead of fread. For more info, go here


s = regexp(fileread('test.txt'), '(\r\n|\n|\r)', 'split');

The seashells example in MATLAB's regexp documentation is directly on point.
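
Since the question mentions line endings that are inconsistent, here is a small sketch of how that split behaves on a file with mixed delimiters. The sample file and its contents are invented for the illustration.

```matlab
% Write a sample file (hypothetical content) that mixes \r\n, \n and \r endings.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'alpha\r\nbeta\ngamma\rdelta');
fclose(fid);

% fileread returns the whole file as a single char row vector.
txt = fileread(fname);

% Try \r\n first so a Windows line ending is not split into two breaks.
lines = regexp(txt, '\r\n|\n|\r', 'split');   % 1-by-N cell array

delete(fname);   % clean up the sample file
```

If the file ends with a line break, the last element of the result is an empty string, which may need to be dropped. The same split should also apply to the column vector x returned by fread in the question, after transposing it to a row with `x.'`.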


The following method is based on what Jonas proposed above, which I like very much. However, what we get is a cell array s rather than a single string.

I found that with one more line of code, we can get a single string variable, as below:

% original code, thanks to Jonas

fid = fopen('test.txt');
s = textscan(fid,'%s','Delimiter','\n');
s = s{1};

% the one additional line that turns s into a single string
s = cell2mat(reshape(s, 1, []));

I found it useful to prepare text for jsondecode(text). :)
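
As a rough sketch of that last step, the lines can also be joined without cell2mat. The JSON text below is made up for the example, and strjoin is an alternative I am swapping in, not something used in the answer above.

```matlab
% A column cell array of lines, shaped like what textscan returns
% (hypothetical JSON content split over two lines).
lines = {'{"name": "test",'; ' "values": [1, 2, 3]}'};

% Plain concatenation: same result as cell2mat(reshape(lines, 1, [])).
joined = [lines{:}];

% Alternative that keeps the line breaks, using strjoin on a row cell.
withNewlines = strjoin(reshape(lines, 1, []), char(10));

% Either form can be passed to jsondecode.
data = jsondecode(joined);
disp(data.name);
```

Concatenating without a separator is fine for JSON, where line breaks between tokens carry no meaning; for text where the breaks matter, the strjoin variant keeps them.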
