How do I sort files into multiple folders in MATLAB?

I am a little stuck on a problem. I have tons of files generated daily and I need to sort them by file name and date. I need to do this so my MATLAB script can read them. I currently do this manually, but was wondering if there is an easier way in MATLAB to sort and copy files.

My file names look like:

data1_2009_12_12_9.10
data1_2009_12_12_9.20
data1_2009_12_12_9.30
data1_2009_12_12_9.40
data2_2009_12_12_9.10
data2_2009_12_12_9.20
data2_2009_12_12_9.30
data2_2009_12_12_9.40
data3_2009_12_12_9.10
data3_2009_12_12_9.20
data3_2009_12_12_9.30
data3_2009_12_12_9.40
...

and tons of files like this.

Addition to the above problem:

There has to be an easier way to stitch the files together. I mean appending file 'data1_2009_12_12_9.20' after file 'data1_2009_12_12_9.10' and so on, so that I am left with one huge text file named data1_2009_12_12 (or whatever) containing all the data stitched together. The only way I know right now is to open every file with an individual dlmread command in MATLAB and xlswrite them one after another (or, more trivially, copy and paste manually).


Working in the field of functional imaging research, I've often had to sort large sets of files into a particular order for processing. Here's an example of how you can find files, parse the file names for certain identifier strings, and then sort the file names by given criteria...

Collecting the files...

You can first get a list of all the file names from your directory using the DIR function:

dirData = dir('your_directory');      %# Get directory contents
dirData = dirData(~[dirData.isdir]);  %# Use only the file data
fileNames = {dirData.name};           %# Get file names

Parsing the file names with a regular expression...

Your file names appear to have the following format:

'data(an integer)_(a date)_(a time)'

so we can use REGEXP to parse the file names that match the above format and extract the integer following data, the three values for the date, and the two values for the time. The expression used for the matching will therefore capture 6 "tokens" per valid file name:

expr = '^data(\d+)\_(\d+)\_(\d+)\_(\d+)\_(\d+)\.(\d+)$';
fileData = regexp(fileNames,expr,'tokens');  %# Find tokens
index = ~cellfun('isempty',fileData);        %# Find index of matches
fileData = [fileData{index}];                %# Remove non-matches
fileData = vertcat(fileData{:});             %# Format token data
fileNames = fileNames(index);                %# Remove non-matching file names
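
To see what the expression captures, here is a small check on a single name taken from the question. The variable names sampleName, tokens, and values are only for illustration, and the underscores are written without the (optional) backslashes:

```matlab
expr = '^data(\d+)_(\d+)_(\d+)_(\d+)_(\d+)\.(\d+)$';
sampleName = 'data1_2009_12_12_9.10';       %# One name from the question
tokens = regexp(sampleName,expr,'tokens');  %# 1-by-1 cell holding a 1-by-6 cell
tokens = tokens{1};                         %# {'1','2009','12','12','9','10'}
values = str2double(tokens);                %# [1 2009 12 12 9 10]
```

A name that does not fit the format (say, a stray log file) returns an empty cell from REGEXP, which is why the code above filters on the non-empty matches.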

Sorting based on the tokens...

You can convert the above string tokens to numbers (using the STR2DOUBLE function) and then convert the date and time values to a date number (using the function DATENUM):

nFiles = size(fileData,1);              %# Number of files matching format
fileData = str2double(fileData);        %# Convert from strings to numbers
fileData = [fileData zeros(nFiles,1)];  %# Add a zero column (for the seconds)
fileData = [fileData(:,1) datenum(fileData(:,2:end))];  %# Format dates

The variable fileData will now be an nFiles-by-2 matrix of numeric values. You can sort these values using the function SORTROWS. The following code will sort first by the integer following the word data and next by the date number:

[fileData,index] = sortrows(fileData,1:2);  %# Sort numeric values
fileNames = fileNames(index);               %# Apply sort to file names

Concatenating the files...

The fileNames variable now contains a cell array of all the files in the given directory that match the desired file name format, sorted first by the integer following the word data and then by the date. If you now want to concatenate all of these files into one large file, you could try using the SYSTEM function to call a system command to do this for you. If you are using a Windows machine, you can do something like what I describe in this answer to another SO question where I show how you can use the DOS for command to concatenate text files. You can try something like the following:

inFiles = strcat({'"'},fileNames,{'", '});  %# Add quotes, commas, and spaces
inFiles = [inFiles{:}];                     %# Create a single string
inFiles = inFiles(1:end-2);                 %# Remove last comma and space
outFile = 'total_data.txt';                 %# Output file name
system(['for %f in (' inFiles ') do type "%f" >> "' outFile '"']);

This should create a single file total_data.txt containing all of the data from the individual files concatenated in the order that their names appear in the variable fileNames. Keep in mind that each file will probably have to end with a new line character to get things to concatenate correctly.
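
If you would rather not depend on a Windows shell, the same concatenation can probably be done with plain file I/O in MATLAB. The following is only a sketch under some assumptions: it creates two tiny demo files in a temporary folder so that it runs on its own, whereas in practice inDir would be the folder you passed to DIR and fileNames would be the sorted list from above. Note that the names returned by DIR carry no path, so FULLFILE is used to build the full file names:

```matlab
%# Demo setup (assumption): two small files in a temporary folder stand in
%# for the real data; normally inDir and fileNames come from the steps above.
inDir = tempname;
mkdir(inDir);
fileNames = {'data1_2009_12_12_9.10','data1_2009_12_12_9.20'};
demoText = {sprintf('1 2\n'),sprintf('3 4\n')};
for k = 1:numel(fileNames)
    fid = fopen(fullfile(inDir,fileNames{k}),'w');
    fwrite(fid,demoText{k});
    fclose(fid);
end

%# Concatenate the files in the order given by fileNames
outFile = fullfile(inDir,'total_data.txt');    %# Output file name
fidOut = fopen(outFile,'w');
if fidOut == -1
    error('Could not open %s for writing.',outFile);
end
for k = 1:numel(fileNames)
    fidIn = fopen(fullfile(inDir,fileNames{k}),'r');
    if fidIn == -1
        warning('Skipping unreadable file %s.',fileNames{k});
        continue
    end
    fwrite(fidOut,fread(fidIn,Inf,'*uint8'));  %# Copy the bytes unchanged
    fclose(fidIn);
end
fclose(fidOut);
```

Because the bytes are copied as they are, the same caveat applies: each input file should end with a new line character, or the last line of one file will run into the first line of the next.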


An alternative to what @gnovice suggested is to loop over the file names and use sscanf() to recover the different sections in the filenames you are interested in:

n = sscanf(filename, 'data%d_%d_%d_%d_%d.%d')
n(1)    %# data number
n(2)    %# the year
...

Example:

files = dir('data*');                 %# list all entries beginning with 'data'
parts = zeros(length(files), 6);      %# read all the 6 parts into this matrix
for i=1:length(files)
    parts(i,:) = sscanf(files(i).name, 'data%d_%d_%d_%d_%d.%d')';  %# transposed to a row
end

[parts idx] = sortrows(parts, [6 1]); %# sort by one/multiple columns of choice
files = files(idx);                   %# apply the new order to the files struct
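
As a further illustration, sorting on all six columns from left to right gives an order by data number first and then chronologically. This is a minimal sketch with made-up file names (the 10.10 name is hypothetical and not from the question); it also shows why the numeric parts matter, since a plain text sort would put 10.10 before 9.50:

```matlab
%# Hypothetical names, deliberately out of order
names = {'data1_2009_12_12_10.10','data1_2009_12_12_9.50','data2_2009_12_12_9.10'};
parts = zeros(numel(names),6);          %# One row of 6 numbers per name
for k = 1:numel(names)
    parts(k,:) = sscanf(names{k},'data%d_%d_%d_%d_%d.%d')';
end
[~,order] = sortrows(parts,1:6);        %# Data number, then year ... minute
sorted = names(order);                  %# 9.50 now comes before 10.10
```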

EDIT:

I just saw your edit about merging those files. That can be done easily from the shell. For example, let's create one big file for all data from the year 2009 (assuming it makes sense to stack files on top of each other):

on Windows:

type data*_2009_* > 2009.backup

on Unix:

cat data*_2009_* > 2009.backup


In Matlab the function call

files = dir('.');

returns a structure array (called files), one element per directory entry, with fields

name

date

bytes

isdir

datenum

You can use your usual Matlab techniques for manipulating files.name (for example, {files.name} collects all the names into a cell array).
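
For instance, a rough sketch of working with those fields might look like the following. The variable names names and order are just for illustration, and note that datenum here is the file's modification time, which may or may not agree with the date written in the file name:

```matlab
files = dir('.');                   %# Structure array, one element per entry
files = files(~[files.isdir]);      %# Drop folders (including '.' and '..')
names = {files.name};               %# Cell array of file names
[~,order] = sort([files.datenum]);  %# Oldest modification time first
names = names(order);               %# File names in that order
```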
