Finding a number in a filename using regex
I don't have much experience with regexes and I wanted to rectify that. I decided to build an application that takes a directory name and scans all files in it. The files all have an increasing serial number but differ subtly in their filenames (for example: episode01.mp4, episode_02.mp4, episod03.mp4, episode04.rmvb, etc.).
The application should scan the directory, find the number in each file name, and rename the file, keeping its extension, to a common format (episode01.mp4, episode02.mp4, episode03.mp4, episode04.rmvb, etc.).
I have the following code:
    Dictionary<string, string> renameDictionary = new Dictionary<string, string>();
    DirectoryInfo dInfo = new DirectoryInfo(path);
    string newFormat = "Episode{0}.{1}";
    Regex regex = new Regex(@".*?(?<no>\d+).*?\.(?<ext>.*)"); // look for a number (before the .) and the extension
    foreach (var file in dInfo.GetFiles())
    {
        string fileName = file.Name;
        var match = regex.Match(fileName);
        if (match != null)
        {
            GroupCollection gc = match.Groups;
            //Console.WriteLine("Number : {0}, Extension : {2} found in {1}.", gc["no"], fileName, gc["ext"]);
            renameDictionary[fileName] = string.Format(newFormat, gc["no"], gc["ext"]);
        }
    }
    foreach (var renamePair in renameDictionary)
    {
        Console.WriteLine("{0} will be renamed to {1}.", renamePair.Key, renamePair.Value);
        //stuff for renaming here
    }
One problem with this code is that it also adds files which don't have numbers to renameDictionary. It would also be helpful if you could point out any other gotchas that I should be careful about.
PS: I am assuming that the filenames will only contain numbers corresponding to the serial (nothing like cam7_0001.jpg).
The simplest solution is probably to use Path.GetFileNameWithoutExtension to get the file name, and then the regex \d+$ to get the number at its end (or Path.GetExtension and \d+ to get the number anywhere).
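A minimal sketch of that approach, to make the steps concrete. The class name EpisodeRenamer and the helper TryBuildNewName are hypothetical names chosen for illustration, and the "Episode" prefix is taken from the question's newFormat. Note that Regex.Match never returns null; it returns a Match whose Success property is false, so checking match.Success (rather than match != null) is what keeps files without a number out of the result.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class EpisodeRenamer
{
    // Hypothetical helper: produces the new name, or returns false
    // when the file name (without extension) does not end in digits.
    public static bool TryBuildNewName(string fileName, out string newName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName); // e.g. "episode_02"
        string ext = Path.GetExtension(fileName);                  // e.g. ".mp4" (includes the dot)

        Match match = Regex.Match(stem, @"\d+$"); // digits at the very end of the stem
        if (!match.Success)
        {
            newName = string.Empty;
            return false;
        }

        newName = "Episode" + match.Value + ext;
        return true;
    }

    static void Main()
    {
        // Sample names from the question, plus one without a number.
        string[] names = { "episode01.mp4", "episode_02.mp4", "episod03.mp4", "episode04.rmvb", "notes.txt" };
        foreach (string name in names)
        {
            if (TryBuildNewName(name, out string newName))
                Console.WriteLine("{0} will be renamed to {1}.", name, newName);
            else
                Console.WriteLine("{0} has no serial number; skipped.", name);
        }
    }
}
```

Because the extension is split off before matching, a digit inside the extension (the 4 in .mp4) can never be mistaken for the serial number.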
You can also achieve this in a single replace:
Regex.Replace(fileName, @".*?(\d+).*(\.[^.]+)$", "Episode$1$2")
This regex is a bit better, in that it forces the extension not to contain dots.
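As a rough illustration of the single-replace variant, the sketch below runs it over the sample names from the question. The class name ReplaceDemo and the extra file notes.txt are assumptions added for the example. Regex.Replace returns the input unchanged when nothing matches, so the sketch calls IsMatch first to skip files that have no number.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    static void Main()
    {
        // Same pattern as above: lazy prefix, the number, anything, then a dot-free extension.
        var regex = new Regex(@".*?(\d+).*(\.[^.]+)$");

        string[] names = { "episode01.mp4", "episode_02.mp4", "episod03.mp4", "episode04.rmvb", "notes.txt" };
        foreach (string name in names)
        {
            if (!regex.IsMatch(name))
            {
                // No digits before the extension: leave the file alone.
                Console.WriteLine("{0} skipped.", name);
                continue;
            }

            string newName = regex.Replace(name, "Episode$1$2");
            Console.WriteLine("{0} will be renamed to {1}.", name, newName);
        }
    }
}
```

It may also be worth checking that the new name differs from the old one and does not already exist in the directory before actually renaming.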