File renaming; Can I get some feedback
Background: A friend of mine, who just might have some OCD issues, was telling me a story of how he was not looking forward to the hours of work he was about to invest into renaming tons of song files that had the words An, The, Of and many more capitalized.
Criteria: He gave me a list of words, omitted here because you will see them in the code, and told me that capitalization is O.K. if they are at the beginning of the song, but otherwise they must be lowercase.
Question 1: This is actually my first script and I am looking for some feedback. If there is a better way to write this, I would like to see it so I can improve my coding. The script is functional and does exactly what I would like it to do.
Question 2: Initially I did not have all 3 functions. I only had the function that replaced words. For some reason it would not work on files that looked like this "The Dark Side Of The Moon". When I ran the code, the "Of" would be replaced but neither of the "The"s would be. So, through trial and error I found that if I lowercase the first letter of the file, do my replace function and finally uppercase the file, it would work. Any clue as to why?
import os
words = ['A','An','The','Of','For','To','By','Or','Is','In','Out','If','Oh','And','On','At']
fileList = []
rootdir = ''
#Where are the files? Is the input a valid directory?
while True:
rootdir = raw_input('Where is your itunes library? ')
if os.path.isdir(rootdir): break
prin开发者_如何学编程t('That is not a valid directory. Try again.')
#Get a list of all the files in the directory/sub-directory's
for root, subFolders, files in os.walk(rootdir):
for file in files:
fileList.append(os.path.join(root))
#Create a function that replaces words.
def rename(a,b,c):
for file in os.listdir(c):
if file.find(a):
os.rename(file,file.replace(a,b))
#Create a function that changes the first letter in a filename to lowercase.
def renameL():
for file in os.listdir(os.getcwd()):
if file.find(' '):
os.rename(file,file.replace(file,file[0].lower()+file[1:]))
#Creat a function that changes the first letter in a filename to uppercase.
def renameU():
for file in os.listdir(os.getcwd()):
if file.find(' '):
os.rename(file,file.replace(file,file[0].upper()+file[1:]))
#Change directory/lowercase the first letter of the filename/replace the offending word/uppercase the first letter of the filename.
for x in fileList:
for y in words:
os.chdir(x)
renameL()
rename(y,y.lower(),x)
renameU()
Exit = raw_input('Press enter to exit.')
OK, some criticisms:
- Don't prompt for arguments, get them from the command line. It makes testing, scripting, and a lot of other things much easier.
- The implementation you've got wouldn't distinguish, e.g. "the" from "theater"
- You're using the current-working-directory to pass around the directory you're working on. Don't do this, just use a variable.
- Someone else said, "use
set
, it's faster". That advice is incorrect; the correct advice is "useset
, because you need a set". A set is a unordered collection of unique items (a list is an ordered collection of not-necessarily-unique items.) As a bonus for using the right collection, your program will probably run faster. - You need to properly split up the work you're trying to do. I'll explain:
Your program has two parts: 1. You need to loop through all the files in some directory and rename them according to some rule. 2. The Rule, given a string (yeah, it's going to be a file name, but forget about that), capitalize the first word and all of the subsequent words that aren't in some given set.
You've got (1) down pretty pat, so dig further into (2). The steps there are a. knock everything down to lower-case. b. Break the string into words. c. For each word, capitalize it if you're supposed to. d. Join the words back into a string.
Write (2) and write a test program that calls it to make sure it works properly:
assert capitalizeSongName('the Phantom Of tHe OPERA') == 'The Phantom of the Opera'
When you're happy with (2), write (1) and the whole thing should work.
Repeated code is usually considered bad style (DRY is the buzzword). Also I usually try not to interleave functionality.
For the "design" of this little script I would first walk the directories and create a large list of all audio files and directories. Then I write a function handling the changing of one the items in the list and create another list using map
. Now you have a current
and a want
list. Then I would zip
those lists together and rename them all.
If your Music Library is really huge, you can use itertools, so you don't have large lists in memory but iterators (only one item in memory at once). This is really easy in python: use imap
instead of map
and izip
instead of zip
.
To give you an impression and a few hints to useful functions, here is a rough sketch of how I would do it. (Warning: untested.)
import os
import sys
words = ['A','An','The','Of','For','To','By','Or','Is','In','Out','If','Oh','And','On','At']
wantWords = map(str.lower, words)
def main(args):
rootdir = args[1]
files = findFiles(rootdir)
wantFiles = map(cleanFilename, files)
rename(files, wantFiles)
def findFiles(rootdir):
result = []
for root, subFolders, files in os.walk(rootdir):
for filename in files:
result.append(os.path.join(root, filename))
return result
def cleanFilename(filename):
# do replacement magic
def rename(files, wantFiles):
for source, target in zip(files, wantFiles):
os.rename(source, target)
if __name__ == '__main__':
main(sys.argv)
The advantage is that you can see in the main()
what is happening without looking into the details of the functions. Each function does different stuff. On only walks the filesystem, on only changes one filename, one actually renames files.
- Use a
set
instead of a list. (It's faster) - I'm not really sure what you're trying to do there. The approach I took was just to lowercase the whole thing, then uppercase the first letter of each word as long as that word isn't in the set, and then uppercase the very first letter regardless (just in case it was one of those special words).
C# version I wrote a little while ago:
private static HashSet<string> _small = new HashSet<string>(new[] { "of", "the", "and", "on", "sur", "de", "des", "le", "la", "les", "par", "et", "en", "aux", "d", "l", "s" });
static string TitleCase(string str)
{
if (string.IsNullOrEmpty(str)) return string.Empty;
return string.Concat(char.ToUpper(str[0]),
Regex.Replace(str, @"\w+", m =>
{
string lower = m.Value.ToLower();
return _small.Contains(lower)
? lower
: string.Concat(char.ToUpper(lower[0]), lower.Substring(1));
})
.Substring(1));
}
I used a regex instead of splitting on spaces because I had a lot of french words in there that were separated by 's instead.
精彩评论