A Python script I've written for correcting table names in SQL dumps from Windows. Any comments?
As a newbie in Python, I thought I'd write a quick and dirty script for correcting the table name capitalization in a MySQL dump file (generated by phpMyAdmin).
The idea is that, since the correct capitalization of each table name is in the comments, I'm going to use it.
e.g.:
-- --------------------------------------------------------
--
-- Table structure for table `Address`
--
The reason I'm asking here is that I don't have a mentor for Python programming, and I was hoping you guys could steer me in the right direction. It feels like there's a lot of stuff I'm doing wrong (maybe it's not pythonic). I'd really appreciate your help, thanks in advance!
Here's what I've written (and it works):
#!/usr/bin/env python
import re
filename = 'dump.sql'
def get_text_blocks(filename):
    text_blocks = []
    text_block = ''
    separator = '-- -+'
    for line in open(filename, 'r'):
        text_block += line
        if re.match(separator, line):
            if text_block:
                text_blocks.append(text_block)
                text_block = ''
    return text_blocks
def fix_text_blocks(text_blocks):
    f = open(filename + '-fixed', 'w')
    for block in text_blocks:
        table_pattern = re.compile(r'Table structure for table `(.+)`')
        correct_table_name = table_pattern.search(block)
        if correct_table_name:
            replacement = 'CREATE TABLE IF NOT EXISTS `' + correct_table_name.groups(0)[0] + '`'
            block =  re.sub(r'CREATE TABLE IF NOT EXISTS `(.+)`',  replacement, block)
        f.write(block)
if __name__ == '__main__':
    fix_text_blocks(get_text_blocks(filename))
Looks fairly good, so the following are relatively minor:
- get_text_blocks basically splits the entire text by the separator, correct? If so, I think this can be done with a single regex. Something like r'(.*?)\n-- -+' (warning: untested), compiled with re.DOTALL so that '.' also matches newlines (re.MULTILINE only changes how '^' and '$' behave).
- If you don't want to use a single regex but prefer to parse the file in a loop, you can ditch the regex for str.startswith. You should also not concatenate strings the way you do with text_block, since every concatenation creates a new string. You can use either the StringIO class, or keep a list of lines and then join them with ''.join (the lines read from the file already end with a newline).
- The nested 'if' can be dropped: use the 'and' operator instead.
- In any case, working with files (and other objects which have a 'finally' logic) is now done with the 'with [object] as [name]:' clause. Look it up, it's nifty.
- If you don't do that - always close your files when you finish working with them, preferably in a 'finally' clause.
- I prefer opening files with the 'b' flag as well. Prevents '\r\n' magic in Windows.
- In fix_text_blocks, the pattern should be compiled outside the for loop.
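To make the points above concrete, here is a rough sketch of how the loop-based version might look with them applied. It is only one possible reading, not a definitive rewrite: the names iter_text_blocks, fix_block and fix_dump are made up for illustration, it assumes Python 3, and it uses text mode with newline='' in place of the 'b' flag to leave line endings untouched. It also writes out whatever follows the last separator, which the original loop appears to drop.

```python
import re

# Both patterns are compiled once, outside any loop.
TABLE_COMMENT = re.compile(r'Table structure for table `(.+)`')
CREATE_TABLE = re.compile(r'CREATE TABLE IF NOT EXISTS `(.+)`')
# Plain-string stand-in for the original '-- -+' regex used with re.match.
SEPARATOR_PREFIX = '-- -'


def iter_text_blocks(lines):
    """Yield blocks of text, each one ending with a separator line."""
    block = []  # collect lines in a list instead of concatenating strings
    for line in lines:
        block.append(line)
        if line.startswith(SEPARATOR_PREFIX):
            yield ''.join(block)
            block = []
    if block:  # text after the last separator
        yield ''.join(block)


def fix_block(block):
    """Copy the table name from the comment into the CREATE TABLE line."""
    match = TABLE_COMMENT.search(block)
    if match:
        name = match.group(1)
        # A function replacement avoids backslash handling in the name.
        block = CREATE_TABLE.sub(
            lambda m: 'CREATE TABLE IF NOT EXISTS `' + name + '`', block)
    return block


def fix_dump(in_path, out_path):
    """Hypothetical entry point: both files are closed by the with block."""
    # newline='' (Python 3) disables newline translation, so '\r\n' survives.
    with open(in_path, 'r', newline='') as src, \
            open(out_path, 'w', newline='') as dst:
        for block in iter_text_blocks(src):
            dst.write(fix_block(block))


if __name__ == '__main__':
    fix_dump('dump.sql', 'dump.sql-fixed')
```

Since iter_text_blocks is a generator, the dump is processed block by block rather than held in memory as one list, which is a design choice on my part and not something the original needed.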
 