
A Python script I've written for correcting table names in SQL dumps from Windows. Any comments?

As a newbie in Python, I thought I'd write a quick and dirty script to correct the capitalization of table names in a MySQL dump file (generated by phpMyAdmin).

The idea is that, since the correct capitalization of the table names is in the comments, I'm going to use it.

e.g.:

-- --------------------------------------------------------

--
-- Table structure for table `Address`
--

The reason I'm asking here is that I don't have a mentor for Python programming, and I was hoping you could steer me in the right direction. It feels like I'm doing a lot of things wrong (maybe it's not Pythonic). I'd really appreciate your help. Thanks in advance!

Here's what I've written (and it works):

#!/usr/bin/env python

import re

filename = 'dump.sql'

def get_text_blocks(filename):
    text_blocks = []
    text_block = ''
    separator = '-- -+'
    for line in open(filename, 'r'):
        text_block += line

        if re.match(separator, line):
            if text_block:
                text_blocks.append(text_block)
                text_block = ''
    return text_blocks

def fix_text_blocks(text_blocks):
    f = open(filename + '-fixed', 'w')
    for block in text_blocks:
        table_pattern = re.compile(r'Table structure for table `(.+)`')
        correct_table_name = table_pattern.search(block)
        if correct_table_name:
            replacement = 'CREATE TABLE IF NOT EXISTS `' + correct_table_name.groups(0)[0] + '`'
            block =  re.sub(r'CREATE TABLE IF NOT EXISTS `(.+)`',  replacement, block)
        f.write(block)

if __name__ == '__main__':
    fix_text_blocks(get_text_blocks(filename))


Looks fairly good, so the following are relatively minor:

  • get_text_blocks basically splits the entire text by the separator, correct? If so, I think this can be done with a single regex. Something like r'(.*?)\n-- -+' with the re.DOTALL flag, so that '.' can match across line breaks (warning: untested).
  • If you don't want to use a single regex but prefer to parse the file in a loop, you can ditch the regex for str.startswith. You should also not concatenate strings the way you do with text_block, since every concatenation creates a new string. You can use either the StringIO class, or keep a list of lines and then join them with ''.join (lines read from a file keep their trailing newline).
  • The nested 'if' can be dropped: use the 'and' operator instead.
  • In any case, working with files (and other objects which have a 'finally' logic) is now done with the 'with [object] as [name]:' statement. Look it up, it's nifty.
  • If you don't do that - always close your files when you finish working with them, preferably in a 'finally' clause.
  • I prefer opening files with the 'b' flag as well. Prevents '\r\n' magic in Windows.
  • In fix_text_blocks, the pattern should be compiled outside the for loop.
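
Putting the loop-based suggestions together, a rough sketch could look like the one below. It assumes Python 3, where newline='' plays the role of the 'b' flag by switching off line-ending translation, and it assumes the dump is UTF-8. The names iter_text_blocks, fix_block and fix_dump are made up for this example. One difference from the original: whatever follows the last separator line is also written out, whereas get_text_blocks as posted appears to drop it.

```python
import re

# Compiled once at import time instead of inside the loop.
COMMENT_NAME = re.compile(r'Table structure for table `(.+)`')
CREATE_NAME = re.compile(r'CREATE TABLE IF NOT EXISTS `(.+)`')


def iter_text_blocks(lines):
    """Yield chunks of text, each ending with a '-- -----' separator line."""
    block = []
    for line in lines:
        block.append(line)
        # Same test as re.match('-- -+', line), without the regex.
        if line.startswith('-- -'):
            yield ''.join(block)
            block = []
    # Emit whatever follows the last separator, if anything.
    if block:
        yield ''.join(block)


def fix_block(block):
    """Copy the table name from the comment into the CREATE TABLE line."""
    match = COMMENT_NAME.search(block)
    if match is None:
        return block
    replacement = 'CREATE TABLE IF NOT EXISTS `' + match.group(1) + '`'
    # A function replacement keeps backslashes in the name literal.
    return CREATE_NAME.sub(lambda m: replacement, block)


def fix_dump(src_path, dst_path):
    # 'with' closes both files even if an exception is raised.
    # encoding='utf-8' is an assumption about the dump file.
    with open(src_path, newline='', encoding='utf-8') as src, \
         open(dst_path, 'w', newline='', encoding='utf-8') as dst:
        for block in iter_text_blocks(src):
            dst.write(fix_block(block))


if __name__ == '__main__':
    fix_dump('dump.sql', 'dump.sql-fixed')
```

Because iter_text_blocks is a generator, only one block is held in memory at a time, which may matter for large dumps.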