Parse a git log file with Python
I need to parse output like this:
commit e397a6e988c05d6fd87ae904303ec0e17f4d79a2
Author: Name <email@email.com>
Date: Sat Jul 9 21:29:10 2011 +0400

commit message

1 files changed, 21 insertions(+), 11 deletions(-)
and get the author name and the number of insertions and deletions.
For the name I have this:
re.findall(r"Author: (.+) <",gitLog)
For the numbers I have this:
re.findall(r" (\d+) insertions\S+, (\d+) deletions",gitLog)
But I want to get a list of tuples of (name, insertions, deletions) with a single regular expression.
I tried something like
re.findall(r"Author: (.+) <.+ (\d+) insertions\S+, (\d+) deletions",gitLog,re.DOTALL)
but it returns nothing...
So what is my mistake? What should the regular expression look like?
UPDATE: wRAR is right, but somehow when I read a file and try to parse it, I get the whole file as the name and then the last insertion and deletion counts, so it matches the whole file rather than a single commit... [.+] gets the whole file, not just part of a commit...
If you have access to the repo and not just a text dump of git log, save yourself the parsing trouble and generate different log output:
git log --pretty="%an" --numstat
This produces output of the form:
Author Name
lines_inserted lines_deleted modified_file
You don't even need a regex for that. If you want to stick with a regex, the pattern has to account for the (+) after insertions, or else it will not match at all and will not capture the numbers.
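To illustrate the regex-free route, here is a minimal sketch that sums insertions and deletions per author. It assumes the text came from git log --pretty="%an" --numstat, that numstat lines are tab-separated, and that binary files show "-" instead of a number. The sample names, paths, and the totals_per_author helper are made up for illustration.

```python
from collections import defaultdict

# Sample text in the shape produced by: git log --pretty="%an" --numstat
# (names, numbers and paths are invented for this example)
SAMPLE = (
    "Alice Example\n"
    "\n"
    "21\t11\tsrc/main.py\n"
    "3\t0\tREADME\n"
    "Bob Example\n"
    "\n"
    "-\t-\tlogo.png\n"
    "5\t2\tsrc/util.py\n"
)


def totals_per_author(log_text):
    """Sum inserted and deleted lines per author from --numstat output."""
    totals = defaultdict(lambda: [0, 0])
    author = None
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank separator lines carry no data
        parts = line.split("\t")
        if len(parts) == 3 and author is not None:
            added, deleted, _path = parts
            # Binary files are reported as "-" and are skipped here.
            if added.isdigit() and deleted.isdigit():
                totals[author][0] += int(added)
                totals[author][1] += int(deleted)
            continue
        # Any other non-blank line is assumed to be an author name.
        author = line.strip()
    return {name: tuple(counts) for name, counts in totals.items()}


print(totals_per_author(SAMPLE))
```

An author whose commits touch only binary files will not appear in the result, which may or may not be what you want.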
You should use an existing package such as GitPython (directly or by borrowing its code). As for your regex question, the regex you posted, run against the text you posted, returns [('Name', '21', '11')]
so I suppose it is right.
There is a module that I have used for parsing git log output with Python. It looks fairly active:
https://github.com/gaborantal/git-log-parser
So the answer to my question is:
re.findall(r"Author: (\S+) <.+\n.+\n\n.+\n\n.+ (\d+) insertions\S+, (\d+) deletions",gitLog)
But thanks for your answers anyway.
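A note on why the re.DOTALL attempt swallowed the whole file: with that flag, a greedy .+ also matches newlines, so it runs to the end of the text and only backtracks as far as the last stat line. One way to avoid this is to cut the log into per-commit chunks first, so that no match can cross a commit boundary. The following is a rough sketch under some assumptions: the log is in the default layout with a stat summary line, the second commit in the sample is invented, and parse_commits is a hypothetical helper name.

```python
import re

# Sample log; the second commit is made up for illustration.
GIT_LOG = """\
commit e397a6e988c05d6fd87ae904303ec0e17f4d79a2
Author: Name <email@email.com>
Date:   Sat Jul 9 21:29:10 2011 +0400

    commit message

 1 files changed, 21 insertions(+), 11 deletions(-)

commit 0000000000000000000000000000000000000000
Author: Other Person <other@example.com>
Date:   Sun Jul 10 10:00:00 2011 +0400

    another message

 2 files changed, 3 insertions(+), 4 deletions(-)
"""

# Commit header lines start at column 0; message lines are indented.
COMMIT_RE = re.compile(r"^commit [0-9a-f]{40}\b.*$", re.MULTILINE)
# Non-greedy so names with spaces work and the match stops at the first " <".
AUTHOR_RE = re.compile(r"^Author: (.+?) <", re.MULTILINE)
INS_RE = re.compile(r"(\d+) insertions?\(\+\)")
DEL_RE = re.compile(r"(\d+) deletions?\(-\)")


def parse_commits(log_text):
    """Return a list of (author, insertions, deletions), one per commit."""
    results = []
    for chunk in COMMIT_RE.split(log_text):
        author = AUTHOR_RE.search(chunk)
        if author is None:
            continue  # text before the first commit header
        ins = INS_RE.search(chunk)
        dels = DEL_RE.search(chunk)
        results.append((
            author.group(1),
            int(ins.group(1)) if ins else 0,   # missing count treated as 0
            int(dels.group(1)) if dels else 0,
        ))
    return results


print(parse_commits(GIT_LOG))
```

Unlike the (\S+) version above, this keeps multi-word author names, and a commit without a stat line (a merge, for example) simply yields zeros instead of borrowing numbers from the next commit.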