
python get diff from files

How can I compare the contents of two files and detect whether the content of file1 is present in file2, even when it appears there in a different format?

Example, File1:

        import datetime,os
        #include<stdio.h>
        import java.io.*;
        import mymodule,urllib,
        #include<conio.h>

File2:

        #include<stdio.h>
        import java.io.*;
        import mymodule,logging,random,traceback,urllib,os
        #include<conio.h>

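        # Read both files ('r' is enough; 'r+' also opens them for writing)
        with open('workfile', 'r') as f:
            contents1 = f.read()

        with open('workfile1', 'r') as f1:
            contents2 = f1.read()

        # Naive check: succeeds only if file1's text appears verbatim in file2,
        # so differently formatted content (as in the example) is not detected
        if contents1 in contents2:
            print("found")
        else:
            print("not found")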


You can have a look at difflib - http://docs.python.org/library/difflib.html

Snippet:

        import difflib
        matcher = difflib.SequenceMatcher(None, file1.read(), file2.read())  # file1, file2: open file objects
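A slightly fuller sketch of the difflib suggestion, run on the question's example text. `compare` is a hypothetical helper; `SequenceMatcher.ratio()` and `difflib.ndiff` are standard-library calls. It also shows the limitation for this question: difflib compares text, so the reformatted `import mymodule,urllib,` line is reported as missing even though both of its modules appear in file2.

```python
import difflib


def compare(text1, text2):
    """Return (similarity ratio in [0, 1], lines of text1 not found in text2)."""
    ratio = difflib.SequenceMatcher(None, text1, text2).ratio()
    # ndiff marks lines that occur only in the first input with a '- ' prefix
    diff = difflib.ndiff(text1.splitlines(), text2.splitlines())
    only_in_1 = [line[2:] for line in diff if line.startswith("- ")]
    return ratio, only_in_1


# The question's example files as strings (use open(path).read() for real files)
file1 = ("import datetime,os\n#include<stdio.h>\nimport java.io.*;\n"
         "import mymodule,urllib,\n#include<conio.h>\n")
file2 = ("#include<stdio.h>\nimport java.io.*;\n"
         "import mymodule,logging,random,traceback,urllib,os\n#include<conio.h>\n")

ratio, missing = compare(file1, file2)
print(round(ratio, 2))
print(missing)  # prints: ['import datetime,os', 'import mymodule,urllib,']
```

The ratio says "similar but not identical"; the line list shows which lines differ, but not whether the difference is only formatting.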


This is nontrivial. I was going to whip up a little script just to match comments and imports, but even that requires a lot of "smart" regex parsing. Import statements can span several lines, and you'd have to split all of those lines on commas, map str.strip over the resulting lists, then compare the sets. That's not even counting forms like 'from foo import bar' or 'import foo.bar'.
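A minimal sketch of the split-on-commas, strip, compare-sets idea, run on the question's two example files. `normalize` and `missing_from` are hypothetical helper names, and the sketch assumes every `import` or `#include` statement sits on one line; it does not handle multi-line imports or `from foo import bar`, which are exactly the complications mentioned above.

```python
def normalize(text):
    """Return a set of (kind, item) pairs with formatting differences removed.

    Hypothetical helper: assumes each 'import' or '#include' statement fits on
    one line; any other non-blank line is kept verbatim (stripped).
    """
    items = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("import "):
            # 'import mymodule,urllib,' -> 'mymodule', 'urllib'
            # (a trailing ';' as in Java is dropped, empty pieces are skipped)
            names = line[len("import "):].rstrip(";").split(",")
            items.update(("import", name.strip()) for name in names if name.strip())
        elif line.startswith("#include"):
            items.add(("#include", line[len("#include"):].strip()))
        else:
            items.add(("line", line))
    return items


def missing_from(text1, text2):
    """Items of text1 absent from text2, ignoring line order and list spacing."""
    return normalize(text1) - normalize(text2)


file1 = """import datetime,os
#include<stdio.h>
import java.io.*;
import mymodule,urllib,
#include<conio.h>
"""
file2 = """#include<stdio.h>
import java.io.*;
import mymodule,logging,random,traceback,urllib,os
#include<conio.h>
"""

missing = missing_from(file1, file2)
print("found" if not missing else "not found: %s" % sorted(missing))
# prints: not found: [('import', 'datetime')]
```

Unlike a plain text diff, the reformatted `mymodule,urllib` line is no longer flagged; the only genuine gap is `datetime`, which file2 never imports.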

And the line 'import mymodule,urllib,' makes Python croak with a SyntaxError because of the trailing comma. If you're going to compare things using built-in knowledge of Python, the files have to be valid Python, or the results will be indeterminate.
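One way to get that built-in knowledge without regexes is the standard library's `ast` module, swapped in here instead of the approach below: parse the source and collect the `Import`/`ImportFrom` nodes. This is only a sketch; `imported_modules` is a hypothetical helper, and recording `from foo import bar` as `foo.bar` (so it matches `import foo.bar`) is a simplifying assumption, since `bar` may be an attribute rather than a submodule. Invalid input such as the trailing-comma line raises `SyntaxError` instead of producing a silently wrong answer.

```python
import ast


def imported_modules(source):
    """Return the set of names imported by a piece of Python source.

    ast.parse raises SyntaxError on invalid code, e.g. 'import mymodule,urllib,'.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # 'import os, sys' -> {'os', 'sys'}; 'import foo.bar' -> {'foo.bar'}
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Assumption: record 'from foo import bar' as 'foo.bar' so it
            # compares equal to 'import foo.bar'; relative imports keep their dots
            prefix = "." * (node.level or 0) + (node.module or "")
            for alias in node.names:
                names.add(prefix + "." + alias.name if node.module else prefix + alias.name)
    return names


old = imported_modules("import datetime, os\nfrom foo import bar\n")
new = imported_modules("import os, logging\nimport foo.bar\n")
print(sorted(old - new))  # prints: ['datetime']

try:
    imported_modules("import mymodule,urllib,")
except SyntaxError:
    print("not valid Python, cannot compare")
```

Because the parser handles line continuations, parenthesized import lists and comments itself, multi-line imports need no special treatment.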

Here's a proof-of-concept idea to just compare imports:

        jcomeau@intrepid:/tmp$ cat t1.py
        import sys, os, re
        jcomeau@intrepid:/tmp$ cat t2.py
        import os, sys, re, csv
        jcomeau@intrepid:/tmp$ cat compare.py
        class t1:
         from t1 import *
        class t2:
         from t2 import *
        print 't1', dir(t1)
        print 't2', dir(t2)
        print set(dir(t1)) & set(dir(t2)) == set(dir(t1))
        jcomeau@intrepid:/tmp$ python compare.py
        compare.py:1: SyntaxWarning: import * only allowed at module level
          class t1:
        compare.py:3: SyntaxWarning: import * only allowed at module level
          class t2:
        t1 ['__doc__', '__module__', 'os', 're', 'sys']
        t2 ['__doc__', '__module__', 'csv', 'os', 're', 'sys']
        True
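The class trick above relies on Python 2: in Python 3, `import *` inside a class body is a `SyntaxError`, not a warning. A rough Python 3 equivalent, assuming that executing the files is acceptable (the original executes them too, so they must be trusted and every module they import must be installed), runs each source in a fresh namespace and compares the names it binds. `imported_names` is a hypothetical helper; for real files you would pass `open('t1.py').read()`.

```python
def imported_names(source):
    """Execute `source` in a fresh namespace and return the names it binds.

    Like the class/'import *' trick, this runs the code, so the file must be
    trusted and every module it imports must actually be installed.
    """
    namespace = {}
    exec(source, namespace)
    # exec() inserts '__builtins__'; drop dunder names, as dir() output would show them
    return {name for name in namespace if not name.startswith("__")}


t1 = imported_names("import sys, os, re")
t2 = imported_names("import os, sys, re, csv")
print("t1", sorted(t1))  # prints: t1 ['os', 're', 'sys']
print("t2", sorted(t2))  # prints: t2 ['csv', 'os', 're', 'sys']
print(t1 <= t2)          # prints: True
```

`t1 <= t2` is a subset test, equivalent to the original `set(dir(t1)) & set(dir(t2)) == set(dir(t1))`, since A & B equals A exactly when A is contained in B.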
