To compare 2 excel files using python
I have a excel file with following fields
Software_name , Version and Count.
This file is a inventory of all softwares installed in the network of an organization which was generated using LANdesk.
I have another excel file which is an purchase inventory of these softwares which is generated manually.
I need to compare these sheets and create a report stating whether the organization is compliant or not.
Hence how do i compare these two files.
there are instances like Microsoft Office is mentioned as just office and 'server' is spelt as 'svr'
How 开发者_JS百科do go about with it?
The first step as SeyZ mentions is to determine how you want to read these Excel files. I don't have experience with the libraries he refers to. Instead I use COM programming to read and write Excel files, which of course requires that you have Excel installed. This capability comes from PyWin32 which is installed by default if you use the ActiveState Python installer, or can be installed separately if you got Python from Python.org.
The next step would be to convert things into a common format for comparing, or searching for elements from one file within the other. My first thought here would be to load the contents of the LANdesk software inventory into a database table using something quick and easy like SQLite.
Then for each item of the manual purchase list, normalize the product name and search for it within the inventory table.
Normalizing the values would be a process of splitting a name into parts and replacing partial words and phrases with their full versions. For example, you could create a lookup table of conversions:
partial full
-------------------- --------------------
svr server
srv server
SRV Stevie Ray Vaughan
office Microsoft Office
etc et cetera
You would want to run your manual list data through the normalizing process and add partial values and their full version to this table until it handles all of the cases that you need. Then run the comparison. Here is some Pythonish pseudocode:
for each row of manual inventory excel worksheet:
product = sh.Cells(row, 1) # get contents of row n, column 1
# adjust based on the structure of this sheet
parts = product.split(" ") # split on spaces into a list
for n, part in enumerate(parts):
parts[n] = Normalize(part) # look up part in conversion table
normalProduct = " ".join(parts)
if LookupProduct(normalProduct): # look up normalized name in LANdesk list
add to compliant list
else:
add to non-compliant list
if len(non-compliant list) > 0:
TimeForShopping(non-compliant list)
If you have experience with using SQLite or any other database with Python, then creating the LANdesk product table, and the normalize and lookup routines should be fairly straightforward, but if not then more pseudocode and examples would be in order. Let me know if you need those.
There are several libraries to manipulate .xls files.
XLRD allows you to extract data from Excel spreadsheet files. So you can compare two files easily. (read)
XLWT allows you to create some Excel files. (write)
XLUtils requires both of the xlrd and xlwt packages. So, you can read & write easily thanks to this library.
精彩评论