Compare CSV files with unique column/primary key
I need to compare two CSV files. Both contain a unique ID column, which should be used to pair up matching rows so they can be compared against each other.
Is there anything out there before I burn the midnight oil to re-invent the wheel?
Thanks Ralph
I would recommend checking out Beyond Compare. It's a comparison utility that handles CSV comparisons very well: it shows the data in tabular format, lets you specify the "key" columns, lets you tell it which columns to ignore, and so on.
There is a free version -- I'm not sure if the CSV comparison functionality comes with that, but it's worth checking out:
http://www.scootersoftware.com/
I don't work for them, I'm just a happy customer. :) John
If you're on Windows, one solution is to use the ISAM CSV driver in a program and you can then write a query across the tables.
Alternatively, load both into Excel, sort each on the ID column, and then compare the workbooks.
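If a small script is an option, the same idea (match rows on the ID, then compare them) can be sketched in Python with only the standard library. This is a minimal sketch, not a finished tool: the column names `id`, `name` and `qty` and the sample rows are made up for illustration, and it assumes each ID appears only once per file.

```python
import csv
import io

def read_by_key(f, key):
    """Read a CSV file object into a dict mapping key value -> row dict."""
    return {row[key]: row for row in csv.DictReader(f)}

def compare_by_key(left, right):
    """Return (ids only in left, ids only in right, per-column differences for shared ids)."""
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    changed = {}
    for k in sorted(left.keys() & right.keys()):
        a, b = left[k], right[k]
        # Record (left value, right value) for every column that differs.
        diffs = {col: (a.get(col), b.get(col))
                 for col in a.keys() | b.keys()
                 if a.get(col) != b.get(col)}
        if diffs:
            changed[k] = diffs
    return only_left, only_right, changed

# Hypothetical sample data; with real files, pass open(path, newline="") instead.
old = io.StringIO("id,name,qty\n1,apple,3\n2,pear,5\n3,plum,7\n")
new = io.StringIO("id,name,qty\n1,apple,3\n2,pear,6\n4,fig,1\n")

only_old, only_new, changed = compare_by_key(read_by_key(old, "id"),
                                             read_by_key(new, "id"))
print("only in first file:", only_old)
print("only in second file:", only_new)
print("changed rows:", changed)
```

All values are compared as text, so `5` and `5.0` would count as different; normalise the columns first if that matters for your data.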
I would use SQL Server. Use the import wizard with your CSV files as a flat file data source. Once you have them imported into two tables in SQL Server, you can analyze them.
SQL Server Express is free. Once the two tables have been created from your two CSV files, use an INNER JOIN to join them on the ID column.
Guide to importing data into a SQL Server database: http://www.gotknowhow.com/articles/how-to-import-delimited-text-files-sql-server-2005-database
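To give a feel for what the join looks like, here is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server, so it runs without installing anything. The table names `file_a` and `file_b`, the columns `id`, `name` and `qty`, and the sample rows are assumptions for illustration; in SQL Server the import wizard would create the tables and only the SELECT would be needed.

```python
import csv
import io
import sqlite3

def load_table(conn, table, f):
    """Create `table` from a CSV file object; every column is stored as TEXT."""
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data standing in for the two CSV files.
load_table(conn, "file_a", io.StringIO("id,name,qty\n1,apple,3\n2,pear,5\n3,plum,7\n"))
load_table(conn, "file_b", io.StringIO("id,name,qty\n1,apple,3\n2,pear,6\n4,fig,1\n"))

# Rows whose id exists in both tables but whose other columns differ.
mismatches = conn.execute(
    """
    SELECT a.id, a.name, a.qty, b.name, b.qty
    FROM file_a AS a
    INNER JOIN file_b AS b ON a.id = b.id
    WHERE a.name <> b.name OR a.qty <> b.qty
    ORDER BY a.id
    """
).fetchall()
print(mismatches)
```

Note that an INNER JOIN only returns IDs present in both tables, so rows that exist in just one file need a separate query (for example a LEFT JOIN that checks for NULL on the other side).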
If you can use MS Excel, then Query from Excel Files should help (you'll need to save your CSV files in xls or xlsx format first):
- Define a name for the dataset in the first file (Formulas tab -> Define name)
- Define a name for the dataset in the second file
- Go to the Data tab, select "From Other Sources", and from the dropdown select "From Microsoft Query"
- Select your second file and confirm that you want to merge the columns manually
- In the following window, "Query from Excel Files", drag and drop the unique_ID_column of the first dataset onto the unique_ID_column of the second dataset; a link between these columns will be created
- Go to the File menu and click "Return Data to MS Office Excel"; an Import Data dialog will pop up
- Select the sheet into which you would like the matched data to be imported
- Click OK -> you should see the matched data with columns from both CSV files
Or, if you don't mind uploading your files to an online service, you can use, for example, http://www.gridoc.com/join-tables and have the rows matched by creating a matching rule (Disclaimer: I am the author of the tool).
Hope this helps.