Validating data integrity after import from xls to R using RODBC
I am using RODBC for importing a xls file with ~10000 row and ~250 columns, using the script below:
channel <- odbcConnectExcel(xls.file="s:/demo.xls")
demo <- sqlFetch(channel,"Sheet_1")
odbcClose(channel)
During the import process, I have open and closed that xls file.
As a precaution, I checked the number of rows from R and from the xls file itself, just to make sure the import is not corrupted, and yup, the number of rows remain the same, so I assume the import process is OK.
Problem arise when I start tabulating the data, I realize that even the row numbers are the same, some rows are actually being corrupted, I have an UID
inside, and realize that at the later part of the dataframe, the UID are messed up (e.g. same row being duplicated twice).
I re-import the file again, dare not open or close the xls file, and this time, things are going well.
I have learned my lesson of never open or close the xls file when it is being read by RODBC, but how can I be sure that the data will be not messed up, or to ensure that I have really imported the dataset without error, after it is imported?
Thanks.
P.S. I am thinking of exporting the开发者_运维技巧 dataframe to csv, and the xls to csv also, then to compare the checksum, as an extra measure, but is there any other ways?
I use the read.xls
function from the gdata
package (install using `install.packages(gdata)').
This allows you to read from a non-opened Excel file if I'm correct. It is a wrapper for some Perl modules, so it might depend on whether you have those installed if it works for you.
Quoting the R manual for Data Export/Import:
Perl users have contributed a module OLE::SpreadSheet::ParseExcel and a program xls2csv.pl to convert Excel 95–2003 spreadsheets to CSV files. Package gdata provides a basic wrapper in its read.xls function. With suitable Perl modules installed this function can also read Excel 2007 spreadsheets.
Generally, I find the safest way to get data from Excel into R is via a csv export.
精彩评论