How to validate that an uploaded file is complete
I have an system with which users can upload a CSV file via an FTP server, or via a html form. On my end, a script polls the uploads directory and processes new files found. Some users will create the CSV by exporting it from Excel, while others will programmatically create it with scripts of their own.
My concern at the moment is: How can I be 100% certain that the file my processing script acts on is complete - in other words that it isn't a partial file (in progress, failed upload, etc)?
If the file format was something 开发者_如何学运维more structured, like XML, I'd be 100% confident that the file is complete by checking that the XML structure is valid (ie: closing tags).
Is there a good way to ensure that the uploaded CSV file is complete, without burdening & confusing less technical users who are simply uploading a file exported from a spreadsheet program (ie: providing an md5 of the file contents would be beyond them).
When designing CSV file formats in the past, I've always added a header and footer line as follows:
id,one,two,three,four,five,six
10,1,2,3,4,5,6
11,1,2,3,4,5,6
12,1,2,3,4,5,6
13,1,2,3,4,5,6
14,1,2,3,4,5,6
FOOTER,5
Most CSV file formats have a header to label the columns, the purpose of the footer is to indicate the file is completed. The footer contains a simple line count, which is easy to audit when looping through the file's contents. Not too complicated for users.
You could crosscheck whenever the filesize of the uploaded file matches the filesize of the original file.
精彩评论