Verifying a CSV file is really a CSV file
In PHP, I want to make sure a file uploaded by one of our clients is really a CSV file. I'm handling the upload itself just fine. I'm not worried about malicious users, but I am worried about the ones who will try to upload Excel workbooks instead. Unless I'm mistaken, an Excel workbook and a CSV can be reported with the same MIME type, so checking that isn't good enough.
Is there one regular expression that can handle verifying a CSV file is really a CSV file? (I don't need parsing... that's what PHP's fgetcsv() is for.) I've seen several, but they are usually followed by comments like "it didn't work for case X."
Is there some other, better way of handling this?
(I expect the CSV to hold first/last names, department names... nothing fancy.)
Unlike other file formats, CSV has no tell-tale bytes in the file header. It starts straight away with the actual data.
I don't see any way except to actually parse it, and to count whether there is the expected number of columns in the result.
It may be enough to read as many characters as are needed to determine the first line (= until the first line break).
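A minimal sketch of that idea, reading only the first record and comparing its column count against what you expect. The function name firstLineHasColumns and the comma delimiter are assumptions for illustration, not part of any library:

```php
<?php
// Hypothetical helper: parse only the first record and check its column count.
function firstLineHasColumns(string $path, int $expectedColumns, string $delimiter = ','): bool
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false; // the file could not be opened at all
    }

    // fgetcsv() returns an array of fields, or false on failure / end of file.
    // A length of 0 means "no line length limit".
    $fields = fgetcsv($handle, 0, $delimiter, '"', '\\');
    fclose($handle);

    return is_array($fields) && count($fields) === $expectedColumns;
}
```

An Excel workbook will usually not split into the expected number of fields on its first "line", so a call such as firstLineHasColumns($uploadedPath, 3) should reject it in most cases. It is a heuristic, not a proof.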
You can write a RE that will give you a guess if the file is valid CSV or not - but perhaps a better approach would be to try and parse the file as if it was CSV (with your fgetcsv() call), and assume it's NOT a valid one if the call fails?
In other words, the best way to see if the file is a valid CSV file is to try and parse it as such, and assume that if you failed to parse, it wasn't a CSV!
The easiest way is to try parsing the CSV and reading a value from it. Parse it using str_getcsv and then attempt to read a value. If you are able to read and validate at least a couple of values, then the file is very likely a CSV.
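As a rough sketch of what "read and validate a couple of values" could look like for a file of names and departments. The function name lineLooksLikeNameRecord and the validation rule (the first two fields must be non-empty and free of control characters) are assumptions chosen for illustration:

```php
<?php
// Hypothetical check: parse one line and validate its first two values.
function lineLooksLikeNameRecord(string $line): bool
{
    $fields = str_getcsv($line, ',', '"', '\\');
    if (count($fields) < 2) {
        return false; // not even two columns
    }

    foreach (array_slice($fields, 0, 2) as $value) {
        // Assumed rule: a name is non-empty and contains no control characters.
        if ($value === null || trim($value) === '') {
            return false;
        }
        if (preg_match('/[\x00-\x08\x0E-\x1F]/', $value) === 1) {
            return false;
        }
    }

    return true;
}
```

You would feed it the first line (or first few lines) of the upload and reject the file if the check fails.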
EDIT
If you don't have access to str_getcsv, use this drop-in replacement from http://www.electrictoolbox.com/php-str-getcsv-function/:
if (!function_exists('str_getcsv')) {
    function str_getcsv($input, $delimiter = ",", $enclosure = '"', $escape = "\\") {
        $fp = fopen("php://memory", 'r+');
        fputs($fp, $input);
        rewind($fp);
        $data = fgetcsv($fp, null, $delimiter, $enclosure); // $escape only got added in 5.3.0
        fclose($fp);
        return $data;
    }
}
Technically speaking, almost any text file could be a CSV file (barring quotes that don't match, etc.). You can try to guess whether it's a binary file, but there isn't a reliable way to do that unless your data only contains ASCII or something of the sort. If all you care about is that people don't upload Excel files by mistake, check the file extension.
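A hedged sketch of both checks. The function names hasCsvExtension and looksBinary are made up for illustration. The signature test relies on two facts: .xlsx workbooks are ZIP archives and start with the bytes "PK\x03\x04", and legacy .xls workbooks start with the OLE2 compound file signature. The NUL-byte test is only a guess and would wrongly flag a CSV saved as UTF-16:

```php
<?php
// Hypothetical helper: case-insensitive check of the client-supplied file name.
function hasCsvExtension(string $fileName): bool
{
    return strtolower(pathinfo($fileName, PATHINFO_EXTENSION)) === 'csv';
}

// Hypothetical helper: true if the file looks like an Excel workbook or other binary data.
function looksBinary(string $path): bool
{
    // Read at most the first 4096 bytes.
    $head = file_get_contents($path, false, null, 0, 4096);
    if ($head === false) {
        return true; // unreadable: treat it as not acceptable
    }

    // .xlsx files are ZIP archives.
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return true;
    }
    // Legacy .xls files use the OLE2 compound file signature.
    if (strncmp($head, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return true;
    }

    // Heuristic: NUL bytes are very unusual in ASCII or UTF-8 text.
    return strpos($head, "\0") !== false;
}
```

For an upload you would pass $_FILES[...]['name'] to the first function and $_FILES[...]['tmp_name'] to the second. The extension check alone catches honest mistakes; the signature check also catches a workbook that was simply renamed to .csv.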
Any text file is a valid CSV file, so it is impossible to come up with a standard way of verifying its correctness; it depends on what you really expect the file to contain.
Before you even start, you have to know which delimiter is used in that CSV file. After that, the easiest way to verify it is to use the fgetcsv function. For example:
<?php
$row = 1; // Current row number, handy for error messages.
$priceColumn = 4; // Assumption for illustration: zero-based index of the price column.
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data); // Number of fields in a row.
        if ($num !== 5)
        {
            // OMG! Column count is not five!
        }
        else if (intval($data[$priceColumn]) == 0)
        {
            // OMG! Customer thinks we sold a car for $0!
        }
        $row++;
    }
    fclose($handle);
}
?>