Check if a PDF file is corrupted using C#
We have an application that generates PDF files. Sometimes, for an unknown reason, one of the PDF files is created corrupted. We need to check whether each PDF is corrupted before continuing to the next one, and if it is, create it again.
Thanks
Look at PDF parsers and try to use them to detect the corruption, for example Ghostscript.
Disclaimer: I work for Atalasoft
In DotImage Document Imaging, we include some PDF Parsing classes that will throw if the file is corrupt.
If you add our PDF Reader add-on, we will try to rasterize the PDF -- if it's corrupt, that will throw. If the problem is missing pieces, then you can look for them in the resulting image.
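You can check the PDF header like this. Note that this only confirms the file starts with the PDF signature; it will not catch damage later in the file: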
// Requires: using System.IO;
public bool IsPDFHeader(string fileName)
{
    // A well-formed PDF starts with the ASCII signature "%PDF-"
    // (bytes 0x25 0x50 0x44 0x46 0x2D), followed by the version, e.g. "%PDF-1.4".
    byte[] signature = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // The using statements close the stream even if reading throws.
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (var br = new BinaryReader(fs))
    {
        // ReadBytes returns fewer bytes when the file is shorter than requested.
        byte[] buffer = br.ReadBytes(signature.Length);
        if (buffer.Length < signature.Length)
        {
            return false;
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }
}
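A header check alone will still pass a file that was cut off partway through writing. As a complementary heuristic, you could also look for the %%EOF marker that normally closes a PDF. The sketch below is only an illustration: the class name PdfSanityCheck, the method HasEofMarker, and the 1024-byte search window are my own assumptions, and passing both checks does not prove the file is valid. For that you still need a real parser, as the other answers suggest.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfSanityCheck
{
    // Hypothetical helper: returns true if "%%EOF" appears within the last
    // tailSize bytes of the file. The default window of 1024 bytes is an
    // assumption, chosen to tolerate trailing whitespace or padding.
    public static bool HasEofMarker(string fileName, int tailSize = 1024)
    {
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            int toRead = (int)Math.Min(tailSize, fs.Length);
            if (toRead == 0)
            {
                return false; // an empty file cannot contain the marker
            }

            // Jump to the start of the tail window.
            fs.Seek(-toRead, SeekOrigin.End);

            // Stream.Read may return fewer bytes than requested, so loop.
            var buffer = new byte[toRead];
            int read = 0;
            while (read < toRead)
            {
                int n = fs.Read(buffer, read, toRead - read);
                if (n == 0)
                {
                    break;
                }
                read += n;
            }

            // The marker is plain ASCII; non-ASCII bytes decode to '?'
            // and do not affect the search.
            string tail = Encoding.ASCII.GetString(buffer, 0, read);
            return tail.Contains("%%EOF");
        }
    }
}
```

You would call it next to the header check, for example regenerating the file when IsPDFHeader(path) or PdfSanityCheck.HasEofMarker(path) returns false. A truncated file usually fails the second check, but a file with damaged content in the middle can pass both.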