How to design a unit test for generating a PDF document?
I'm late to the party when it comes to unit testing, and I'm trying to figure out best practices and such. My question is: given a class that is responsible for generating a PDF (or Doc/HTML/XML/etc.), how would I go about testing that the final output file is correct? For a text-based file (XML), I figure I could just check whether the strings match, but what about a binary file (PDF)? Should I just compare against an MD5 hash? Should I even be testing this?
Thanks in advance.
I use PDFBox to extract the text from the generated PDF and check whether it contains the data it should. This doesn't check whether the data is in the correct place, but I don't go that deep with PDF testing. You need to decide how deep you want to go: the deeper you go, the more time you will spend fixing the tests after each change. (I've never had a bug where text ended up in the wrong place, and maybe that's why I don't test for it.)
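A minimal sketch of the "check that the extracted text contains the data" approach. To keep it compilable with only the JDK, the PDFBox extraction step appears only in a comment (assumed to be `PDFTextStripper.getText(PDDocument)`). The class and method names `PdfTextCheck` and `missingSnippets` are hypothetical. Whitespace is normalized before matching because the extractor may insert line breaks wherever the PDF wrapped the text.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfTextCheck {

    // Collapses runs of whitespace (including line breaks inserted by the
    // text extractor) into single spaces so that wrapped text still matches.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the expected snippets that do NOT appear in the extracted text.
    // An empty list means every snippet was found.
    static List<String> missingSnippets(String extractedText, List<String> expected) {
        String haystack = normalize(extractedText);
        List<String> missing = new ArrayList<>();
        for (String snippet : expected) {
            if (!haystack.contains(normalize(snippet))) {
                missing.add(snippet);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // In a real test this string would come from PDFBox, e.g.
        // new PDFTextStripper().getText(document), where document is the
        // loaded PDDocument (PDDocument.load(file) in PDFBox 2.x,
        // Loader.loadPDF(file) in 3.x). Here it is hard-coded sample text.
        String extracted = "Invoice No. 1042\nCustomer: ACME\nLtd\nTotal: 99.50";
        List<String> missing = missingSnippets(extracted,
                List.of("Invoice No. 1042", "ACME Ltd", "Total: 99.50"));
        System.out.println(missing.isEmpty() ? "all found" : "missing: " + missing);
    }
}
```

Returning the list of missing snippets, rather than a bare true/false, makes a failing test easier to diagnose.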
Another way would be to read the PDF back with the same PDF library you used to write it or, if you generate the PDF from a template using some framework, to read it with something like iText.
For mission-critical PDFs (e.g. those sent to customers), I don't think checking the text is enough. You'd want to check layout, font sizes, text wrapping, etc., for the same reasons we use Selenium to check web pages.
I took the approach of turning the PDF into an image, and comparing that image against a known "correct" image. Our PDFs didn't change very often, and didn't contain anything that changed over time (e.g. "today's" date). So this approach worked well - using the same input data, we could always generate the same output PDF.
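The image-comparison step could look roughly like this. It's a sketch that assumes both the generated page and the known-good page are already available as `BufferedImage`s (for example rendered with PDFBox's `PDFRenderer`, which is an assumption here, not necessarily what was used above). `RenderedPageCompare`, `countDifferentPixels` and `pagesMatch` are hypothetical names, and the pixel tolerance is my own addition rather than part of the approach described above.

```java
import java.awt.image.BufferedImage;

public class RenderedPageCompare {

    // Counts pixels whose RGB values differ between two page images.
    // Returns -1 if the images have different dimensions, since that alone
    // means the output has changed.
    static long countDifferentPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            return -1;
        }
        long diff = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                // Mask off the alpha channel and compare RGB only.
                if ((expected.getRGB(x, y) & 0xFFFFFF) != (actual.getRGB(x, y) & 0xFFFFFF)) {
                    diff++;
                }
            }
        }
        return diff;
    }

    // Treats the pages as matching if at most maxDifferentPixels differ.
    static boolean pagesMatch(BufferedImage expected, BufferedImage actual, long maxDifferentPixels) {
        long diff = countDifferentPixels(expected, actual);
        return diff >= 0 && diff <= maxDifferentPixels;
    }

    public static void main(String[] args) {
        // In a real test the images would come from rendering, e.g. with
        // PDFBox: new PDFRenderer(document).renderImageWithDPI(0, 150) for
        // the generated PDF, and ImageIO.read(file) for the stored
        // "known good" image. Here two blank images stand in for them.
        BufferedImage expected = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage actual = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        actual.setRGB(3, 4, 0xFF0000); // simulate one changed pixel
        System.out.println(countDifferentPixels(expected, actual)); // prints 1
        System.out.println(pagesMatch(expected, actual, 0));        // prints false
    }
}
```

A tolerance of zero gives an exact comparison. A small positive value may help if renders differ slightly between machines or library versions, at the cost of possibly missing very small real changes.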
I think PDFUnit now has built-in support for doing this, plus a lot more: http://www.pdfunit.com/en/documentation/java/testscope/rendered-pages.html