Any way to create a PDF so the text can't be copied/extracted back out?
I'm trying to help create a neighborhood directory, and I want to discourage anyone from harvesting contact info (especially email addresses) from it.
Is there any easy way to prevent someone from copying and pasting that text from the PDF?
Update: The goal here is to make the PDF no easier to harvest email addresses from than the current paper directory, while keeping the PDF directory as useful as the paper one. The online PDF directory will have advantages such as always being up to date and saving some printing costs (or passing those costs on to folks who want to print the document).
If the data is to be readable, which I'd assume is your goal, there is no way you can stop a dedicated person from taking it and using it. Converting to an image will make it difficult, but anyone with good OCR or a team of cheap foreign labor can get anything they want out of it. If the data is super sensitive and you are worried about it, you should really reconsider the value of publishing it.
Using an image instead of text makes it a lot more difficult to automatically grab data from a PDF.
Part of one of my previous jobs was reformatting data in PDFs into a specific, more structured document format. When we got PDFs whose text was images (especially blurry or hard-to-read images), the OCR output would be riddled with wrong letters, and we'd have to go in by hand and fix almost everything.
The other answers are a good start. However, I found out exactly how to lock the PDF to prevent copying.
You can use PrimoPDF's free PDF driver and change the Security settings as described here: http://www.primopdf.com/help/tip_secure_pdf.aspx
To add password security to your PDF, read on to learn how you can do it free with PrimoPDF.
- Download and install the free PDF driver: http://www.primopdf.com/download.aspx
- Open the file to convert to PDF
- Open the Print dialog (or press Ctrl+P)
- In the printer list, choose PrimoPDF
- Click Print
- On the PrimoPDF dialog, click the Change button next to the Security label to open the security dialog.
- Enter your Open password twice.
- Optionally, enter a Permissions password and choose the functionality you want to restrict.
- Click OK.
- Click Create PDF.
Final tip: if you want to apply security to all the PDF files you create, you can do so by configuring PrimoPDF once. At the bottom of the dialog (see above), just make sure the Always use these settings option is turned on.
PDF supports locking the document: the content is encrypted but still readable, and the document's permission settings tell the reader not to allow printing or copying from it.
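The "lock" boils down to a set of permission bits stored in the /P entry of the PDF's encryption dictionary. The Python sketch below computes such a value; the bit positions follow the standard security handler as described in the PDF specification, while the helper names (`permissions_value`, `allows`) and the flag constants are hypothetical, chosen only for illustration. A viewer that honours these bits greys out Print and Copy, but because the file opens without a password, a tool that simply ignores the bits can still extract the text.

```python
# Permission bits for the PDF standard security handler.
# The spec numbers bits from 1, so "bit 3" is the value 1 << 2.
PRINT = 1 << 2          # bit 3: print the document
MODIFY = 1 << 3         # bit 4: modify the contents
COPY = 1 << 4           # bit 5: copy or otherwise extract text and graphics
ANNOTATE = 1 << 5       # bit 6: add or modify annotations
FILL_FORMS = 1 << 8     # bit 9: fill in form fields
ACCESSIBILITY = 1 << 9  # bit 10: extract text for accessibility tools
ASSEMBLE = 1 << 10      # bit 11: insert, rotate or delete pages
PRINT_HQ = 1 << 11      # bit 12: high-quality printing

# Bits 1-2 must be 0; every other bit set means "everything allowed".
ALL_ALLOWED = 0xFFFFFFFC


def permissions_value(deny: int) -> int:
    """Return the /P value with the `deny` bits cleared (hypothetical helper)."""
    p = ALL_ALLOWED & ~deny & 0xFFFFFFFF
    # /P is written into the file as a signed 32-bit integer.
    return p - (1 << 32) if p & 0x80000000 else p


def allows(p: int, flag: int) -> bool:
    """What a well-behaved viewer checks before enabling a menu item."""
    return bool(p & flag)


locked = permissions_value(PRINT | COPY)
print(locked)                  # -24
print(allows(locked, COPY))    # False: Copy is greyed out in compliant viewers
print(allows(locked, MODIFY))  # True: we did not deny this one
```

Note that nothing in the sketch stops extraction by itself: the bits are only a request, and enforcement is entirely up to the reading software.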
Anyway, I would discourage this, as such a PDF is a pain in the ass to use. Personally, I would recommend looking for other methods rather than actively making your document's readers angry.
PS: Harvesting emails from PDF is virtually unheard of.
Other possible solutions:
- Convert the text to vector outlines (some open-source tools can do this), so the PDF file stays much smaller than it would be with images inside.
- Modify the PDF to corrupt the font's internal map from glyph indexes to Unicode characters, so copied text comes out as rubbish (the PDF reader can no longer map the glyphs back to their character values).
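The second idea works because a PDF page draws glyph IDs, and copy/paste relies on a separate glyph-to-Unicode table (the font's ToUnicode map) to turn those IDs back into characters. The Python sketch below is a deliberately simplified model of that mechanism, not real PDF parsing or writing; every function name in it is hypothetical. It shows that the same glyph sequence extracts correctly with an honest map and as garbage with a scrambled one, while the glyph shapes (and therefore what you see on screen or paper) are unchanged. That is also why OCR or glyph-shape matching can still recover the text.

```python
# Simplified model (an assumption, not real PDF internals): the content
# stream stores glyph IDs; the ToUnicode map is only used for copy/extract.

def build_font(text: str) -> dict[str, int]:
    """Give each distinct character an arbitrary glyph ID (char -> GID)."""
    return {ch: gid for gid, ch in enumerate(sorted(set(text)), start=1)}


def encode(text: str, font: dict[str, int]) -> list[int]:
    """What the page actually draws: a sequence of glyph IDs."""
    return [font[ch] for ch in text]


def honest_to_unicode(font: dict[str, int]) -> dict[int, str]:
    """Correct GID -> character map, as a normal PDF writer would emit."""
    return {gid: ch for ch, gid in font.items()}


def scrambled_to_unicode(font: dict[str, int]) -> dict[int, str]:
    """Deliberately wrong map: each GID points at the next character."""
    ordered = sorted(font.items(), key=lambda kv: kv[1])
    gids = [gid for _, gid in ordered]
    chars = [ch for ch, _ in ordered]
    rotated = chars[1:] + chars[:1]  # with 2+ distinct chars, every entry is wrong
    return dict(zip(gids, rotated))


def extract(glyphs: list[int], to_unicode: dict[int, str]) -> str:
    """What copy/paste or a text extractor recovers from the glyph IDs."""
    return "".join(to_unicode[gid] for gid in glyphs)


address = "jane@example.com"
font = build_font(address)
glyphs = encode(address, font)  # identical in both cases: the page looks the same
print(extract(glyphs, honest_to_unicode(font)))     # jane@example.com
print(extract(glyphs, scrambled_to_unicode(font)))  # rubbish instead of the address
```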
Disclaimer: I work for ByteScout, the vendor of PDF Extractor SDK, a tool that can restore the text from PDF files damaged in all of these ways. So if someone really wants to get the text out of a PDF, it can be done anyway (with more or fewer errors).