Blur user details in a screenshot of an email message

I would like to come up with a way to automatically blur user details in a screenshot of an email message that contains details such as a username and password in plain text:

Image taken from plaintextoffenders.com, which I run.

The goal is to make it easier to submit screenshots of such emails by automatically blurring (or at least trying to blur) the username and password.

Would it be sufficient (for this particular case) to:

  1. Run the image through OCR, looking for the words "Username" and "Password"
  2. Select the text to the right of each OCR match
  3. Blur the selection

This is a naive approach, but would it be sufficient for this case? I realize the email format might differ; I'll deal with that when the time comes.

Are there any particular algorithms or implementations I should know about when approaching this problem?

Thanks!


You will face a couple of issues that you need to think about:

  • Slang for the word Password
  • The translation for Password in all languages
  • Different cases in all languages
  • Is there an Environment.NewLine after Password? A colon? A comma?
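To make the case and punctuation issues concrete, here is a minimal Python sketch of label matching on a single OCR token. The label set and the helper names `normalize` and `is_label` are my own assumptions for illustration, and the translations listed are only a small sample, so treat this as a starting point rather than a complete solution.

```python
import re
import unicodedata


def normalize(token: str) -> str:
    """Case-fold a token and drop trailing separators such as ':' or ','."""
    token = unicodedata.normalize("NFKC", token).casefold().strip()
    # Remove any run of whitespace, colons, commas, semicolons,
    # equals signs or hyphens at the end of the token.
    return re.sub(r"[\s:,;=\-]+$", "", token)


# Hypothetical sample of labels; a real list would need many more
# translations and slang variants.
LABELS = {normalize(s) for s in (
    "username", "login", "password", "passwort", "contraseña", "wachtwoord",
)}


def is_label(token: str) -> bool:
    """True if the token, once normalized, is one of the known labels."""
    return normalize(token) in LABELS
```

With this, `is_label("Password:")` and `is_label("PASSWORD ,")` both match, while `is_label("passwords")` does not. Multi-word labels (for example a label the OCR splits into several tokens) would need extra handling that is not shown here.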

What I would do is run an algorithm that finds a specific piece of text (in your case, Password) for just one translation and one case, and then blur out the next word. You will also have to account for different fonts, monospaced versus proportional text, and so on.
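As a rough sketch of the "find the label, then blur the next word" idea, the Python below assumes the OCR engine returns one bounding box per word (Tesseract, for instance, can report word-level boxes). The `Box` type, the `same_line` heuristic and the `region_after_label` function are hypothetical names I made up for this example; the half-a-line-height tolerance is a guess that will need tuning.

```python
from typing import List, NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """One OCR word with its bounding box in pixels (assumed input format)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def same_line(a: Box, b: Box) -> bool:
    """Treat two boxes as being on the same line if their vertical
    centres differ by at most half the height of the first box."""
    centre_a = a.top + a.height / 2
    centre_b = b.top + b.height / 2
    return abs(centre_a - centre_b) <= a.height / 2


def region_after_label(
    words: List[Box], label: str, pad: int = 2
) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) around the first word to the
    right of `label` on the same line, or None if there is no such word."""
    for w in words:
        if w.text.rstrip(":").casefold() != label.casefold():
            continue
        candidates = [
            c for c in words
            if c.left >= w.left + w.width and same_line(w, c)
        ]
        if not candidates:
            continue  # label found, but nothing to its right on this line
        nxt = min(candidates, key=lambda c: c.left)
        return (
            max(0, nxt.left - pad),
            max(0, nxt.top - pad),
            nxt.left + nxt.width + pad,
            nxt.top + nxt.height + pad,
        )
    return None
```

For example, with a word list containing `Box("Password:", 10, 30, 80, 12)` and `Box("hunter2", 100, 31, 56, 12)`, calling `region_after_label(words, "Password")` gives the rectangle to blur. A value that sits on the line below the label would be missed by this version.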

BUT I wouldn't just save the image and leave it at that. Present the "fixed" image to the user who is uploading it, and let the user move and resize the blur.

It's like the facial recognition in Google's Picasa: it works great, but not all the time, and when it doesn't, you are always presented with an alternative.

Have you looked at OCRTools? They have a free trial of their components, and it seems promising.


In addition to the issues Filip mentioned, there may be an issue of accuracy. The open source OCR tools that I have tried, namely Ocrad.js and Tesseract via node.js, have poor accuracy on screenshots, even though OCR on screenshots should be easier than on scanned documents. I think the reason is mismatched training and test data: they are trained on PDF documents, not screenshots. So you may have to start by adding screenshots to the training set and retraining.

The online HTML5-based image redaction tool www.facepixelizer.com has face detection and automatically pixelates faces, but it does not have OCR to blur out passwords or email addresses.

However, it's very quick work to redact a screenshot with facepixelizer. It has a specialized blurring tool that adjusts the blur to match the font size. [disclaimer: I created facepixelizer for my own needs of tutorial writing and blogging.]
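To illustrate what "adjusting the blur to match the font size" could look like, here is a small Python sketch that pixelates a rectangle with a block size derived from the rectangle's height. This is not facepixelizer's actual code; the `pixelate` function, the `blocks_per_line_height` parameter and the plain list-of-rows grayscale image are all assumptions chosen to keep the example dependency-free.

```python
def pixelate(pixels, box, blocks_per_line_height=2):
    """Return a copy of a grayscale image (list of rows of ints) with
    `box` = (left, top, right, bottom) pixelated.

    The block size is scaled to the box height, which stands in for
    the font size of the text being hidden.
    """
    left, top, right, bottom = box
    block = max(2, (bottom - top) // blocks_per_line_height)
    out = [row[:] for row in pixels]  # leave the input untouched
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            # Replace every pixel in the block with the block's mean.
            avg = sum(pixels[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out
```

Taller text gives larger blocks, so big headings and small body text are obscured to a similar degree. The same idea carries over to a real image library by cropping the box, downscaling it, and scaling it back up.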
