Blur user details in a screenshot of an email message

2023-03-18 05:07 Q&A author:

I would like to come up with a way to automatically blur user details in a screenshot of an email message that contains details such as a username and password in plain text:

Image taken from plaintextoffenders.com, which I run.

The goal is to make it easier to submit screenshots of such emails by automatically (trying to) blur the username and password.

Would it be sufficient (for this particular case) to:

Run the image through OCR, looking for the words "Username" and "Password"
Select the text to the right of the OCR match
Blur the selection
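The three steps above can be sketched on top of word-level OCR output. This is a minimal sketch, not a tested implementation: it assumes the OCR engine returns one bounding box per word (roughly what Tesseract's word-level data provides), leaves out the OCR call itself, and uses a hypothetical `Word` type and `regions_to_blur` helper. "Same line" is approximated by comparing the vertical centres of the boxes.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word box (hypothetical format: text plus pixel bounding box)."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Step 1: labels to look for (lower-case, without the trailing colon).
LABELS = {"username", "password"}


def same_line(a: Word, b: Word) -> bool:
    """Approximate 'same text line' by comparing vertical centres."""
    centre_a = a.top + a.height / 2
    centre_b = b.top + b.height / 2
    return abs(centre_a - centre_b) <= max(a.height, b.height) / 2


def regions_to_blur(words: list[Word]) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) rectangles covering the text right of each label."""
    regions = []
    for label in words:
        if label.text.rstrip(":").lower() not in LABELS:
            continue
        label_right = label.left + label.width
        # Step 2: every word on the same line that starts right of the label.
        value = [w for w in words
                 if w is not label and same_line(label, w) and w.left >= label_right]
        if value:
            # Input for step 3: one rectangle spanning all value words.
            regions.append((
                min(w.left for w in value),
                min(w.top for w in value),
                max(w.left + w.width for w in value),
                max(w.top + w.height for w in value),
            ))
    return regions


words = [
    Word("Username:", 10, 10, 80, 12), Word("bob42", 100, 10, 40, 12),
    Word("Password:", 10, 30, 80, 12), Word("hunter2", 100, 31, 56, 12),
]
print(regions_to_blur(words))  # [(100, 10, 140, 22), (100, 31, 156, 43)]
```

Each returned rectangle would then be handed to whatever blurring routine the application uses.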

This is a naive approach, but would it be sufficient for this case? I realize the email format might differ; I'll deal with that when the time comes.

Are there any particular algorithms or implementations I should know about when approaching this problem?

Thanks!

You will face a couple of issues; think about the following:

Slang for the word Password
Translations of Password in other languages
Different letter cases in all languages
Is there an Environment.NewLine after Password? A colon? A comma?
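Some of these issues (letter case, a trailing colon or comma, a value that OCR fused onto the label, a few slang forms and translations) can be absorbed into the label-matching step. A minimal sketch, assuming single-word labels; the label set is illustrative and far from complete, multi-word labels such as the French "mot de passe" would need matching over word sequences, and `split_label` is a hypothetical helper.

```python
import unicodedata
from typing import Optional, Tuple

# Illustrative and deliberately incomplete: English variants plus a few translations
# (German, Dutch, Spanish, Portuguese, Russian), stored in case-folded form.
PASSWORD_LABELS = {"password", "passwd", "pass", "pwd", "pw",
                   "passwort", "wachtwoord", "contraseña", "senha", "пароль"}

# Characters that may separate a label from its value (ASCII and full-width colon).
SEPARATORS = ":：,="


def split_label(token: str) -> Optional[Tuple[str, str]]:
    """If token is a password label, return (label, inline value); otherwise None.

    Handles 'Password', 'PASSWORD:', 'password,' and fused tokens like 'Password:hunter2'.
    """
    # NFC so that a decomposed 'ñ' (n + combining tilde) matches the stored form.
    token = unicodedata.normalize("NFC", token)
    for i, ch in enumerate(token):
        if ch in SEPARATORS:
            label, value = token[:i], token[i + 1:]
            break
    else:  # no separator: the whole token is the candidate label
        label, value = token, ""
    # casefold() is a more aggressive lower() intended for caseless matching.
    if label.casefold() in PASSWORD_LABELS:
        return label, value
    return None


for t in ["PASSWORD:", "Password:hunter2", "Passwort", "Hello"]:
    print(t, "->", split_label(t))
```

When the inline value is empty and nothing follows on the same line, the value may sit on the next line (the Environment.NewLine case), so a caller could fall back to the first word below the label.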

What I would do is run some algorithm to find specific text, in your case "Password", for just one translation and case, and then blur out the next word (you have to worry about different fonts, monospace, etc. here as well).
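Once the next word's bounding box is known, blurring it is a plain pixel operation. A minimal sketch of a box blur confined to one rectangle, assuming the screenshot has already been decoded into a grayscale grid of ints from 0 to 255 (an imaging library would normally handle decoding and offer its own blur filters); `box_blur_region` and the `(left, top, right, bottom)` region format are hypothetical.

```python
def box_blur_region(pixels: list[list[int]],
                    region: tuple[int, int, int, int],
                    radius: int = 3) -> list[list[int]]:
    """Return a copy of a grayscale image with a box blur applied inside region.

    region is (left, top, right, bottom) with right/bottom exclusive; it is
    clipped to the image bounds.
    """
    height, width = len(pixels), len(pixels[0])
    left, top, right, bottom = region
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    out = [row[:] for row in pixels]  # leave the input image untouched
    for y in range(top, bottom):
        for x in range(left, right):
            total = count = 0
            # Average the (2*radius+1)^2 neighbourhood, restricted to the region.
            for ny in range(max(top, y - radius), min(bottom, y + radius + 1)):
                for nx in range(max(left, x - radius), min(right, x + radius + 1)):
                    total += pixels[ny][nx]
                    count += 1
            out[y][x] = total // count
    return out


image = [[0, 0, 0, 0],
         [0, 100, 200, 0],
         [0, 0, 100, 0],
         [0, 0, 0, 0]]
blurred = box_blur_region(image, (1, 1, 3, 3), radius=1)
print(blurred)  # [[0, 0, 0, 0], [0, 100, 100, 0], [0, 100, 100, 0], [0, 0, 0, 0]]
```

Averaging only over pixels inside the rectangle keeps the surrounding, non-sensitive pixels untouched. A larger `radius` gives a stronger blur at a higher cost, since each output pixel averages up to (2 * radius + 1) squared neighbours.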

BUT I wouldn't just "save" the image and leave it at that; present the "fixed" image to the user uploading it and let them "move" and resize the blur.
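The review step suggested here mostly needs a rectangle the user can drag and resize, starting from whatever the detector found. A minimal sketch of that model, assuming the rectangle starts out fully inside the image; `Redaction`, `moved` and `resized` are hypothetical names, and the UI itself is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Redaction:
    """A user-adjustable blur rectangle (hypothetical model for the upload UI)."""
    left: int
    top: int
    width: int
    height: int

    def moved(self, dx: int, dy: int, img_w: int, img_h: int) -> "Redaction":
        """Shift by (dx, dy), keeping the rectangle fully inside the image."""
        left = min(max(0, self.left + dx), img_w - self.width)
        top = min(max(0, self.top + dy), img_h - self.height)
        return Redaction(left, top, self.width, self.height)

    def resized(self, new_w: int, new_h: int, img_w: int, img_h: int) -> "Redaction":
        """Resize from the top-left corner, with a 1-pixel minimum, clamped to the image."""
        width = min(max(1, new_w), img_w - self.left)
        height = min(max(1, new_h), img_h - self.top)
        return Redaction(self.left, self.top, width, height)


detected = Redaction(100, 10, 40, 12)  # e.g. the automatically detected password box
adjusted = detected.moved(-5, 0, 200, 100).resized(60, 14, 200, 100)
print(adjusted)
```

Only after the user confirms would the final rectangles be blurred and the image saved.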

It's like the facial recognition in Google's Picasa: it works great, but not all the time, and when it doesn't, you are always presented with an alternative.

Have you looked at OCRTools? They have a free trial of their components, and it seems promising.

In addition to the issues Filip mentioned, accuracy may be a problem. The open-source OCR tools I have tried, such as Ocrad.js and Tesseract via Node.js, have poor accuracy on screenshots, even though OCR on screenshots should be easier than on scanned documents. I think these tools struggle because of mismatched training and test data: they are trained on PDF documents, not screenshots. So you may have to start by adding screenshots to the training set and retraining.

The online HTML5-based image redaction tool www.facepixelizer.com has face detection and automatically pixelates faces, but it does not have OCR to blur out passwords or email addresses.

However, redacting a screenshot with facepixelizer is very quick. It has a specialized blurring tool that adjusts the blur to match the font size. [Disclaimer: I created facepixelizer for my own tutorial writing and blogging.]
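The answer doesn't say how facepixelizer sizes its effect, so this is not its implementation; it is only a sketch of the general idea of scaling redaction to the font size, here using pixelation (the effect the tool applies to faces). It assumes a grayscale grid of ints from 0 to 255 and a text height taken from, for example, the OCR bounding box; `pixelate_region` and `blocks_per_line` are hypothetical.

```python
def pixelate_region(pixels: list[list[int]],
                    region: tuple[int, int, int, int],
                    text_height: int,
                    blocks_per_line: int = 2) -> list[list[int]]:
    """Return a copy of a grayscale image with region pixelated.

    The block size is derived from the text height: blocks_per_line (a hypothetical
    tuning knob) is roughly how many rows of blocks cover one line of text.
    """
    block = max(1, text_height // blocks_per_line)
    height, width = len(pixels), len(pixels[0])
    left, top, right, bottom = region
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    out = [row[:] for row in pixels]  # leave the input image untouched
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            # Replace every pixel in the block with the block's mean value.
            mean = sum(pixels[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


image = [[0, 100, 0, 0],
         [100, 200, 0, 0],
         [0, 0, 40, 40],
         [0, 0, 40, 40]]
print(pixelate_region(image, (0, 0, 4, 4), text_height=4))
# [[100, 100, 0, 0], [100, 100, 0, 0], [0, 0, 40, 40], [0, 0, 40, 40]]
```

Tying the block size to the text height means small and large fonts are obscured about equally, whereas a fixed block size might leave large text readable.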

Tags: .net image-manipulation image-processing ocr