How to recognize this simple CAPTCHA?

(CAPTCHA image: http://internationalpropertiesregistry.com/Server/showFile.php?file=%2FUpload%2F02468.gif358455ebc982cb93b98a258fc4d6ee60.gif)

Is there a simple solution in MATLAB?


Easy answer: NOPE


For a very simple approach: read the image as grayscale, threshold it, clean it up, and run it through an OCR program.

%# read the image
img = imread('http://internationalpropertiesregistry.com/Server/showFile.php?file=%2FUpload%2F02468.gif358455ebc982cb93b98a258fc4d6ee60.gif');

%# threshold
bw = img < 150; 

%# clean up
bw = bwareaopen(bw,3,4);

%# look at it - the number 8 is not so pretty, the rest looks reasonable
figure,imshow(bw)

Then, figure out whether there is an OCR program that can help, such as this one

For an even simpler approach:

%# read the image
[img,map] = imread('http://internationalpropertiesregistry.com/Server/showFile.php?file=%2FUpload%2F02468.gif358455ebc982cb93b98a258fc4d6ee60.gif');

%# display
figure,imshow(img,map)

%# and type out the numbers. It's going to be SO much less work than writing complicated code


For a simple one like that, you can probably just run a median filter and then OCR.

A median filter looks, for every pixel in the image, at the area around it (usually a 3x3 or 5x5 pixel window), determines the median pixel value in that window, and sets the pixel to that median. Inside a block of uniform colour nothing changes: the whole window has the same colour as the pixel under consideration, so the median is the same as the current value (or at least almost the same, allowing for slight colour variations). Noise pixels, on the other hand, i.e. a single pixel surrounded by a differently coloured area, simply disappear, since the median of the window is the colour of the pixels around the noise pixel.
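To make that concrete, here is a minimal sketch of a 3x3 median filter written with plain loops, so it runs without any toolbox. The 7x7 test image and the variable names are made up for illustration. If the Image Processing Toolbox is available, `medfilt2` does the same job in one call (with its own border handling).

```matlab
%# synthetic 7x7 grayscale image: flat background of 200 with one dark noise pixel
img = 200 * ones(7, 7);
img(4, 4) = 0;

%# 3x3 median filter written out by hand (border pixels are left unchanged)
filtered = img;
[rows, cols] = size(img);
for r = 2:rows-1
    for c = 2:cols-1
        window = img(r-1:r+1, c-1:c+1);      %# 3x3 neighbourhood around (r,c)
        filtered(r, c) = median(window(:));  %# replace pixel by the window median
    end
end

%# the isolated dark pixel now has the background value
disp(filtered(4, 4))
```

Each window around the noise pixel holds eight background values and one outlier, so the median is the background value and the outlier vanishes. Real CAPTCHA noise is rarely this clean, so the window size usually needs some experimenting.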

As for the OCR (optical character recognition), I'd just use an existing program or library. It would certainly be possible to write an OCR algorithm in MATLAB, but that would be a much bigger exercise than a simple algorithm you can write in an hour; you'd first need to read up on OCR techniques and algorithms.


  1. Clean up the image
  2. Separate each character into distinct images
  3. Compare each character against a set of reference characters
  4. The best match for each character is most likely the original character
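The four steps above could be sketched roughly as follows, using base MATLAB only. Everything here is an assumption for illustration: the 5x3 reference glyphs are invented, the "cleaned" image is synthetic rather than the CAPTCHA from the question, and each segmented character is assumed to have exactly the same size as the references (a real image would need cropping and resizing first).

```matlab
%# hypothetical 5x3 reference glyphs for the characters '0', '1' and '7'
labels = '017';
refs = cat(3, ...
    [1 1 1; 1 0 1; 1 0 1; 1 0 1; 1 1 1], ...
    [0 1 0; 1 1 0; 0 1 0; 0 1 0; 1 1 1], ...
    [1 1 1; 0 0 1; 0 1 0; 0 1 0; 0 1 0]);

%# step 1: an already cleaned-up binary image spelling "710" (synthetic)
gap = zeros(5, 1);
bw = [gap, refs(:,:,3), gap, refs(:,:,2), gap, gap, refs(:,:,1), gap];
bw(5, 2) = 1;   %# one leftover noise pixel inside the '7'

%# step 2: split into characters at the empty columns
hasInk = any(bw, 1);
edges  = diff([0, hasInk, 0]);
starts = find(edges == 1);        %# first column of each character
stops  = find(edges == -1) - 1;   %# last column of each character

%# steps 3 and 4: compare each character with every reference, keep the best
result = blanks(numel(starts));
for k = 1:numel(starts)
    glyph  = bw(:, starts(k):stops(k));
    scores = zeros(1, numel(labels));
    for t = 1:numel(labels)
        scores(t) = sum(sum(glyph == refs(:, :, t)));   %# number of agreeing pixels
    end
    [~, best] = max(scores);
    result(k) = labels(best);
end

disp(result)
```

Counting agreeing pixels is the crudest possible similarity measure. It tolerates the single noise pixel here, but touching or rotated characters would need a more careful segmentation and comparison.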


You may consult this post. I succeeded in cracking a simpler CAPTCHA with the illustrated approach.
