Localization of numbers within a complex scene image
First of all, I very much appreciate the help provided by the experts here at SO. The questions posed by many and answered by the experts have been of immense benefit to me. They helped me with a very crucial problem a few months back, when I was a student doing my thesis.
Right now I am working on a problem to detect (and then recognize) numbers in a complex scene image. You can check out these images here: http://imageshack.us/g/823/dsc1757w.jpg/. These are pictures of marathon runners with their numbers on the front of their shirts. I have to detect all the numbers that appear in the image and then recognize them. The recognition won't be difficult, as these appear to be OCR-friendly characters. The crucial thing is how to detect these numbers.
I had an idea to first color-filter the image for black. But when I tried this in Matlab, the results were not encouraging, because many regions in the image meet this criterion (the clothes, some shadows behind the runners, the shadows in the foliage, etc.). Either I need to separate the characters from these other regions or I need some other good technique. There are papers available and I have gone through some of them, like the SWT, DWT, etc., but I have a feeling they won't be of much help. I was thinking some kind of training algorithm might be useful. There is another reason for this: in the future there might be other photos with possibly different fonts, etc., so I think a dedicated algorithmic approach might fail. Can anyone point me in the right direction?
I am not a novice in image processing, but not an expert either. So, any and all help/suggestion in this regard will be greatly appreciated :) .
Thanks, MD
You know that your problem is not a simple one, but it seems very interesting! Although I don't have a complete solution for you, I will share my thoughts in the hope that you can make something out of them.
Let's take 2 of your photos as examples:
Photo-A: http://imageshack.us/photo/my-images/59/dsc0275a.jpg/ It shows a single person with a relatively "big" green label with numbers on his shirt.
Photo-B: http://imageshack.us/photo/my-images/546/dsc0243u.jpg/ It shows a lot of people with smaller red labels on their shirts. (The labels' height in pixels is about 1/5 of that of the label in Photo-A.)
Considering the above photos, I will try to write some random thoughts which may help...
(a) Define your scale: There is no point in applying a search algorithm that looks for labels from 2x2 pixels up to the full image resolution. You must define the minimum/maximum limits for the width & height of a label. Those limits may depend on many different factors:
(1) One factor is the real size of labels (defined by the distance of people from camera) which can be defined as a percentage of the image width & height.
(2) Another factor is the actual reading accuracy of the OCR you are going to use. If the numbers' image height is smaller than Y1 pixels or bigger than Y2 pixels, the OCR will not be able to read them (it sounds strange but it's true: big images may seem very clear to the human eye, but an OCR may have problems reading them).
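As a rough illustration of (a), the two factors can be combined into one pair of height limits. This is only a sketch: the function name label_height_limits, the fractions and the OCR pixel limits (standing in for Y1 and Y2) are made-up values for illustration, not measurements from your photos or from any particular OCR engine.

```python
def label_height_limits(image_height, min_frac, max_frac, ocr_min_px, ocr_max_px):
    """Combine the two factors from (a) into one (min, max) label height in pixels.

    min_frac / max_frac: assumed label height as a fraction of the image height.
    ocr_min_px / ocr_max_px: assumed heights (Y1, Y2) outside which the OCR fails.
    Returns None when the two ranges do not overlap.
    """
    lo = max(round(image_height * min_frac), ocr_min_px)
    hi = min(round(image_height * max_frac), ocr_max_px)
    if lo > hi:
        return None
    return lo, hi


# Hypothetical numbers: a 3000-pixel-high photo, labels between 2% and 25%
# of the image height, and an OCR that copes with 20 to 400 pixel digits.
print(label_height_limits(3000, 0.02, 0.25, 20, 400))
```

If the function returns None, the photo would have to be rescaled before the search makes sense.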
(b) Find the area(s) of interest: In your case, this is equivalent to "Find the approximate position of labels". We can define an athlete label roughly as "An (almost) rectangular area, which may be a bit inclined relative to photo borders, and contains: A central area of black + color C1 [e.g. red or green] + a white (=neutral) area on top and/or bottom of it".
A possible algorithm to find the approximate position of a label is:
(1) Traverse the whole image left-to-right, top-to-bottom, examining a square area of MinHeight/2 x MinHeight/2 pixels at each position
(2) Create the histogram of the square area (or posterize it, e.g. to 8 levels) and check whether it contains only Black + another color C1, in proportions of e.g. Black: 40% +/- 10%, Color: 60% +/- 10%
(3) If (2) is true, try to expand the area to the right and to the bottom while the percentages stay within the specified limits
(4) When the square is fully expanded, check whether the expanded area size is inside the min/max limits of width/height you specified in (a). If not, go to step 1
(5) Process the expanded area to read the numbers - see (c) below
(6) Go to step 1
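The steps above can be sketched roughly as follows, in plain Python without any imaging library. Everything here is an assumption made for illustration: the image is a list of rows of (r, g, b) tuples, the names classify, window_fractions, looks_like_label and find_labels are hypothetical, and the tolerance of 60 is an arbitrary choice. A real implementation would more likely work on posterized or HSV data and would be much faster with an array library.

```python
def classify(pixel, c1, tol=60):
    """Return 'black', 'c1' or 'other' for an (r, g, b) pixel (tol is an assumed tolerance)."""
    r, g, b = pixel
    if max(r, g, b) < tol:
        return "black"
    if all(abs(p - q) <= tol for p, q in zip(pixel, c1)):
        return "c1"
    return "other"


def window_fractions(img, x, y, w, h, c1):
    """Fractions of black and C1 pixels in the w x h window whose top-left corner is (x, y)."""
    counts = {"black": 0, "c1": 0, "other": 0}
    for row in img[y:y + h]:
        for pixel in row[x:x + w]:
            counts[classify(pixel, c1)] += 1
    total = w * h
    return counts["black"] / total, counts["c1"] / total


def looks_like_label(img, x, y, w, h, c1):
    """Step (2): Black 40% +/- 10% and Color 60% +/- 10%."""
    black, colour = window_fractions(img, x, y, w, h, c1)
    return 0.30 <= black <= 0.50 and 0.50 <= colour <= 0.70


def find_labels(img, c1, min_h, max_h, min_w, max_w):
    """Steps (1) to (4): scan, test, expand right and down, check the size limits."""
    height, width = len(img), len(img[0])
    win = max(1, min_h // 2)  # square window of MinHeight/2
    found = []
    for y in range(0, height - win + 1, win):
        for x in range(0, width - win + 1, win):
            # Skip windows that start inside a label we already found.
            if any(fx <= x < fx + fw and fy <= y < fy + fh
                   for fx, fy, fw, fh in found):
                continue
            if not looks_like_label(img, x, y, win, win, c1):
                continue
            w = h = win
            # Step (3): grow to the right, then to the bottom.
            while x + w < width and w < max_w and looks_like_label(img, x, y, w + 1, h, c1):
                w += 1
            while y + h < height and h < max_h and looks_like_label(img, x, y, w, h + 1, c1):
                h += 1
            # Step (4): keep the area only if its size is plausible.
            if min_h <= h <= max_h and min_w <= w <= max_w:
                found.append((x, y, w, h))
    return found
```

Each returned tuple is (x, y, width, height). Because of the +/- 10% tolerance the box can overshoot the true label by a few pixels, which is fine as long as only the approximate position is needed before step (5).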
(c) Process the area(s) of interest: Try the following steps:
(1) Convert each image-area to Grayscale by applying a color filter that burns Color C1 to white.
(2) Equalize the Grayscale to make the black letters stand out
(3) If an inclination has been detected, perform a reverse rotation on the image-area to make the letters as horizontal as possible.
(4) Feed the area to an OCR trained only for numbers
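Steps (1) and (2) of (c) could look something like the sketch below, again in plain Python on a list of rows of (r, g, b) tuples. The function names, the tolerance of 60 and the usual luma weights (0.299, 0.587, 0.114) are my assumptions, and the equalization is the textbook cumulative-histogram version. The rotation and OCR steps are left out because they depend on the tools you pick.

```python
def to_gray_burning_colour(region, c1, tol=60):
    """Step (1): grayscale conversion where pixels close to colour c1 become white."""
    gray = []
    for row in region:
        out = []
        for r, g, b in row:
            if all(abs(p - q) <= tol for p, q in zip((r, g, b), c1)):
                out.append(255)  # burn the label colour to white
            else:
                out.append(round(0.299 * r + 0.587 * g + 0.114 * b))
        gray.append(out)
    return gray


def equalize(gray):
    """Step (2): histogram equalization of an 8-bit grayscale image."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in gray]  # flat image, nothing to stretch
    # Map each gray level through the normalized cumulative histogram.
    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in gray]


# Tiny made-up region: two reddish label pixels, one dark gray, one black.
region = [[(200, 30, 30), (40, 40, 40)],
          [(0, 0, 0), (190, 40, 20)]]
gray = to_gray_burning_colour(region, c1=(200, 30, 30))
print(gray)
print(equalize(gray))
```

After this the digits should be dark on an almost white background, which is usually what an OCR engine expects.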
Good luck with your project!
You could try to contact the author of this software:
Yaroslav is an active member of StackOverflow.