Connected component analysis: how to handle split components?
I am developing an image recognition application. To recognize and classify the symbols in an image, the plan is to run the k-Nearest-Neighbours algorithm on each connected component (i.e. "group of connected pixels"), comparing it against a set of already classified symbols.
But how do I handle symbols that are split into several components? If the symbols were characters, an example would be "i", whose dot and stem are not connected.
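To make the problem concrete, here is a minimal sketch of connected component labeling (8-connectivity, breadth-first flood fill) on a tiny binary image stored as a list of strings, where `#` is foreground. The image format and the name `label_components` are assumptions for illustration; a real application would more likely use a library routine such as OpenCV's `connectedComponents` or `scipy.ndimage.label`. Labeling an "i" yields two components, which is exactly the split described above.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def label_components(image: List[str]) -> List[Set[Pixel]]:
    """Return the 8-connected groups of '#' pixels, in raster-scan order
    of each group's first pixel. Assumes all rows have the same length."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen: Set[Pixel] = set()
    components: List[Set[Pixel]] = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != "#" or (r, c) in seen:
                continue
            # Flood-fill a new component starting from this pixel.
            component: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                # Visit all 8 neighbours (the pixel itself is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == "#" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(component)
    return components


# A small "i": the dot is separated from the stem by a background row.
I_GLYPH = [
    ".#.",
    "...",
    "##.",
    ".#.",
    "###",
]

parts = label_components(I_GLYPH)
print(len(parts))  # 2: the dot and the stem
```

Running kNN on each of these components separately would try to classify the dot and the stem as two unrelated symbols.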
Some heuristics:
- Compare not only the connected components but also their bounding rectangles, including the background pixels inside them. These rectangles can be extended upward to pick up detached parts (for cases such as "i", "ä" and so on).
- Define a metric that takes the background pixels into account. For example, add to the distance when corresponding pixels differ and subtract when they are the same. Then, when you compare an extracted letter that resembles "i" against "i" and "l", the distance to "i" will be shorter, because the white pixels between the dot and the stem increase the distance to "l".
- It happens that "rn" is recognized as "m". To avoid this, the metric must be such that "r" is closer to "rn" than to "m". For my text, the metric described in point 2 was enough :)
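The heuristics above can be sketched as follows. This is a minimal illustration under several assumptions that are not from the answer: components are given as sets of `(row, col)` pixels; components whose column ranges overlap are merged (a crude stand-in for "extending the rectangle upward", which only makes sense within a single line of text); and all glyphs have already been normalized to the same size so they can be compared pixel by pixel. The names `merge_vertical`, `to_grid`, `distance` and `classify` are hypothetical. The distance follows point 2 (+1 for each differing pixel, -1 for each matching one, background included), and classification is plain k-nearest-neighbours over labelled templates.

```python
from collections import Counter
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def bbox(pixels: Set[Pixel]) -> Tuple[int, int, int, int]:
    """Bounding rectangle as (top, left, bottom, right), inclusive."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def merge_vertical(components: List[Set[Pixel]]) -> List[Set[Pixel]]:
    """Hypothetical rule: merge components whose column ranges overlap, so a
    dot above a stem ("i") or dots above a letter ("ä") form one glyph.
    Assumes a single line of text."""
    groups = [set(c) for c in components]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                _, left_i, _, right_i = bbox(groups[i])
                _, left_j, _, right_j = bbox(groups[j])
                if left_i <= right_j and left_j <= right_i:
                    absorbed = groups.pop(j)  # j > i, so index i stays valid
                    groups[i] |= absorbed
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return groups


def to_grid(pixels: Set[Pixel]) -> List[str]:
    """Crop to the bounding rectangle, keeping background pixels as '.'."""
    top, left, bottom, right = bbox(pixels)
    return [
        "".join("#" if (r, c) in pixels else "." for c in range(left, right + 1))
        for r in range(top, bottom + 1)
    ]


def distance(a: List[str], b: List[str]) -> int:
    """+1 for every differing pixel, -1 for every matching one (background
    included). Assumes both glyphs were normalized to the same size."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("glyphs must have the same size")
    score = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            score += 1 if pa != pb else -1
    return score


def classify(glyph: List[str], templates: List[Tuple[str, List[str]]], k: int = 1) -> str:
    """k-nearest-neighbours: majority label among the k closest templates."""
    nearest = sorted(templates, key=lambda t: distance(glyph, t[1]))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]


# Hypothetical 3x5 templates.
TEMPLATE_I = [".#.", "...", "##.", ".#.", "###"]
TEMPLATE_L = ["##.", ".#.", ".#.", ".#.", "###"]
TEMPLATES = [("i", TEMPLATE_I), ("l", TEMPLATE_L)]

# An "i" that labeling split into a dot and a stem, plus an unrelated stroke further right.
dot = {(0, 1)}
stem = {(2, 0), (2, 1), (3, 1), (4, 0), (4, 1), (4, 2)}
other = {(r, 10) for r in range(5)}

groups = merge_vertical([dot, stem, other])
glyph = to_grid(groups[0])
print(len(groups))  # 2
print(distance(glyph, TEMPLATE_I), distance(glyph, TEMPLATE_L))  # -15 -9
print(classify(glyph, TEMPLATES))  # i
```

The empty row between the dot and the stem is what pushes the "l" template away: it contributes a mismatch against "l" but a match against "i". With equal-size grids this score equals 2 × (number of differing pixels) minus the total pixel count, so it ranks templates exactly like a plain Hamming distance; the point is that background pixels count at all. Whether it also keeps "rn" apart from "m" depends on the templates and on the size normalization, which this sketch does not attempt.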