Classifying type samples from image files
Which approach would you suggest for automatically classifying type found in images? The samples are likely large, with black text on a white background.
The categories are defined here, with some examples of each (Google Books link): http://bit.ly/9Mnu7P. This is an extended version of the VOX-ATypI classification system.
My initial thought was to train the system with lots of single-character samples from each category, but I'm wondering if there's a better way that would eliminate the need to do the comparison one letter at a time.
First, you need to extract features for classification. Typefaces are generally distinguished by the thickness of their lines, the presence of serifs, and the "circularity" of character parts. Possible features therefore include:
- The fraction of black pixels within a fixed area.
- Apply morphological erosion a few times (and/or with different masks) and recompute this fraction after each pass
- Compute the mean compactness of a character: perimeter^2 / area
- After applying erosion, count the number of connected components for a character
- Compute the elongation, the direction (orientation), and other image moments
- etc
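The features listed above can be sketched in plain Python. This is only a minimal illustration, assuming each character has already been cut out and binarized into rows of 0/1 values (1 = ink). All function names are made up for this example, the erosion uses a 3x3 square mask and treats pixels outside the image as background, and components are counted with 8-connectivity. In practice a library such as OpenCV would do the same job much faster.

```python
import math

def ink_fraction(img):
    """Fraction of ink (1) pixels in a binary image given as rows of 0/1."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))

def erode(img):
    """One erosion pass with a 3x3 square mask (outside the image = background)."""
    h, w = len(img), len(img[0])
    def ink(y, x):
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1
    return [[int(all(ink(y + dy, x + dx)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
             for x in range(w)] for y in range(h)]

def count_components(img):
    """Number of 8-connected ink components, found by flood fill."""
    h, w = len(img), len(img[0])
    seen, count = set(), 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or (y, x) in seen:
                continue
            count += 1
            seen.add((y, x))
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count

def compactness(img):
    """perimeter^2 / area, where perimeter counts ink pixel edges facing background."""
    h, w = len(img), len(img[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            area += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1):
                    perimeter += 1
    return perimeter ** 2 / area if area else 0.0

def elongation_and_direction(img):
    """Elongation (ratio of principal-axis spreads) and major-axis angle in radians,
    both derived from second-order central moments."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v == 1]
    if not pts:
        return 0.0, 0.0
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    mu20 = sum((x - mx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - my) ** 2 for _, y in pts) / n
    mu11 = sum((x - mx) * (y - my) for x, y in pts) / n
    spread = math.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam_max = (mu20 + mu02 + spread) / 2
    lam_min = (mu20 + mu02 - spread) / 2
    elongation = math.sqrt(lam_max / lam_min) if lam_min > 1e-12 else float("inf")
    return elongation, 0.5 * math.atan2(2 * mu11, mu20 - mu02)

# Hypothetical "dumbbell" glyph: two blobs joined by a thin bridge.
glyph = [[1, 1, 1, 0, 0, 0, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 0, 0, 0, 1, 1, 1]]
print(count_components(glyph), count_components(erode(glyph)))  # prints: 1 2
```

The dumbbell example shows why the component count after erosion is informative: thin strokes vanish first, so a shape held together by hairlines falls apart into several pieces.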
I see two options here: either compute mean features over all characters, or classify the letters first and then classify the font based on some specific letters (so you train a separate classifier for each letter). It's hard to say which one is better in your case.
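To make the two options concrete, here is a rough sketch of how the per-character feature vectors could be combined. It assumes the features arrive as (letter, feature_vector) pairs; the letter is only needed for the second option and would come from a separate letter-recognition step that is not shown. The function names are hypothetical.

```python
from collections import defaultdict

def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sample_level_features(char_features):
    """Option 1: a single averaged vector for the whole type sample."""
    return mean_vector([features for _, features in char_features])

def per_letter_features(char_features):
    """Option 2: one averaged vector per recognized letter, each of which
    would be passed to the classifier trained for that letter."""
    groups = defaultdict(list)
    for letter, features in char_features:
        groups[letter].append(features)
    return {letter: mean_vector(vectors) for letter, vectors in groups.items()}
```

Option 1 needs no letter recognition at all, which matches the goal in the question of avoiding letter-by-letter comparison. Option 2 costs an extra recognition step but compares like with like.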
As for a specific learning algorithm, Random Forest seems to be a good place to start. There's an implementation in the OpenCV library.
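For intuition about what a Random Forest does with such feature vectors, here is a small dependency-free sketch: each tree is grown on a bootstrap resample, each node looks at a random subset of the features, and the forest predicts by majority vote. It is a teaching toy, not a substitute for the OpenCV implementation mentioned above. The training vectors and the labels "A" and "B" are made-up numbers, not measurements from real typefaces.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, rng, depth=5):
    """Grow one tree; every node considers a random subset of the features."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)                       # leaf: a class label
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))
    best = None
    for f in rng.sample(range(n_features), k):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                    # midpoint threshold
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                             # chosen features are constant
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, depth - 1))

def predict_tree(node, row):
    """Walk from the root to a leaf; inner nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap resample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

# Made-up [ink_fraction, compactness] vectors for two hypothetical categories.
X = [[0.10, 30.0], [0.12, 32.0], [0.11, 28.0], [0.13, 31.0],
     [0.30, 60.0], [0.32, 62.0], [0.31, 58.0], [0.29, 61.0]]
y = ["A"] * 4 + ["B"] * 4
forest = train_forest(X, y)
```

With real data you would hold out part of the samples to estimate accuracy before trusting the classifier.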