
Remove shapes from image with X number of pixels or less

If I have an image with, let's say, squares, is it possible to remove all shapes formed by 10 (non-white) pixels or fewer and keep all shapes that are formed by 11 pixels or more? I want to do it programmatically or from the command line.

Thanks in advance!


Possibly an algorithm called erosion may be useful. It works on boolean images, shrinking every area of "true" by removing one layer of its surface pixels. Apply it a few times and small areas disappear, while bigger ones remain (though shrunken). Restore the survivors to their size with the opposite algorithm, dilation, applied the same number of times (dilation can be computed by applying erosion to the logical complement of the image and complementing the result). First find a way to define a boolean image by testing whether a pixel is inside an "object", however you define it, and then find a way to apply the results to the original image so that the unwanted small objects are changed to the background color.

To be more specific would require seeing examples.
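
A minimal sketch of the erosion/dilation idea in Python, assuming the image has already been turned into a grid of booleans (True = object pixel) and using a 3x3 square neighbourhood. The names `erode`, `dilate`, `open_image` and `from_rows` are illustrative, not from any library. Note that after n erosions a shape survives only if it contains a (2n+1)x(2n+1) square, so this only approximates a pixel-count threshold: with n = 2 a 3x3 square (9 pixels) vanishes, but so would a one-pixel-wide line of 50 pixels.

```python
from typing import List

Grid = List[List[bool]]  # True = object pixel, False = background


def erode(img: Grid, outside: bool = False) -> Grid:
    """3x3 erosion: a pixel stays True only if it and all 8 neighbours are True.

    `outside` is the value assumed for pixels beyond the image border.
    """
    h, w = len(img), len(img[0])

    def get(y: int, x: int) -> bool:
        return img[y][x] if 0 <= y < h and 0 <= x < w else outside

    return [[all(get(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)]
            for y in range(h)]


def complement(img: Grid) -> Grid:
    return [[not p for p in row] for row in img]


def dilate(img: Grid) -> Grid:
    """Dilation as the complement of the erosion of the complement.

    Outside the image is background (False), i.e. True in the complement.
    """
    return complement(erode(complement(img), outside=True))


def open_image(img: Grid, n: int) -> Grid:
    """Erode n times (small shapes vanish), then dilate n times (survivors regrow)."""
    for _ in range(n):
        img = erode(img)
    for _ in range(n):
        img = dilate(img)
    return img


def from_rows(rows: List[str]) -> Grid:
    """Hypothetical helper: '#' marks an object pixel, anything else is background."""
    return [[c == "#" for c in row] for row in rows]


if __name__ == "__main__":
    img = from_rows([
        "............",
        ".###..#####.",
        ".###..#####.",
        ".###..#####.",
        "......#####.",
        "......#####.",
        "............",
    ])
    for row in open_image(img, 2):
        print("".join("#" if p else "." for p in row))
```

With n = 2 the 3x3 square is erased while the 5x5 square comes back unchanged; with n = 1 both squares survive.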


Look up flood fill algorithms and alter them to count the pixels instead of filling. Then if the shape is small enough, fill it with white.
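
A sketch of that idea in Python, assuming a boolean grid (True = non-white pixel) and 8-connectivity by default; `remove_small_shapes` and its parameters are hypothetical names. It uses an explicit queue instead of recursion so large shapes do not hit Python's recursion limit. Each shape is visited once and its pixels are recorded; only once the count is known is the shape whitened, and only if it has `max_pixels` pixels or fewer.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[bool]]  # True = non-white pixel, False = white


def neighbour_offsets(connectivity: int) -> List[Tuple[int, int]]:
    """Offsets for 4- or 8-connectivity."""
    four = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 4:
        return four
    return four + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def remove_small_shapes(img: Grid, max_pixels: int = 10, connectivity: int = 8) -> Grid:
    """Return a copy of img where every shape of max_pixels pixels or fewer is white."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    offsets = neighbour_offsets(connectivity)

    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            # Flood fill that collects the shape's pixels instead of painting them.
            seen[sy][sx] = True
            shape = [(sy, sx)]
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        shape.append((ny, nx))
                        queue.append((ny, nx))
            # The pixel count is known only now; whiten the shape if it is small.
            if len(shape) <= max_pixels:
                for y, x in shape:
                    out[y][x] = False
    return out
```

The connectivity choice matters: a diagonal staircase of pixels is one shape with 8-connectivity but a set of isolated single pixels with 4-connectivity.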


There are a couple of ways to approach this. What you are referring to is commonly called despeckle in document imaging applications. Document scanners often introduce a lot of dirt and noise into an image during scanning, and this must be removed to help improve OCR accuracy.

I assume you are processing B/W images here, or can convert your image to B/W; otherwise it becomes a lot more complex. Despeckle is done by analysing all the blobs on the page. Besides the pixel count alone, you can also decide on blob size by combining width, height and number of pixels.
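
To illustrate the B/W conversion and the combined size test, here is a small Python sketch. The binarization threshold of 128 and the limits `max_w`, `max_h` and `max_area` are arbitrary assumptions, and the blob's pixel coordinates are assumed to come from some blob-analysis step such as connected-component labeling. With these limits a thin horizontal line of 10 pixels is kept because it is wide, while a compact 3x3 blob is treated as a speck.

```python
from typing import Iterable, List, Tuple


def to_bw(gray: List[List[int]], threshold: int = 128) -> List[List[bool]]:
    """Binarize an 8-bit grayscale image: True = dark (ink) pixel."""
    return [[value < threshold for value in row] for row in gray]


def blob_stats(pixels: Iterable[Tuple[int, int]]) -> Tuple[int, int, int]:
    """Width, height and pixel count of one blob, given its (y, x) pixel coordinates."""
    pts = list(pixels)
    ys = [y for y, _ in pts]
    xs = [x for _, x in pts]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1, len(pts)


def is_speck(pixels: Iterable[Tuple[int, int]],
             max_w: int = 4, max_h: int = 4, max_area: int = 10) -> bool:
    """Hypothetical rule: a blob counts as noise only if it is small in every respect."""
    w, h, area = blob_stats(pixels)
    return w <= max_w and h <= max_h and area <= max_area
```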

Leptonica (Leptonica.com) is an open source, C-based library that has the blob analysis functions you require. With some simple checks and loops you can delete these smaller objects. Leptonica can also be compiled quite easily into a command-line program. There are many example programs, and they are the best way to learn Leptonica.

For testing, you may want to try ImageMagick. It has a command-line option for despeckle, but it takes no further parameters. http://www.imagemagick.org/script/command-line-options.php#despeckle

The other option is to look for "despeckle" algorithms in Google.


ImageMagick, starting from version 6.8.9-10, includes a -connected-components option that can be used to do what you want; however, from the example provided on the official website, it is not immediately obvious how to actually obtain the original image minus the removed connected components.

I'm almost sure there is a simpler way, but I did it via a clunky script performing a series of steps:

  • First, I ran the command from the connected components example:

    convert in.png \
      -define connected-components:verbose=true \
      -connected-components 8 out.png
    
  • This produces output in the following format:

    Objects (id: bounding-box centroid area mean-color):
    (...)
    181: 9x9+1601+916 1605.2,920.2 44 gray(0)
    185: 5x5+1266+923 1268.0,925.0 13 gray(0)
    274: 5x5+2276+1661 2278.0,1663.0 13 gray(255)
    
  • Then, I used awk to keep only the lines describing black components (mean-color gray(0) in my image) whose area (in pixels) is smaller than my threshold $min_cc_area. Note that connected-components has an option to filter components smaller than a given area, but I needed the opposite. The awk line is similar to the following:

    {if ($4 < $min_cc_area && $5=="gray(0)") { print $2 }}
    
  • I then built an ImageMagick command line that draws white rectangles on top of these connected components. The -draw command expects coordinates in the form x1,y1 x2,y2, whereas -connected-components gives bounding boxes in the format [w]x[h]+x1+y1, so I replaced the x and + separators with spaces using tr and used awk again to compute the corner coordinates:

    tr 'x+' '  ' | awk '{print " rectangle " $3 "," $4 " " $3+$1-1 "," $4+$2-1 }'
    
  • Finally, I ran the created ImageMagick command-line to create a new image combining all the white rectangles on top of the original one.
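
The filtering and coordinate conversion in these steps can also be sketched in Python, which may be easier to adapt than the awk one-liners. This assumes the verbose output format shown above (`id: WxH+X+Y centroid area mean-color`); other ImageMagick versions may print colors differently, so the regular expression and the `color` default are assumptions, and `draw_primitives` is a hypothetical name. As in the awk filter, only components strictly smaller than `min_cc_area` are selected.

```python
import re
from typing import List

# Matches component lines such as "185: 5x5+1266+923 1268.0,925.0 13 gray(0)".
COMPONENT = re.compile(
    r"^\s*(\d+):\s+(\d+)x(\d+)\+(\d+)\+(\d+)\s+\S+\s+(\d+)\s+(\S+)\s*$")


def draw_primitives(verbose_output: str, min_cc_area: int,
                    color: str = "gray(0)") -> List[str]:
    """Rectangle primitives for -draw, one per component smaller than min_cc_area."""
    rects = []
    for line in verbose_output.splitlines():
        m = COMPONENT.match(line)
        if m is None:
            continue  # header line, "(...)", etc.
        w, h, x1, y1, area = (int(m.group(i)) for i in range(2, 7))
        if area < min_cc_area and m.group(7) == color:
            # WxH+X+Y -> x1,y1 x2,y2 with inclusive corners, as in the awk version
            rects.append(f"rectangle {x1},{y1} {x1 + w - 1},{y1 + h - 1}")
    return rects


if __name__ == "__main__":
    sample = """Objects (id: bounding-box centroid area mean-color):
  181: 9x9+1601+916 1605.2,920.2 44 gray(0)
  185: 5x5+1266+923 1268.0,925.0 13 gray(0)
  274: 5x5+2276+1661 2278.0,1663.0 13 gray(255)"""
    print("fill white " + " ".join(draw_primitives(sample, 20)))
```

Joining the returned strings with spaces and prefixing `fill white` gives the argument for `-draw`; for the sample above with a threshold of 20, only component 185 is selected.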

In the end, I got the following script:

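#!/bin/bash
# usage: $0 infile min_cc_area outfile
infile=$1
min_cc_area=$2
outfile=$3
# print the bounding box ($2) of black components whose area ($4) is below the threshold
awk_exp="{if (\$4 < $min_cc_area && \$5==\"gray(0)\") { print \$2 }}"

# turn each WxH+X+Y bounding box into a "rectangle x1,y1 x2,y2" primitive
draw_rects=$(convert "$infile" -define connected-components:verbose=true \
  -connected-components 8 null: | \
  awk "$awk_exp" | tr 'x+' '  ' | \
  awk '{print " rectangle " $3 "," $4 " " $3+$1-1 "," $4+$2-1 }')

if [ -z "$draw_rects" ]; then
  convert "$infile" "$outfile"   # nothing to erase
else
  # paint all the rectangles white on top of the original image
  convert "$infile" -draw "fill white $draw_rects" "$outfile"
fi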

Note that this solution may also erase black pixels of neighboring components if they intersect the bounding rectangle of a removed component.


You want a connected-component labeling algorithm. It scans through the image, gives every connected shape an id number, and assigns every pixel the id number of the shape it belongs to.

After running a connected-component filter, just count the pixels assigned to each object, find the objects that have 10 or fewer pixels, and replace the pixels in those objects with white.
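
A sketch of the classic two-pass labeling algorithm in Python, with a small union-find structure to merge provisional labels that turn out to belong to the same shape. It assumes a boolean grid (True = non-white pixel) and 8-connectivity; `label_components` and `remove_small` are hypothetical names. Because it works on the labels themselves, it whitens exactly the pixels of each small object rather than a bounding box.

```python
from collections import Counter
from typing import List

Grid = List[List[bool]]  # True = non-white pixel


def label_components(img: Grid) -> List[List[int]]:
    """Two-pass 8-connected labeling: 0 = background, other values = object ids."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # union-find forest; index 0 is reserved for the background

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Pass 1: look only at already-visited neighbours (W, NW, N, NE).
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            prev = [labels[ny][nx]
                    for ny, nx in ((y, x - 1), (y - 1, x - 1), (y - 1, x), (y - 1, x + 1))
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx]]
            if not prev:
                parent.append(len(parent))  # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = min(prev)
                for p in prev:  # record that these labels touch
                    union(labels[y][x], p)
    # Pass 2: replace each provisional label by its representative.
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels


def remove_small(img: Grid, max_pixels: int = 10) -> Grid:
    """Whiten (set to False) every object with max_pixels pixels or fewer."""
    labels = label_components(img)
    sizes = Counter(lab for row in labels for lab in row if lab)
    return [[lab != 0 and sizes[lab] > max_pixels for lab in row] for row in labels]
```

A U-shaped object is a good check: its two arms get different provisional labels in the first pass and are merged when the bottom row joins them.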


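If you can use OpenCV, this piece of code does what you want (i.e., despeckle). You can play with the Size(3,3) parameter in the first line to get rid of bigger or smaller noisy artifacts. Note that this removes specks that are too small to contain the structuring element, rather than testing an exact pixel count.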

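#include <opencv2/imgproc.hpp>
using namespace cv;
Mat element = getStructuringElement(MORPH_ELLIPSE, Size(3,3));
morphologyEx(image, image, MORPH_OPEN, element);   // opening: removes bright specks smaller than the element
morphologyEx(image, image, MORPH_CLOSE, element);  // closing: removes dark specks smaller than the element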


You just need to figure out the area of each component, so an 8-direction (8-connected) tracking algorithm could help. I have an API that solves this problem, written in C++. If you want it, send me an email.
