Finding the area of a 2-D data set
I have a .txt file with about 100,000 points in the 2-D plane. When I plot the points, there is a clearly defined 2-D region (think of a 2-D disc that has been morphed a bit).
What is the easiest way to compute the area of this region? Any way of doing easily in Matlab?
I made a polygonal approximation by finding a bunch (like 40) points on the boundary of the region and computing the area of the polygonal region in Matlab, but I was wondering if there is another开发者_开发问答, less tedious method than finding 40 points on the boundary.
Consider this example:
%# random points
x = randn(300,1);
y = randn(300,1);
%# convex hull
dt = DelaunayTri(x,y);
k = convexHull(dt);
%# area of convex hull
ar = polyarea(dt.X(k,1),dt.X(k,2))
%# plot
plot(dt.X(:,1), dt.X(:,2), '.'), hold on
fill(dt.X(k,1),dt.X(k,2), 'r', 'facealpha', 0.2);
hold off
title( sprintf('area = %g',ar) )
There is a short screencast By Doug Hull which solves this exact problem.
EDIT:
I am posting a second answer inspired by the solution proposed by @Jean-FrançoisCorbett.
First I create random data, and using the interactive brush tool, I remove some points to make it look like the desired "kidney" shape...
To have a baseline to compare against, we can manually trace the enclosing region using the IMFREEHAND function (I'm doing this using my laptop's touchpad, so not the most accurate drawing!). Then we find the area of this polygon using POLYAREA. Just like my previous answer, I compute the convex hull as well:
Now, and based on a previous SO question I had answered (2D histogram), the idea is to lay a grid over the data. The choice of the grid resolution is very important, mine was numBins = [20 30];
for the data used.
Next we count the number of squares containing enough points (I used at least 1
point as threshold, but you could try a higher value). Finally we multiply this count by the area of one grid square to obtain the approximated total area.
%### DATA ###
%# some random data
X = randn(100000,1)*1;
Y = randn(100000,1)*2;
%# HACK: remove some point to make data look like a kidney
idx = (X<-1 & -4<Y & Y<4 ); X(idx) = []; Y(idx) = [];
%# or use the brush tool
%#brush on
%### imfreehand ###
figure
line('XData',X, 'YData',Y, 'LineStyle','none', ...
'Color','b', 'Marker','.', 'MarkerSize',1);
daspect([1 1 1])
hROI = imfreehand('Closed',true);
pos = getPosition(hROI); %# pos = wait(hROI);
delete(hROI)
%# total area
ar1 = polyarea(pos(:,1), pos(:,2));
%# plot
hold on, plot(pos(:,1), pos(:,2), 'Color','m', 'LineWidth',2)
title('Freehand')
%### 2D histogram ###
%# center of bins
numBins = [20 30];
xbins = linspace(min(X), max(X), numBins(1));
ybins = linspace(min(Y), max(Y), numBins(2));
%# map X/Y values to bin-indices
Xi = round( interp1(xbins, 1:numBins(1), X, 'linear', 'extrap') );
Yi = round( interp1(ybins, 1:numBins(2), Y, 'linear', 'extrap') );
%# limit indices to the range [1,numBins]
Xi = max( min(Xi,numBins(1)), 1);
Yi = max( min(Yi,numBins(2)), 1);
%# count number of elements in each bin
H = accumarray([Yi(:), Xi(:)], 1, [numBins(2) numBins(1)]);
%# total area
THRESH = 0;
sqNum = sum(H(:)>THRESH);
sqArea = (xbins(2)-xbins(1)) * (ybins(2)-ybins(1));
ar2 = sqNum*sqArea;
%# plot 2D histogram/thresholded_histogram
figure, imagesc(xbins, ybins, H)
axis on, axis image, colormap hot; colorbar; %#caxis([0 500])
title( sprintf('2D Histogram, bins=[%d %d]',numBins) )
figure, imagesc(xbins, ybins, H>THRESH)
axis on, axis image, colormap gray
title( sprintf('H > %d',THRESH) )
%### convex hull ###
dt = DelaunayTri(X,Y);
k = convexHull(dt);
%# total area
ar3 = polyarea(dt.X(k,1), dt.X(k,2));
%# plot
figure, plot(X, Y, 'b.', 'MarkerSize',1), daspect([1 1 1])
hold on, fill(dt.X(k,1),dt.X(k,2), 'r', 'facealpha',0.2); hold off
title('Convex Hull')
%### plot ###
figure, hold on
%# plot histogram
imagesc(xbins, ybins, H>=1)
axis on, axis image, colormap gray
%# plot grid lines
xoff = diff(xbins(1:2))/2; yoff = diff(ybins(1:2))/2;
xv1 = repmat(xbins+xoff,[2 1]); xv1(end+1,:) = NaN;
yv1 = repmat([ybins(1)-yoff;ybins(end)+yoff;NaN],[1 size(xv1,2)]);
yv2 = repmat(ybins+yoff,[2 1]); yv2(end+1,:) = NaN;
xv2 = repmat([xbins(1)-xoff;xbins(end)+xoff;NaN],[1 size(yv2,2)]);
xgrid = [xv1(:);NaN;xv2(:)]; ygrid = [yv1(:);NaN;yv2(:)];
line(xgrid, ygrid, 'Color',[0.8 0.8 0.8], 'HandleVisibility','off')
%# plot points
h(1) = line('XData',X, 'YData',Y, 'LineStyle','none', ...
'Color','b', 'Marker','.', 'MarkerSize',1);
%# plot convex hull
h(2) = patch('XData',dt.X(k,1), 'YData',dt.X(k,2), ...
'LineWidth',2, 'LineStyle','-', ...
'EdgeColor','r', 'FaceColor','r', 'FaceAlpha',0.5);
%# plot freehand polygon
h(3) = plot(pos(:,1), pos(:,2), 'g-', 'LineWidth',2);
%# compare results
title(sprintf('area_{freehand} = %g, area_{grid} = %g, area_{convex} = %g', ...
ar1,ar2,ar3))
legend(h, {'Points' 'Convex Jull','FreeHand'})
hold off
Here is the final result of all three methods overlayed, with the area approximations displayed:
My answer is the simplest and perhaps the least elegant and precise. But first, a comment on previous answers:
Since your shape is usually kidney-shaped (not convex), calculating the area of its convex hull won't do, and an alternative is to determine its concave hull (see e.g. http://www.concavehull.com/home.php?main_menu=1) and calculate the area of that. But determining a concave hull is far more difficult than a convex hull. Plus, straggler points will cause trouble in both he convex and concave hull.
Delaunay triangulation followed by pruning, as suggested in @Ed Staub's answer, may a bit be more straightforward.
My own suggestion is this: How precise does your surface area calculation have to be? My guess is, not very. With either concave hull or pruned Delaunay triangulation, you'll have to make an arbitrary choice anyway as to where the "boundary" of your shape is (the edge isn't knife-sharp, and I see there are some straggler points sprinkled around it).
Therefore a simpler algorithm may be just as good for your application.
Divide your image in an orthogonal grid. Loop through all grid "pixels" or squares; if a given square contains at least one point (or perhaps two points?), mark the square as full, else empty. Finally, add the area of all full squares. Bingo.
The only parameter is the resolution length (size of the squares). Its value should be set to something similar to the pruning length in the case of Delaunay triangulation, i.e. "points within my shape are closer to each other than this length, and points further apart than this length should be ignored".
Perhaps an additional parameter is the number of points threshold for a square to be considered full. Maybe 2 would be good to ignore straggler points, but that may define the main shape a bit too tightly for your taste... Try both 1 and 2, and perhaps take an average of both. Or, use 1 and prune away the squares that have no neighbours (game-of-life-style). Simlarly, empty squares whose 8 neighbours are full should be considered full, to avoid holes in the middle of the shape.
There is no end to how much this algorithm can be refined, but due to the arbitrariness intrinsic to the problem definition in your particular application, any refinement is probably the algorithm equivalent of "polishing a turd".
I know next to nothing, so don't put much stock in this... consider doing a Delaunay triangulation. Then remove any hull (outer) edges longer than some maximum. Repeat until nothing to remove. Fill the remaining triangles.
This will orphan some outlier points.
I suggest using a space-filling-curve, for example a z-curve or better a moore curve. A sfc fills the full space and is good to index each points. For example for all f(x)=y you can sort the points of the curve in ascendending order and from that result you take as many points until you get a full roundtrip. These points you can then use to compute the area. Because you have many points maybe you want to use less points and use a cluster which make the result less accurate.
I think you can get the border points using convex hull algorithm with restriction to the edge length (you should sort points by vertical axis). Thus it will follow nonconvexity of your region. I propose length round 0.02. In any case you can experiment a bit with different lengths drawing the result and examining it visually.
精彩评论