How to generate bivariate data of different shapes (e.g., square, circle, rectangle) with outliers?

I am looking for a tool that generates datasets of different shapes (square, circle, rectangle, etc.), with outliers, for cluster analysis.

Can anyone recommend a good dataset generator for cluster analysis? Is there any way to generate such datasets in a language like R?


You should probably look into the mlbench package, especially its synthetic dataset generators (the mlbench.* functions).

Other datasets or utility functions are probably best found on the Cluster Task View on CRAN. As @Roman said, adding outliers is not really difficult, especially when you work in only two dimensions.
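As a rough illustration of the mlbench generators mentioned above, here is a minimal sketch. The sample sizes, noise levels, and seed are arbitrary choices of mine, not values from the answer.

```r
library(mlbench)  # install.packages("mlbench") if it is missing

set.seed(1)  # arbitrary seed, only for reproducible draws

# Each generator returns a list with a coordinate matrix `x`
# and a factor `classes` holding the true group labels.
shapes  <- mlbench.shapes(n = 500)    # Gaussian, square, triangle and wave
spirals <- mlbench.spirals(n = 300, cycles = 1, sd = 0.05)
smiley  <- mlbench.smiley(n = 500, sd1 = 0.1, sd2 = 0.05)

# Colour the points by their true class
plot(shapes$x, col = as.integer(shapes$classes), pch = 19,
     xlab = "x", ylab = "y", main = "mlbench.shapes")
```

Because the true labels come with the data, you can compare them directly against whatever your clustering method returns.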


I would create a shape and extract its bounding coordinates. You can then populate the shape with random points using the splancs package.

Here's a small snippet from one of my programs:

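library(splancs)  # provides csr()

# First we create a circle (an n-sided polygon) into which uniform random
# points will be generated (kudos to Barry Rowlingson, r-sig-geo).
circle <- function(x, y, r, n) {
    theta <- seq(from = 0, to = 2 * pi, length.out = n + 1)[-1]
    xy <- cbind(x = x + r * sin(theta), y = y + r * cos(theta))
    rbind(xy, xy[1, ])  # repeat the first vertex to close the polygon
}

# 1000 uniform random points inside a 30-sided "circle" of radius 100 centred at (0, 0)
pts <- csr(circle(0, 0, 100, 30), 1000)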

Feel free to add outliers. One way of going about this is sampling different shapes and joining them in different ways.
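To make the "sample different shapes and join them" idea concrete, here is a minimal base-R sketch. The helpers `r_rect` and `r_disc`, the shape positions, and the number of outliers are all hypothetical choices for illustration, not part of splancs or any other package.

```r
set.seed(1)  # arbitrary seed, only for reproducible draws

# Uniform points in an axis-aligned rectangle (a square when the sides are equal)
r_rect <- function(n, xmin, xmax, ymin, ymax) {
  cbind(x = runif(n, xmin, xmax), y = runif(n, ymin, ymax))
}

# Uniform points in a disc; taking sqrt() of the uniform radius keeps
# the points from piling up at the centre
r_disc <- function(n, cx = 0, cy = 0, r = 1) {
  rad <- r * sqrt(runif(n))
  ang <- runif(n, 0, 2 * pi)
  cbind(x = cx + rad * cos(ang), y = cy + rad * sin(ang))
}

square    <- r_rect(200, 0, 2, 0, 2)
rectangle <- r_rect(200, 4, 9, 0, 1.5)
disc      <- r_disc(200, cx = 2, cy = 6, r = 1.5)

# Outliers: a few points scattered over a box that encloses all shapes
outliers <- r_rect(15, -3, 12, -3, 10)

# Join everything and keep a label so the ground truth is known
dat <- data.frame(
  rbind(square, rectangle, disc, outliers),
  label = rep(c("square", "rectangle", "disc", "outlier"),
              times = c(200, 200, 200, 15))
)

plot(dat$x, dat$y, col = as.integer(factor(dat$label)), pch = 19,
     xlab = "x", ylab = "y")
```

Note that a uniformly scattered "outlier" can land inside one of the shapes by chance. If that matters for your evaluation, resample those points or draw the outliers from a region that excludes the shapes.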


There is a flexible data generator in ELKI that can generate various distributions in arbitrary dimensionality. It can also generate Gamma-distributed variables, for example.

There is documentation on the Wiki: http://elki.dbs.ifi.lmu.de/wiki/DataSetGenerator
