Why do values not appear in ecdf plot?

I am trying to plot the CCDF (complementary cumulative distribution function) of the data given below, but the result doesn't look right. I cross-checked a few data points (2523, 313, 224) and they are not visible. Am I doing something wrong?

R Script:

# Y defined below
Y.ecdf = ecdf(Y)
curve((length((Y))*(1-Y.ecdf(x))), n = 10000, 
       from = 0, to = 100, xlab = "# of items", 
       ylab = "# instances", col=colors[1], lty=1, lwd=4)

Y = c( 3, 1, 4, 11, 2, 2, 9, 7, 22, 3, 1, 1, 7, 2, 2, 2, 4, 2, 1, 1, 6, 3, 20,
15, 4, 1, 1, 5, 3, 10, 16, 224, 74, 2, 1, 2, 2, 3, 3, 7, 2, 2, 1, 4, 2, 9,
3, 3, 2, 1, 1, 3, 2, 4, 4, 1, 7, 2, 1, 2, 1, 1, 2, 4, 3, 1, 1, 1, 3, 4, 2,
2, 1, 1, 5, 6, 13, 15, 3, 1, 2, 5, 1, 1, 1, 1, 2, 6, 1, 4, 1, 3, 1, 1, 4,
2, 2, 3, 3, 1, 4, 2, 1, 4, 6, 1, 1, 1, 1, 2, 5, 2, 1, 1, 1, 1, 1, 3, 1, 3,
2, 1, 1, 1, 2, 1, 8, 2, 3, 1, 1, 1, 1, 1, 3, 1, 3, 2, 1, 2, 1, 1, 5, 1, 1,
4, 3, 3, 1, 1, 1, 3, 4, 4, 3, 2, 2, 4, 3, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3,
2, 3, 9, 3, 4, 2, 1, 1, 1, 3, 22, 5, 13, 1, 1, 1, 1, 1, 4, 1, 1, 31, 1, 1,
2, 1, 1, 1, 3, 4, 4, 8, 6, 6, 7, 2, 1, 2, 2, 5, 1, 2, 6, 6, 1, 3, 1, 5, 2,
1, 5, 3, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 4, 1, 3, 2, 1, 4, 1, 212, 2,
7, 7, 10, 2, 4, 2, 1, 1, 1, 2, 3, 2, 1, 16, 6, 2, 10, 2, 1, 1, 15, 1, 3, 8,
1, 1, 3, 1, 1, 2, 1, 1, 4, 2, 3, 1, 1, 1, 1, 5, 9, 4, 1, 1, 2, 5, 1, 4, 9,
6, 19, 1, 1, 1, 2, 10, 6, 9, 5, 11, 6, 8, 1, 1, 1, 1, 1, 313, 3, 1, 3, 1,
2, 2, 2, 3, 4, 5, 1, 1, 3, 1, 1, 5, 4, 2, 5, 1, 20, 4, 1, 2, 1, 1, 1, 2, 5,
4, 2, 3, 1, 3, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1, 3, 3, 1, 1, 1, 8, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 4, 13, 1, 2, 1, 2, 3, 3, 1, 2, 2, 1, 3, 4, 1, 1, 1, 1, 2,
2, 4, 5, 3, 2, 2, 2, 1, 1, 3, 2523, 7, 4, 2, 4, 11, 8, 1, 4, 4, 2, 5, 3, 3,
1, 3, 1, 3, 4, 1, 1, 1, 1, 6, 6, 2, 2, 1, 8, 8, 3, 3, 4, 5, 2, 2, 2, 3, 2,
6, 2, 2, 2, 1, 5, 5, 4, 3, 1, 2, 2, 6, 3, 2, 2, 2, 10, 9, 1, 2, 1, 1, 1, 2,
2, 3, 1, 3, 1, 9, 1, 1, 1, 2, 1, 96, 2, 2, 5, 1, 1, 1, 2, 2, 1, 1, 1, 5, 2,
1, 1, 1, 2, 1, 1, 4, 2, 10, 3, 2, 2, 8, 8, 2, 1, 2, 4, 1, 1, 13, 20, 3, 2,
5, 9, 1, 22, 25, 4, 1, 1, 3, 2, 1, 1, 7, 9, 5, 9, 1, 3, 1, 8, 2, 2, 1, 3,
1, 2, 6, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 16, 3, 5, 2)


Expanding on our discussion in the comments...

An empirical cumulative distribution function (ECDF) plots x on the x axis against Pr(X ≤ x), the proportion of observations less than or equal to x, on the y axis. For your data it looks something like this:

plot(Y.ecdf, do.points = FALSE,
     verticals = TRUE, col = "blue",
     xlab = "x", ylab = "Pr(X <= x)")

If you look very closely you can see where the line goes up when you reach your very large values, but it's hard to make out since so many of your values are less than 10.
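To make the size of those jumps concrete, here is a minimal sketch on a small made-up vector (`y_toy` and `toy_ecdf` are hypothetical names, and the data is not the `Y` from the question). Each observation raises the ECDF by 1/n, so a single outlier among many small values contributes only a tiny step at the far right of the plot.

```r
# Hypothetical toy data: 99 small values and one large outlier
y_toy <- c(rep(1:3, times = 33), 500)
toy_ecdf <- ecdf(y_toy)

# Sample size
n <- length(y_toy)

# Proportion of observations <= 3, i.e. everything except the outlier
p_small <- toy_ecdf(3)

# Height of the step at the outlier: one observation out of n
jump <- toy_ecdf(500) - toy_ecdf(499)

print(c(n = n, p_small = p_small, jump = jump))
```

With real data of several hundred points, the step at each outlier is correspondingly smaller, which is why it is so hard to see on a linear axis.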

What you've done is take the complement of this function, so that you're looking at the opposite tail of the distribution, i.e. Pr(X > x). You've also multiplied the probabilities by length(Y), which turns the y axis into a count of observations greater than x rather than a proportion. That may well make sense for your particular task. So you're doing something like this (except that yours also has the y axis scaling):

curve((1-Y.ecdf(x)), n = 10000, 
       from = 0, to = 2600, ylab = "Pr(X > x)", 
       xlab = "x", col="blue", lty=1, lwd=2)

but you originally had the from and to arguments set so that the function was only plotted from 0 to 100, which is why values such as 224, 313 and 2523 never appear. If you wanted to "zoom in" on your outliers, you could just change the from and to values to something more relevant:

curve((1-Y.ecdf(x)), n = 10000, 
       from = 250, to = 2600, ylab = "Pr(X > x)", 
       xlab = "x", col="blue", lty=1, lwd=2)
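If you want to check numerically what the scaled curve should show at particular points, something along these lines may help. It is only a sketch on made-up data (`y_toy`, `toy_ecdf`, `x`, `scaled` and `direct` are hypothetical names, not from the question), comparing the scaled complement of the ECDF with a direct count of observations greater than each x.

```r
# Hypothetical toy data (not the Y from the question)
y_toy <- c(1, 1, 2, 3, 3, 7, 250)
toy_ecdf <- ecdf(y_toy)

# Points at which to check the curve, including the outlier itself
x <- c(0, 1, 3, 100, 250)

# Complement of the ECDF scaled by sample size: number of observations > x
scaled <- length(y_toy) * (1 - toy_ecdf(x))

# The same quantity counted directly from the data
direct <- sapply(x, function(v) sum(y_toy > v))

print(data.frame(x = x, scaled = scaled, direct = direct))

# The outlier lies beyond 100, so a plot limited to 0..100 cannot show its step
print(max(y_toy) > 100)
```

The two columns should agree up to floating point error, and the last line is a quick way to see whether the plotted range covers the values you are looking for.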
