Ensure minimum distance between adjacent points
I have a list/frame of 15-25 datapoints. They're all between 0 and 100, and there are some clusters (like around 72). When displaying this data I want to increase the distance between every pair of points so it's at least 2 (eg 69.4 and 71.4 would be two adjacent points).
However I need to ensure I preserve overall order and keep every point as close as possible to where it was originally.
My list of points is simply
scores <- c(13.343, 17.998, 25.413, 27.721, 33.361, 47.263, 52.298, 55.981,
57.851, 72.038, 72.204, 72.296, 73.472, 75.925, 80.748, 85.998)
I want to increase the distance between the clusters of points. The points at 72.038 - 72.296 would all move down to ensure a more even spread.
spacedScores <- c(13.343, 17.998, 25.413, 27.721, 33.361, 47.263, 52.298,
55.981, 57.851, 67.925, 69.925, 71.925, 73.925, 75.925,
80.748, 85.998)
Any suggestions on how to do this most cleanly in R?
Clarifications: I'm not necessarily looking for a mathematically optimal solution,开发者_如何学编程 just something pretty good. I also image most of the time some points will need to move up and some points down - that's good.
You can use diff(scores)
to find the distance between points (I'm assuming values are sorted).
Then use which(diff(scores) < 2)
to identify "bad points" and move them back so that the spacing = 2.
Problem is, moving one point to correct one distance may make the previous or next distance become < 2, so you'll have to repeat this several times.
Here's an example where I "brute force" the solution. You may want to introduce a counter to avoid infinite loop
scores <- c(13.343, 17.998, 25.413, 27.721, 33.361, 47.263,
52.298, 55.981, 57.851, 72.038, 72.204, 72.296, 73.472,
75.925, 80.748, 85.998)
spacedScores <- c(13.343, 17.998, 25.413, 27.721, 33.361,
47.263, 52.298, 55.981, 57.851, 67.925, 69.925, 71.925,
73.925, 75.925, 80.748, 85.998)
plot(scores, pch=20)
points(spacedScores, pch='x', col="red")
badPoints <- which(diff(scores) < 2)
while (length(badPoints) > 0)
{
scores[badPoints] <- scores[badPoints] - (2 - diff(scores)[badPoints])
badPoints <- which(diff(scores) < 2)
}
points(scores, pch='o', col="green")
Here's the result: in black the original points, in green the modified points, in red the spaced points you supplied
I made up a hackish bruteforce method which iterates a few times until every diff is greater than 2 with the smallest required modification in the dataset:
scores <- c(13.343, 17.998, 25.413, 27.721, 33.361, 47.263, 52.298, 55.981, 57.851, 72.038, 72.204, 72.296, 73.472, 75.925, 80.748, 85.998)
done <- 0
while (any(diff(scores)<2)) {
diffs <- diff(scores)
closevals <- which(diffs < 2)
first <- closevals[which.min(diffs[closevals])]
if (which.min(diff(scores[(first-1):(first+1)])) == 1) {
scores[1:(first-1)] <- scores[1:(first-1)] - (2 - (scores[first] - scores[first-1]))
} else {
scores[(first+1):length(scores)] <- scores[(first+1):length(scores)] + (2 - (scores[first+1] - scores[first]))
}
}
> scores
[1] 13.343 17.998 25.413 27.721 33.361 47.263 52.298 55.981 57.981 72.168
[11] 74.168 76.168 78.168 80.621 85.444 90.694
Edit: I have just seen that a lot nicer and simpler answer was given (with exact same results). The only cause I am not deleting my complicated answer is that my loop checks also if adding a small number to a diff between two numbers would work better instead of always subtracting 2-diff()
from smaller values.
I hope my solution could operate better used at real data :)
精彩评论