Modifying a Levenshtein distance function to calculate distance between two sets of x-y coordinates?
I've been trying to work on modifying a Levenshtein Distance function so that it can find the distance between two lines, or sets of x-y coordinates (in other words, how similar or different the lines are, not their geometric distance). I'm running into some problems though. I get how you take the value above to get deletion cost, and the one to the left to get addition, but during substitution I'm trying to use euchlidian distance, and it's not working for me.
If you could point out what I'm doing wrong, that would be awesome.
Here's the relevant code in javascript:
padlock.dtw = {
开发者_如何学C _deletionCost: 1,
_insertionCost: 1,
levenshtein: function(a,b){
var l1 = a.length, l2 = b.length;
if (Math.min(l1, l2) === 0) {
return Math.max(l1, l2);
}
var i = 0, j = 0, d = [];
for (i = 0 ; i <= l1 ; i++) {
d[i] = [];
d[i][0] = i;
}
for (j = 0 ; j <= l2 ; j++) {
d[0][j] = j;
}
for (i = 1 ; i <= l1 ; i++) {
for (j = 1 ; j <= l2 ; j++) {
d[i][j] = Math.min(
d[i - 1][j] + this._deletionCost, /* deletion */
d[i][j - 1] + this._insertionCost, /* addition */
d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : this.euclideanDistance(a[i-1], b[j-1])) /* substitution, use euchlidean distance as cost */
);
}
}
this._debugPrintMatrix(d);
return d[l1][l2];
},
euclideanDistance: function(a, b){
var xd = a[0]-b[0];
var yd = a[1]-b[1];
return Math.abs(Math.sqrt(Math.pow(xd, 2) + Math.pow(yd, 2)));
},
_debugPrintMatrix: function(m){
for(var i=0;i<m.length;i++){
console.log.apply(this, m[i]);
}
}
}
Sample output:
>>> padlock.dtw.levenshtein( [ [1,1], [0,9], [3,3], [4,4] ], [ [1,1], [2,2], [3,3], [4,4] ] )
Distance Matrix:
0 1 2 3 4
1 0 1 2 3
2 1 2 3 4
3 2 2.414213562373095 2 3
4 3 3.414213562373095 3 2
Final Distance: 2
If I understood your question correctly, then you should completely remove the code for computing euclidian distance between two points!
First, let me restate your question:
You have two sets of points, e.g.
A = [ [1,1], [0,9], [3,3], [4,4] ]
B = [ [1,1], [2,2], [3,3], [4,4] ]
You try to compute a levenshtein distance between those two sets. You substitute "letters" with "points".
Up to this point, it makes sense. Just replace the "letters" in levenshtein algorithm with points and you are done!
But you made a mistake: The original Levenshtein algorithm does not calculate distances between two letters, like e.g. distance(a,b)=1 or distance(a,d)=3.
You tried to extend the algorithm with such a thing (using euclideanDistance() function). But levenshtein algorithm is not meant for such things. And if you have a close look at it, you will see, that it will not work (the values in the matrix have a meaning, and each loop iteration uses values in the matrix that were computed in a previous iteration).
Levenshtein distance is an edit distance, no geometrical distance. You tried to change it, so that it computes a mix of edit and geometrical distance. This mix makes no sense, it is useless and wrong, IMHO.
Conclusion
To compute the levenshtein distance of two sets of x-y coordinates, you should replace your euclidianDistance() with a simple equality comparison (a[0]==b[0] && a[1]==b[1]
).
Then the levenshtein algorithm will give you an "edit distance".
Wouldn't it be more clever to use geometrics for calculating the distance between two lines? Or is there a specific reason you wouldn't want to use that.
Since two lines always have a point of intersection, unless they are parallel (edit, thanks), it's easy to calculate the smallest distance: that's 0 or insert some math, which can be found on google!
I don't understand why you would use Levenshtein for this, it appears that you would get much better results from simple calculations.
- To find the difference in angle of the lines, you could simply find the angle for each line (arctan((x_1-x_2)/(y_1-y_2))) and subtract them.
- To find the average distance of the lines, you could simply use the distance formula with the first point of each line and the second point of each line and average those distances together.
Other than that (unless your lines are in 3D), there's nothing else to really "compare" them with.
Perhaps I've misunderstood. Are you looking to compare the string values for the lines?
精彩评论