Levenshtein (or Damerau-Levenshtein, if possible!) distance in Assembly

I have a program that needs to calculate the Levenshtein distance between pairs of words many times. One word of each pair is fixed, and for every fixed word there are roughly 1,000 to 120,000 comparisons. Since I want to optimize this program as much as I can, I thought about implementing these calculations in assembly. The problem is that I know nothing about assembly beyond the theory, and that it may bring big speed improvements. Can anyone please help me, or provide the assembly code for this distance? Also, how can I call assembly from a C# module?


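You could easily use a BK-tree to build a lookup tree if plain Levenshtein is enough. Damerau-Levenshtein in its common restricted form (optimal string alignment) violates the triangle inequality, so it cannot be used with a metric tree.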
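To make the BK-tree idea concrete, here is a minimal sketch. The names BkTree, Add, Search and Distance are made up for illustration and do not come from any library; it assumes plain Levenshtein as the metric and makes no attempt at tuning.

```csharp
using System;
using System.Collections.Generic;

// Minimal BK-tree sketch. BkTree, Add, Search and Distance are hypothetical names.
public sealed class BkTree
{
    private sealed class Node
    {
        public readonly string Word;
        // Child nodes keyed by their distance to this node's word.
        public readonly Dictionary<int, Node> Children = new Dictionary<int, Node>();
        public Node(string word) { Word = word; }
    }

    private Node root;

    // Plain two-row Levenshtein distance, used as the tree's metric.
    public static int Distance(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // swap rows
        }
        return prev[b.Length];
    }

    public void Add(string word)
    {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true)
        {
            int d = Distance(word, node.Word);
            if (d == 0) return; // word is already stored
            Node child;
            if (!node.Children.TryGetValue(d, out child))
            {
                node.Children[d] = new Node(word);
                return;
            }
            node = child;
        }
    }

    // Returns every stored word within maxDistance of query.
    public List<string> Search(string query, int maxDistance)
    {
        var results = new List<string>();
        if (root == null) return results;
        var stack = new Stack<Node>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            Node node = stack.Pop();
            int d = Distance(query, node.Word);
            if (d <= maxDistance) results.Add(node.Word);
            // Triangle inequality: only children keyed in [d - max, d + max] can match.
            foreach (KeyValuePair<int, Node> kv in node.Children)
                if (kv.Key >= d - maxDistance && kv.Key <= d + maxDistance)
                    stack.Push(kv.Value);
        }
        return results;
    }
}
```

With one fixed word and a threshold, you would build the tree once from the candidate words and call Search(fixedWord, threshold); the pruning in Search is exactly what needs the triangle inequality.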

You don't need to write this implementation in assembler; you can get far by staying in C# and using unsafe code and pointers.

  • Read str.Length once and cache it in a local; Length is a property getter, i.e. a method invocation (although most probably inlined/optimized).
  • Access your strings with pointers.
    fixed(char* ptrX=strX, ptrY=strY) ...
  • You can create your table/array/state as a flat int[rows*cols] instead of a jagged int[rows][cols], and use pointers to read/write.
    int[] state = new int[rows * cols];
    fixed (int* ptrState = state) ...
  • You don't really need more than two rows in your state table: the one you read from and the one you write to. After each row, swap the pointers and read from what you just wrote.
  • I believe you can also optimize by stripping identical prefixes/suffixes before computing the distance:
    L('catz', 'cats') == L('z', 's') == 1
    L('rats', 'cats') == L('r', 'c') == 1
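Putting the points above together, a rough sketch might look like the following. FastLevenshtein and Distance are illustrative names, not an existing API; the code assumes non-null inputs and must be compiled with unsafe code enabled (for example AllowUnsafeBlocks in the project file). Whether it actually beats a plain managed version is something to measure, not assume.

```csharp
using System;

// Hypothetical helper: two-row Levenshtein over a flat array, using pointers.
public static class FastLevenshtein
{
    public static unsafe int Distance(string x, string y)
    {
        // Cache the lengths once instead of re-reading the property in the loops.
        int start = 0, endX = x.Length, endY = y.Length;

        // Strip the common prefix and suffix; they never change the distance.
        while (start < endX && start < endY && x[start] == y[start]) start++;
        while (endX > start && endY > start && x[endX - 1] == y[endY - 1]) { endX--; endY--; }

        int lenX = endX - start, lenY = endY - start;
        if (lenX == 0) return lenY;
        if (lenY == 0) return lenX;

        int cols = lenY + 1;
        int[] state = new int[2 * cols]; // two rows stored in one flat array

        fixed (char* baseX = x, baseY = y)
        fixed (int* ptrState = state)
        {
            char* px = baseX + start; // first differing character of x
            char* py = baseY + start; // first differing character of y
            int* prev = ptrState;        // row we read from
            int* curr = ptrState + cols; // row we write to

            for (int j = 0; j < cols; j++) prev[j] = j;

            for (int i = 1; i <= lenX; i++)
            {
                char cx = px[i - 1];
                curr[0] = i;
                for (int j = 1; j < cols; j++)
                {
                    int cost = cx == py[j - 1] ? 0 : 1;
                    int best = prev[j - 1] + cost;                      // match or substitution
                    if (prev[j] + 1 < best) best = prev[j] + 1;         // deletion
                    if (curr[j - 1] + 1 < best) best = curr[j - 1] + 1; // insertion
                    curr[j] = best;
                }
                int* tmp = prev; prev = curr; curr = tmp; // swap the two rows
            }
            return prev[lenY]; // after the last swap, prev holds the final row
        }
    }
}
```

For the examples in the list, FastLevenshtein.Distance("catz", "cats") and FastLevenshtein.Distance("rats", "cats") both reduce to a single-character comparison after trimming.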