Question regarding the Smith-Waterman algorithm

I am running some string matching tests using the Smith-Waterman algorithm. I am currently using SimMetrics (the Java open source project) to run the tests.

Can anyone explain why when I compare 'Bloggs J.' to 'Bloggs' I get a similarity value of 1.0?

There obviously is a gap (e.g. 'o' and '.'), but it does not appear to be penalized.

Thank you in advance.


The Smith-Waterman algorithm is a local alignment algorithm. That means it is designed to find the pieces of the two strings that align well with each other, as opposed to aligning the whole strings end to end. The "gap" you speak of is not penalized because it falls outside the aligned region: the aligned region is just 'Bloggs' against 'Bloggs', and the trailing ' J.' is simply left out of the alignment. No string with the length of 'Bloggs' could possibly align better to 'Bloggs J.' than 'Bloggs' does, hence the score of 1.0. If you want a global alignment, where unmatched characters at the ends do count against the score, you should use the Needleman-Wunsch algorithm instead.
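To make the difference concrete, here is a minimal sketch of both scoring schemes in Java. It is not the SimMetrics code. The class name, the scoring constants (match +1, mismatch -2, gap -0.5), and the choice to normalize the local score by the length of the shorter string are all assumptions picked for illustration; I have not checked them against the SimMetrics source.

```java
final class AlignmentSketch {
    // Hypothetical scoring parameters, chosen for illustration only.
    static final double MATCH = 1.0;
    static final double MISMATCH = -2.0;
    static final double GAP = -0.5;

    static double substitution(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    /** Smith-Waterman: best score of any local alignment. Cells never drop below 0. */
    static double localScore(String s, String t) {
        double[][] h = new double[s.length() + 1][t.length() + 1];
        double best = 0.0;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                double diag = h[i - 1][j - 1] + substitution(s.charAt(i - 1), t.charAt(j - 1));
                double up = h[i - 1][j] + GAP;
                double left = h[i][j - 1] + GAP;
                // The floor at 0 lets an alignment start anywhere, free of charge.
                h[i][j] = Math.max(0.0, Math.max(diag, Math.max(up, left)));
                // The best cell anywhere in the table wins, so an alignment can also end anywhere.
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    /** Needleman-Wunsch: score of the best alignment covering both strings end to end. */
    static double globalScore(String s, String t) {
        double[][] d = new double[s.length() + 1][t.length() + 1];
        // Leading gaps are charged, unlike in the local version.
        for (int i = 1; i <= s.length(); i++) d[i][0] = i * GAP;
        for (int j = 1; j <= t.length(); j++) d[0][j] = j * GAP;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                double diag = d[i - 1][j - 1] + substitution(s.charAt(i - 1), t.charAt(j - 1));
                double up = d[i - 1][j] + GAP;
                double left = d[i][j - 1] + GAP;
                d[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        // Only the bottom-right cell counts, so trailing gaps are charged too.
        return d[s.length()][t.length()];
    }

    /** Local score divided by the best score the shorter string could reach (assumed normalization). */
    static double localSimilarity(String s, String t) {
        int shorter = Math.min(s.length(), t.length());
        if (shorter == 0) {
            return s.length() == t.length() ? 1.0 : 0.0;
        }
        return localScore(s, t) / (shorter * MATCH);
    }

    public static void main(String[] args) {
        System.out.println(localSimilarity("Bloggs J.", "Bloggs")); // prints 1.0
        System.out.println(globalScore("Bloggs J.", "Bloggs"));     // prints 4.5
    }
}
```

With these assumed parameters the local alignment scores 6 out of a possible 6, because the ' J.' is never part of the aligned region. The global alignment has to account for those three extra characters as gaps, so it scores 6 - 3 * 0.5 = 4.5.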
