Remove an item from a Set that doesn't match criteria

2023-01-27 07:32 问答作者：

For a school project, the goal is to do a fuzzy match of a query string to a lyric string inside a Song object. The overall data structure is a TreeMap of unique words paired with sets of songs that contain that word in the lyrics.

I have my preliminary match set of songs that contain the query string. The twist here is that I have to assign each result song a rank based on the number of characters in the match section, spaces inclusive. For example, searching for "she loves you" returns these among the matches:

"... She loves you ..." The Beatles, rank= 13

"... She just loves you ..." Bonnie Raitt, rank=18

"... She loves me, well you ..." Elvis Presley, rank=23

The I'm using to sort the results is:

for (int i=0; i<lyrics.length; i++) {
  if (lyrics[i].equals(query[0])) { //got the s开发者_开发知识库tart point
  start=i; //adjust the start index point

  //loop through lyrics from start point
  for (int j=1; j<query.length; j++) {
    if (lyrics[j].equals(query[query.length-1])) {
        end=i; //found the last word
    }

    //if next lyric word doesn't match this query word
    if (!lyrics[i+j].equals(query[j])) {

    //advance loop through lyrics. when a match is found, i is adjusted to
    //the match index
    for (int k= i+j+1; k<lyrics.length; k++) {
        if (lyrics[k].equals(query[j]) || lyrics[k].equals(query[0]))
            i=k++;
        } //end inner advance loop

    } //end query string test

  }//end query test loop

  song.setRanks(start, end); //start and end points for the rank algorithm.

} //end start point test

Since all the songs in the result set contain the query words in any particular order, they will not all be included in the result printout. Using this algorithm, how can I set a trigger to remove the song from the set if the query is not matched to any particular length?

Edit- Is Lucene a solution to this? This is a gray area in the project, and one I will bring up in class tomorrow. He is allowing us to choose whatever data structures for this project, but I don't know if using another implementation for string matching will pass muster.

Edit 2 @ belisarius- I don't see how edit distance applies here. The most common application of Levenshtein distance requres a String a of length n and String b of length m, and the distance is the number of edits required for a==b. For this project, all that is required is the rank of characters in a match, with the start and end points unknown. With some changes to the code posted above, I am finding the start and end points accurately. What I need is a way to remove the non-matches from the set if the the lyrics don't fit the search in any fashion.

Probably you want to have a look at the Levenstein distance. The Apache commons-lang library implemented it in version 2.1 in the StringUtils class.

A Patricia trie might just do it for you.

Go through this one see if it has what u need.

http://code.google.com/p/patricia-trie/

继续阅读：algorithm fuzzy-search search

Remove an item from a Set that doesn't match criteria

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？