开发者

What's the most efficient algorithm to find the indexes of multiple characters inside a string (javascript)?

I'm looking for the fastest method I can use to search a body of text for the indexes of multiple characters.

For examp开发者_StackOverflow中文版le:

searchString = 'abcdefabcdef'; 
searchChars  = ['a','b'];
// returns {'a':[0,6], 'b':[1,7]}


You should be able to use a regular expression to find all occurances of each character. Something like:

function findIndexes(find, str) {
  var output = {};
  for (var i = 0; i < find.length; i++) {
    var m = [];
    var r = new RegExp('.*?' + find[i], 'g');
    var ofs = -1;
    while ((x = r.exec(str)) != null) {
      ofs += x[0].length;
      m.push(ofs);
    }
    output[find[i]] = m;
  }
  return output;
}

Edit:

Did some changes, and now it works. :) However, as Javascript doesn't have a matches method to get all matches at once, it's not really any improvment over using indexOf... :P

Edit 2:

However, you can use a regular expression to find any of the characters, so you only need to loop the string once instead of once for each character. :)

function findIndexes(find, str) {
  var output = {};
  for (var i = 0; i < find.length; i++) output[find[i]] = [];
  var r = new RegExp('.*?[' + find.join('') + ']', 'g');
  var ofs = -1;
  while ((x = r.exec(str)) != null) {
    ofs += x[0].length;
    output[x[0].substr(x[0].length-1,1)].push(ofs);
  }
  return output;
}


Assuming few letters to search for and many letters to search against (i.e. low number of letter, long strings), the latter is the most efficient, since you only go through the string once, and then test each letter.

The other one goes through the string as many times as there are letters to search for.


After timing a few single pass algorithms and Guffa's regex, I ended up going with this:

function findIndexesMultiPass(str,find) {
  var x, output = {};

  for (var i = 0; i < find.length; i++) {
    output[find[i]] = [];
    x = 0;
    while ((x = str.indexOf(find[i], x)) > -1) {
      output[find[i]].push(x++);
    }
  }

  return output;
}

var searchString = "abcd abcd abcd";
var searchChars  = ['a', 'b'];
var result = findIndexesMultiPass(searchString, searchChars);
// {'a':[0,5,10], 'b':[1,6,11]}

This turned out to be pretty slow:

function findIndexesOnePass(str,find) {
  var output = {};

  for (var i = 0; i < find.length; i++) {
    output[find[i]] = [];
  }

  for (var i = 0; i < str.length; i++) {
    var currentChar = str.charAt(i);
    if (output[currentChar] !== undefined) {
      output[currentChar].push(i);
    }
  }

  return output;
}

var searchString = "abcd abcd abcd";
var searchChars  = ['a', 'b'];
var result = findIndexesOnePass(searchString, searchChars);
// {'a':[0,5,10], 'b':[1,6,11]}

Rough times (indexes of 3 characters)

Google Chrome (Mac)
findIndexesMultiPass: 44ms
findIndexesOnePass: 799ms
findIndexesRegEx: 95ms

Safari
findIndexesMultiPass: 48ms
findIndexesOnePass: 325ms
findIndexesRegEx: 293ms

Firefox
findIndexesMultiPass: 56ms
findIndexesOnePass: 369ms
findIndexesRegEx: 786ms
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜