开发者

Efficient way to find Frequency of a character in a String in java : O(n)

In a recent interview I was asked to write the below program. Find out the character whose frequency is minimum in the given String ? So I tried by iterating through the string by using charAt and storing the character as key in a HashMap and the number of occurences as its value. Now Again I have to iterate on the Map to find the lowest element.

Is there a more efficient way to do it as obviously the above one is too intensive i guess.

Update and Another Solution

After some thought process and answers I think the best time that this can be is O(n). In the first iteration we will have to iterate through the String character by character and then store th开发者_如何学运维eir frequency in an Array at the specific position(character is an int) and same time have two temporary variables which maintain the least count and the corresponding character.So when I go to the next character and store its frequency in arr[char] = arr[char]+1;At the same time I will check if the temp varible has a value greater than this value,if yes then the temp varible will be this value and also the char will be this one.In this way i suppose we dont need a second iteration to find the smallest and also no sorting is required I guess

.... Wat say ? Or any more solutions


I'd use an array rather than a hash map. If we're limited to ascii, that's just 256 entries; if we're using Unicode, 64k. Either way not an impossible size. Besides that, I don't see how you could improve on your approach. I'm trying to think of some clever trick to make it more efficient but I can't come up with any.

Seems to me the answer is almost always going to be a whole list of characters: all of those that are used zero times.

Update

This is probably clost to the most efficient it could be in Java. For convenience, I'm assuming we're using plain Ascii.

public List<Character> rarest(String s)
{
  int[] freq=new int[256];

  for (int p=s.length()-1;p>=0;--p)
  {
    char c=s.charAt(p);
    if (c>255)
      throw new UnexpectedDataException("Wasn't expecting that");
    ++freq[c];
  }
  int min=Integer.MAX_VALUE;
  for (int x=freq.length-1;x>=0;--x)
  {
    // I'm assuming we don't want chars with frequency of zero
    if (freq[x]>0 && min>freq[x])
      min=freq[x];
  }
  List<Character> rares=new ArrayList<Character>();
  for (int x=freq.length-1;x>=0;--x)
  {
    if (freq[x]==min)
      rares.add((char)x);
  }
  return rares;
}

Any effort to keep the list sorted by frequency as you go is going to be way more inefficient, because it will have to re-sort every time you examine one character.

Any attempt to sort the list of frequencies at all is going to be more inefficient, as sorting the whole list is clearly going to be slower than just picking the smallest value.

Sorting the string and then counting is going to be slower because the sort will be more expensive than the count.

Technically, it would be faster to create a simple array at the end rather than an ArrayList, but the ArrayList makes slightly more readable code.

There may be a way to do it faster, but I suspect this is close to the optimum solution. I'd certainly be interested to see if someone has a better idea.


I think your approach is in theory the most efficient (O(n)). However in practice it needs quite a lot of memory, and is probably very slow.

It is possibly more efficient (at least it uses less memory) to convert the string to a char array, sort the array, and then calculate the frequencies using a simple loop. However, in theory it is less efficient (O(n log n)) because of sorting (unless you use a more efficient sort algorithm).

Test case:

import java.util.Arrays;

public class Test {

    public static void main(String... args) throws Exception {
        //        System.out.println(getLowFrequencyChar("x"));
        //        System.out.println(getLowFrequencyChar("bab"));
        //        System.out.println(getLowFrequencyChar("babaa"));
        for (int i = 0; i < 5; i++) {
            long start = System.currentTimeMillis();
            for (int j = 0; j < 1000000; j++) {
                getLowFrequencyChar("long start = System.currentTimeMillis();");
            }
            System.out.println(System.currentTimeMillis() - start);
        }

    }

    private static char getLowFrequencyChar(String string) {
        int len = string.length();
        if (len == 0) {
            return 0;
        } else if (len == 1) {
            return string.charAt(0);
        }
        char[] chars = string.toCharArray();
        Arrays.sort(chars);
        int low = Integer.MAX_VALUE, f = 1;
        char last = chars[0], x = 0;
        for (int i = 1; i < len; i++) {
            char c = chars[i];
            if (c != last) {
                if (f < low) {
                    if (f == 1) {
                        return last;
                    }
                    low = f;
                    x = last;
                }
                last = c;
                f = 1;
            } else {
                f++;
            }
        }
        if (f < low) {
            x = last;
        }
        return (char) x;
    }

}


The process of finding frequency of characters in a String is very easy.
For answer see my code.

import java.io.*;
public class frequency_of_char
{
    public static void main(String args[])throws IOException
    {
        BufferedReader in=new BufferedReader(new InputStreamReader(System.in));
        int ci,i,j,k,l;l=0;
        String str,str1;
        char c,ch;
        System.out.println("Enter your String");
        str=in.readLine();
        i=str.length();
        for(c='A';c<='z';c++)
        {
            k=0;
            for(j=0;j<i;j++)
            {
                ch=str.charAt(j);
                if(ch==c)
                    k++;
            }
            if(k>0)
            System.out.println("The character "+c+" has occured for "+k+" times");
        }
    }
}


I'd do it the following way as it involves the fewest lines of code:

character you wish to want to know frequency of: "_"
String "this_is_a_test"

String testStr = "this_is_a_test";
String[] parts = testStr.split("_"); //note you need to use regular expressions here
int freq = parts.length -1;

You may find weird things happen if the string starts or ends with the character in question, but I'll leave it to you to test for that.


Having to iterate through the HashMap is not necessarily bad. That will only be O(h) where h is the HashMap's length--the number of unique characters--which in this case will always be less than or equal to n. For the example "aaabbc", h = 3 for the three unique characters. But, since h is strictly less than the number of possible characters: 255, it is constant. So, your big-oh will be O(n+h) which is actually O(n) since h is constant. I don't know of any algorithm that could get a better big-oh, you could try to have a bunch of java specific optimizations, but that said here is a simple algorithm I wrote that finds the char with the lowest frequency. It returns "c" from the input "aaabbc".

import java.util.HashMap;
import java.util.Map;

public class StackOverflowQuestion {

public static void main(String[] args) {
    // TODO Auto-generated method stub

    System.out.println("" + findLowestFrequency("aaabbc"));

}

public static char findLowestFrequency(String input) {

    Map<Character, Integer> map = new HashMap<Character, Integer>();

    for (char c : input.toCharArray())

        if (map.containsKey(c))
            map.put(c, map.get(c) + 1);
        else
            map.put(c, 0);

    char rarest = map.keySet().iterator().next();

    for (char c : map.keySet())

        if (map.get(c) < map.get(rarest))
            rarest = c;

    return rarest;

}

}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜