开发者

How to find N longest lines in a text file

Write a program to read a multiple line text file and write the 'N' longest lines to stdout.

There was a 开发者_如何学运维similar question but I didn't really understand it since it involves using a min-heap and that would create more work since I'd have to create a min-heap data structure.

I tried to create an array that size n. Then sort it but I would have to sort it each and every time I inserted a new line into the array. I'd like to know what's the simple approach and what would be the optimal one.


Create an array of N strings.

Loop through the file.

If the number of items in the array is < N then just add it to the end.

In all cases store the shortest length of the line in the array. If the array is full, then compare to the shortest line, if the new line is > then that line, replace, and find the shortest line.

Repeat loop.

Print strings.


Edit: As other mentionned my first solution would not work for lines of the same length. A better approach would then be to use a java.util.PriorityQueue<Line> and define an object Line like this:

class Line {
  private final String line;
  public Line(String line) {
    this.line = line;
  }

  public int getLen() {
    return line.length();
  }

  public String getLine() {
    return line;
  }
} 

You then instanciate a PriorityQueue and specify the order using a Comparator.

PriorityQueue<E>: PriorityQueue(int initialCapacity, Comparator<? super E> comparator)

Creates a PriorityQueue with the specified initial capacity that orders its elements according to the specified comparator.

Here is an experimentation using a unit test.

        int n = 1;
        PriorityQueue<Line> queue = new PriorityQueue<Line>(n, new Comparator<Line>() {
            @Override
            public int compare(Line a, Line b) {
                return a.getLen() - b.getLen();
            }
        });

        queue.add(new Line("abcd"));
        queue.add(new Line("abcde"));
        queue.add(new Line("abc"));

        assertEquals(queue.poll().getLine(), "abc");
        assertEquals(queue.poll().getLine(), "abcd");
        assertEquals(queue.poll().getLine(), "abcde");

In this example we see that poll remove remove first the smallest element of the list. You could simply call queue.poll() if queue.size() > n after each line insertion.


here is my solution

the reason of using a treemap is to handle the multiple lines with same length, in this case, all of the lines will be added into the same ArrayList

public class DataCollector {

    private int size;
    private TreeMap<Integer, List<String>> data = new TreeMap<Integer, List<String>>();

    public DataCollector(int nthLargest) {

        if (nthLargest < 1) {
            throw new IllegalArgumentException("can not be smaller than 1");
        }

        this.size = nthLargest;
    }

    public void feed(String line) {

        int length = line.length();

        if (data.size() > 0 && length < data.firstKey()) {
            return;
        }

        getList(length).add(line);



        if (data.size() > size) {
            data.remove(data.firstKey());
        }
    }

    private List<String> getList(int key) {
        if (!data.containsKey(key)) {
            data.put(key, new ArrayList<String>());
        }

        return data.get(key);
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();

        for (Entry<Integer, List<String>> entry : data.entrySet()) {
            builder.append(entry.getKey()).append("=").append(entry.getValue()).append("\r\n");
        }

        return builder.toString();
    }

    public List<String> getResult() {


        if (data.isEmpty()) {
            return Collections.EMPTY_LIST;
        }

        return data.firstEntry().getValue();
    }
    public static void main(String[] args) {

        DataCollector co = new DataCollector(1);


        co.feed("b");
        co.feed("abc");
        co.feed("abc1");
        co.feed("abc2");
        co.feed("abc33");
        co.feed("abc34");
        co.feed("abc23");
        co.feed("abc23b");
        co.feed("abc23b");
        co.feed("abc23c");
        co.feed("abc23dd");
        co.feed("abc23ee");
        co.feed("a");

        System.out.println(co);
        System.out.println(co.getResult());

    }

}


Assuming that multiple lines of the same length are not allowed. Here's the algo:

  1. Create a sorted dictionary with key => length of the string and value => line.
  2. Print the last N lines of the dictionary.

If multiple lines of the same length are allowed then Dictionary has to be modified to include List as value. Here's the code:

SortedDictionary> processed = new SortedDictionary>();

        int N;
        StreamReader reader = File.OpenText(args[0]);

        N = int.Parse(reader.ReadLine());

        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();

            if (string.IsNullOrWhiteSpace(line) || string.IsNullOrEmpty(line))
                continue;

            line.Trim();

            if (processed.ContainsKey(line.Length))
            {
                processed[line.Length].Add(line);
            }
            else
            {
                processed.Add(line.Length, new List<string> { line });
            }
        }
        while(N != 0)
        {
            foreach (KeyValuePair<int, List<string>> kvp in processed.Reverse())
            {
                kvp.Value.ForEach(c => Console.WriteLine(c));
                if (--N == 0)
                    break;
            }
        }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜