exception:OutOfMemoryError: Java heap space when running the code
I have an application which forms all the possible pairs and then compare the pairs, but when I run the applicaion, it gave me exception:OutOfMemoryError:Java heap space when running the code. I tried -Xmx1500m, but the exception just kept coming. the code for generating the pairs are as following
File file = ...;
final Map<Pair, Collection<Integer>> lineNumbersByPair = new HashMap<Pair, Collection<Integer>>();
/*
* Step 1: Read in the lines, one by one.
*/
Reader reade开发者_如何转开发r = new FileReader(file);
try {
BufferedReader bufferedReader = new BufferedReader(reader);
try {
String line;
int lineNumber = 0;
while ((line = bufferedReader.readLine()) != null) {
lineNumber++;
String[] tokens = line.split("\\s+");
int[] values = new int[tokens.length];
for (int i = 0; i < tokens.length; i++) {
values[i] = Integer.parseInt(tokens[i]);
}
for (int i = 0; i < values.length; i++) {
for (int j = i + 1; j < values.length; j++) {
Pair pair = new Pair(values[i], values[j]);
Collection<Integer> lineNumbers;
if (lineNumbersByPair.containsKey(pair)) {
lineNumbers = lineNumbersByPair.get(pair);
} else {
lineNumbers = new HashSet<Integer>();
lineNumbersByPair.put(pair, lineNumbers);
}
lineNumbers.add(lineNumber);
}
}
}
} finally {
bufferedReader.close();
}
} finally {
reader.close();
}
/*
* Step 2: Identify the unique pairs. Sort them according to how many lines they appear on (most number of lines to least number of lines).
*/
List<Pair> pairs = new ArrayList<Pair>(lineNumbersByPair.keySet());
Collections.sort(
pairs,
new Comparator<Pair>() {
@Override
public int compare(Pair pair1, Pair pair2) {
Integer count1 = lineNumbersByPair.get(pair1).size();
Integer count2 = lineNumbersByPair.get(pair2).size();
return count1.compareTo(count2);
}
}
);
Collections.reverse(pairs);
/*
* Step 3: Print the pairs and their line numbers.
*/
for (Pair pair : pairs) {
Collection<Integer> lineNumbers = lineNumbersByPair.get(pair);
if (lineNumbers.size() > 1) {
System.out.println(pair + " appears on the following lines: " + lineNumbers);
}
}
I am reading a file around 15mb and it contains like 20000lines of single numbers, and there are around 40 numbers per line, it forms all possible pairs of each line. anyone has any idea about how to solve this? thanks
My math may be off, but this is probably why you're still running out of space.
So 40 numbers per line, 20000 lines = 800000 numbers.
800000 C 2 = 319999600000 combinations of numbers.
At 4 bytes an int
, and with Pair<int, int>
, each Pair is at least 8 bytes, and then you add it to your data structure.
8 bytes * 319999600000 = 2+ terabytes.
After re-reading your problem, each line is separate from the next.
40 numbers per line => 40 C 2 = 780 combinations per line * 20000 lines = 15600000 possible unique pairs * 8 bytes per pair = 119 MB purely for int
in the worst case. Add to that the memory taken up by references since Java does not allow primitive types in a Collection.
But after taking another look at your program, I have some suggestions:
Why are you mapping a Pair
to a Set<Integer>
?
If you are interested in just the number of occurrences of each Pair
, you do not need to keep track of the line numbers the pairs occur on - you only want to store the number of times it has shown up.
So in this case you want to map Pair
to Integer
. This could reduce the amount of memory you require.
Do you care about ordering of the pair?
Your for
loop seems to indicate you do not care about ordering, i.e. the pair (30,45) is the same as the pair (45,30). If so, you should create your Pair
based on relative ordering in the pair. Perhaps making a Pair
based on the smallest value first, so that every time you encountered two integers m and n, you would always create the pair as Pair(m, n)
. Also see the next section about hashCode()
and equals()
.
Have you implemented Pair
's int hashCode()
and boolean equals(Object)
methods?
This could be the difference between an actual working program and a broken one.
In your case, you want Pair
objects to test for logical equality as it is a custom class, so you will have to override and implement your own equals(Object)
method. You will also want to override hashCode()
as you must always do so when overriding equals()
.
This is detailed in the excellent Effective Java, and here's a sample of the chapter discussing this: http://java.sun.com/developer/Books/effectivejava/Chapter3.pdf
When data became too big to fit memory, the only way is use extended memory (HDD). Here you can partition and store on disk, load small part each onto memory and search.
Or you should use a algobrithm that use less memory and more processor.
1.Search the file, search all num, and create a relative 2-d matrix or something like below.
1 2 3 4 ...
1 0 1 0 0
2 0 1 0 0
3 0 0 0 0
...
2.You can sort on this matrix.
3.One by one pair, search the file to get line num contain both numbers in pair.
Sorry because my bad english.
精彩评论