Java: using ByteBuffer to parse data from a file
In Java, I want to parse a file containing heterogeneous data (numbers and characters), and I want it to be fast.
I've been reading about ByteBuffer and memory-mapped files.
I can copy the file into a buffer, but parsing the data gets tricky. I'd like to do it by pulling out groups of bytes, but doesn't that make the parsing dependent on the encoding?
If the format of the file is, for instance:
someString 8
some other string 88
How can I parse it into String or Integer objects?
Thanks!
Udo.
Assuming your format is something like
{string possibly with spaces} {integer}\r?\n
You need to search for the newline, then work backward until you find the first space. You can decode the number yourself and turn it into an int, or turn it into a String and parse it. I wouldn't use an Integer unless you had to. Once you know where the line starts and where the integer starts, you can extract the text as bytes and convert it into a String using your desired encoding.
This assumes that newline and space are one byte each in your encoding. It would be more complicated if they are multi-byte, but it can still be done.
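To illustrate the one-byte assumption, here is a minimal sketch that measures how many bytes a delimiter occupies in a given charset. The class name `DelimiterWidths` and the helper `width` are made up for this illustration. In UTF-8 (and ASCII or Latin-1) the space and newline are a single byte, whereas in UTF-16 they take two, so the byte-by-byte scan would have to compare two bytes at a time.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DelimiterWidths {
    // Number of bytes the given string occupies when encoded with the charset.
    static int width(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(width(" ", StandardCharsets.UTF_8));       // 1
        System.out.println(width("\n", StandardCharsets.UTF_16LE));   // 2
        System.out.println(width("\u00e9", StandardCharsets.UTF_8));  // 2 (e with acute accent)
    }
}
```

The last line shows why the text part should be decoded with the real encoding rather than byte by byte: a single character can span several bytes even when the delimiters do not.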
EDIT: The following example prints...
text: ' someString', number: 8
text: 'some other string', number: -88
Code
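import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ParseLines {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(
                " someString 8\r\nsome other string -88\n".getBytes(StandardCharsets.UTF_8));
        while (bb.remaining() > 0) {
            int start = bb.position(), end, ptr;
            // find the end of the line (or the end of the buffer)
            for (end = start; end < bb.limit(); end++) {
                byte b = bb.get(end);
                if (b == '\r' || b == '\n')
                    break;
            }
            // read the number backwards
            long value = 0;
            long tens = 1;
            for (ptr = end - 1; ptr >= start; ptr--) {
                byte b = bb.get(ptr);
                if (b >= '0' && b <= '9') {
                    value += tens * (b - '0');
                    tens *= 10;
                } else if (b == '-') {
                    value = -value;
                    ptr--;
                    break;
                } else {
                    break;
                }
            }
            // assume the separator is a single space; ptr now points at it.
            // Math.max guards against lines that have no text before the number.
            byte[] bytes = new byte[Math.max(0, ptr - start)];
            bb.get(bytes);
            String text = new String(bytes, StandardCharsets.UTF_8);
            System.out.println("text: '" + text + "', number: " + value);
            // skip the line terminator (\r\n, \n or \r), if there is one
            int next = end;
            if (next < bb.limit() && bb.get(next) == '\r') next++;
            if (next < bb.limit() && bb.get(next) == '\n') next++;
            bb.position(next);
        }
    }
}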
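Since the question also mentions memory-mapped files, here is a hedged sketch of the same backward-scan idea applied to a buffer obtained from `FileChannel.map`. It assumes UTF-8 text and a single space before the number; the class `MappedLineParser`, its `Entry` holder, and the temporary sample file are invented for this illustration. Unlike the example above, it uses only absolute `get(index)` calls and leaves the number conversion to `Long.parseLong`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class MappedLineParser {
    // Hypothetical holder for one parsed line.
    static final class Entry {
        final String text;
        final long value;
        Entry(String text, long value) { this.text = text; this.value = value; }
    }

    // Decodes the bytes in [from, to) without moving the original buffer's position.
    static String decode(ByteBuffer bb, int from, int to, Charset cs) {
        ByteBuffer view = bb.duplicate();
        view.limit(to);
        view.position(from);
        return cs.decode(view).toString();
    }

    // Parses "{text} {integer}" lines; lines without a space-separated number are skipped.
    // A non-numeric last field makes Long.parseLong throw NumberFormatException.
    static List<Entry> parse(ByteBuffer bb) {
        List<Entry> out = new ArrayList<>();
        int pos = bb.position();
        int limit = bb.limit();
        while (pos < limit) {
            int end = pos;
            while (end < limit && bb.get(end) != '\n') end++;          // find the newline
            int lineEnd = end;
            if (lineEnd > pos && bb.get(lineEnd - 1) == '\r') lineEnd--; // drop a trailing \r
            int sep = lineEnd - 1;
            while (sep >= pos && bb.get(sep) != ' ') sep--;            // walk back to the last space
            if (sep >= pos && sep < lineEnd - 1) {
                String text = decode(bb, pos, sep, StandardCharsets.UTF_8);
                String digits = decode(bb, sep + 1, lineEnd, StandardCharsets.US_ASCII);
                out.add(new Entry(text, Long.parseLong(digits)));
            }
            pos = end + 1; // continue after the newline
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".txt"); // sample data for the demo
        file.toFile().deleteOnExit();
        Files.write(file, "someString 8\r\nsome other string -88\n".getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            for (Entry e : parse(map)) {
                System.out.println("text: '" + e.text + "', number: " + e.value);
            }
        }
    }
}
```

Because `parse` takes a plain `ByteBuffer`, the same method works on a wrapped byte array, which makes it easy to try out without touching the file system.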
You can try it this way:
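    // needs: import java.text.CharacterIterator; import java.text.StringCharacterIterator;
    // stringBuffer is assumed to be a StringBuffer instance holding the text to parse
    CharacterIterator it = new StringCharacterIterator(stringBuffer.toString());
    for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
        if (Character.isDigit(c)) {
            // character is a digit
        } else {
            // character is not a digit
        }
    }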
Or you can use regex if you prefer
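    String str = stringBuffer.toString();            // stringBuffer: assumed to hold the text to parse
    String numbers = str.replaceAll("\\D", "");      // keep only the digits
    String letters = str.replaceAll("\\P{L}", "");   // keep only the letters (\\W would also keep digits and '_')
Then call Integer.parseInt() as usual on the numbers string. Note that replaceAll() removes everything between the digits, so apply this to one line at a time; otherwise the digits of separate numbers run together.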
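If the text and the number need to stay paired, one possible variation on the regex idea is to match each line against a pattern with two capture groups. This is only a sketch under the assumption that every line looks like "{text} {integer}"; the class name `RegexLineParser` and the method `split` are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineParser {
    // The greedy (.*) takes everything up to the last space before the trailing integer.
    private static final Pattern LINE = Pattern.compile("(.*) (-?\\d+)");

    // Returns {text, number} for a matching line, or null if the line does not fit the format.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("some other string 88"))); // [some other string, 88]
        System.out.println(Arrays.toString(split("someString 8")));         // [someString, 8]
    }
}
```

The second element can then be passed to Integer.parseInt(). Compiling the pattern once and reusing it avoids recompiling the expression for every line.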
Are you looking for java.util.Scanner? Unless you have really exotic performance requirements, that should be fast enough:
    Scanner s = new Scanner(new File("C:\\test.txt"));
    while (s.hasNext()) {
        String label = s.next();
        int number = s.nextInt();
        System.out.println(number + " " + label);
    }
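One caveat: `next()` returns a single whitespace-delimited token, so the loop above expects one-word labels, and a line such as "some other string 88" would make `nextInt()` throw an InputMismatchException. A possible workaround, sketched here with an invented class name `ScannerLineParser`, is to read whole lines and split each at its last space. The demo scans an in-memory string instead of a file so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerLineParser {
    // Reads "{label} {integer}" lines and returns them formatted as "number label".
    static List<String> parse(Scanner s) {
        List<String> out = new ArrayList<>();
        while (s.hasNextLine()) {
            String line = s.nextLine().trim();
            int sep = line.lastIndexOf(' ');   // the number follows the last space
            if (sep < 0) {
                continue;                      // skip blank or malformed lines
            }
            String label = line.substring(0, sep);
            int number = Integer.parseInt(line.substring(sep + 1));
            out.add(number + " " + label);
        }
        return out;
    }

    public static void main(String[] args) {
        // Scanner(String) scans the given text itself, not a file of that name.
        Scanner s = new Scanner("someString 8\nsome other string 88\n");
        for (String entry : parse(s)) {
            System.out.println(entry);
        }
        // prints:
        // 8 someString
        // 88 some other string
    }
}
```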