Is there a simple way to preserve trailing tabs in Java when reading a file?
BufferedReader's readLine() and Scanner's nextLine() seem to be helping a little too much by removing all trailing whitespace. I need to preserve columns, which at the moment are allowed to be empty, but I hesitate to loop through each row using next() or getBytes() to identify tab characters, since there could be millions of rows with hundreds of columns.
Are there alternatives to these two methods for reading lines that I'm missing? Are there flags or any other options to set in these methods to preserve whitespace? Do I simply force the user to use non-blank fields? I'm not alone in trying to preserve whitespace, am I?
The problem shows up when reading from a file. I have this code:
import java.io.*;

public class stringTest
{
    public static void main(String[] args) throws IOException
    {
        // try-with-resources closes the reader once the file has been read
        try (BufferedReader br = new BufferedReader(new FileReader("wtf.txt"))) {
            String l = br.readLine();
            while (l != null) {
                // Print the number of tab-separated fields on this line
                System.out.println(l.split("\t").length);
                l = br.readLine();
            }
        }
    }
}
wtf.txt contains
h\tu\tr\tf\n
o\tm\tg\t\t\n
And the output is
4
3
Additionally, if I add a line anywhere that is all tabs, i.e.
h\tu\tr\tf\n
\t\t\t\t\t\n
o\tm\tg\t\t\n
The output is
4
0
3
I don't think it's an issue with split because if I use the code
String s = "w\tt\tf\t\t\n";
System.out.println(""+s.split("\t").length);
String s1 = "w\tt\tf\tx\n";
System.out.println(""+s1.split("\t").length);
String s2 = "\t\t\t\t\t\t\n";
System.out.println(""+s2.split("\t").length);
The output is
5
4
6
BufferedReader.readLine() does preserve whitespace.
EDIT: It sounds like your problem is to do with split, not BufferedReader or Scanner. You can take those out of the equation very easily:
public class Test {
    public static void main(String[] args) {
        String line = "\t\t\t";
        System.out.println(line.split("\t").length); // Prints 0
    }
}
There are various ways of splitting a string on delimiters - you might want to look at the Splitter class in Guava:
import java.util.List;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;

public class Test {
    public static void main(String[] args) {
        Splitter splitter = Splitter.on('\t');
        String line = "\t\t\t";
        List<String> bits = Lists.newArrayList(splitter.split(line));
        System.out.println(bits.size()); // Prints 4
    }
}
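If adding Guava is not an option, the JDK's own two-argument String.split(regex, limit) keeps trailing empty strings when the limit is negative. A minimal sketch of that approach follows; the class name SplitKeepTrailing and the helper splitColumns are made up for illustration and are not part of the original code.

```java
public class SplitKeepTrailing {
    // Hypothetical helper: splits a line on tabs and keeps trailing empty columns.
    static String[] splitColumns(String line) {
        // A negative limit tells split() not to discard trailing empty strings.
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "o\tm\tg\t\t";
        System.out.println(line.split("\t").length);        // Prints 3
        System.out.println(splitColumns(line).length);      // Prints 5
        System.out.println(splitColumns("\t\t\t").length);  // Prints 4
    }
}
```

In the question's loop, this would amount to replacing l.split("\t") with l.split("\t", -1).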
BufferedReader.readLine() doesn't remove trailing tabs, certainly. Sample code:
import java.io.*;

public class Test {
    public static void main(String[] args) throws IOException {
        // Not closing anything just for convenience
        String text = "a\tb\t\r\nc\td\t";
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        while ((line = reader.readLine()) != null)
        {
            System.out.println(line.replace("\t", "<tab>"));
        }
    }
}
Output:
a<tab>b<tab>
c<tab>d<tab>
Ditto Scanner.nextLine():
import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) throws IOException {
        // Not closing anything just for convenience
        String text = "a\tb\t\r\nc\td\t";
        Scanner scanner = new Scanner(new StringReader(text));
        while (scanner.hasNextLine())
        {
            String line = scanner.nextLine();
            System.out.println(line.replace("\t", "<tab>"));
        }
    }
}
(Same output.)
So whatever's stripping your whitespace, it isn't Scanner.nextLine() or BufferedReader.readLine().
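As for why the string-literal test in the question seemed to clear split: each of those literals ends in "\n", which split treats as a final non-empty field, so there are no trailing empty strings to drop. readLine() strips the line terminator, which leaves the empty trailing fields exposed to split's default behaviour. A small sketch of the difference, using only JDK calls (the class name TerminatorDemo and the helper fieldCount are illustrative assumptions):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class TerminatorDemo {
    // Hypothetical helper: counts fields with the default split,
    // which discards trailing empty strings.
    static int fieldCount(String s) {
        return s.split("\t").length;
    }

    public static void main(String[] args) throws IOException {
        String raw = "o\tm\tg\t\t\n";
        // The last field is "\n", so nothing trailing is empty.
        System.out.println(fieldCount(raw)); // Prints 5

        try (BufferedReader reader = new BufferedReader(new StringReader(raw))) {
            String line = reader.readLine(); // terminator removed, tabs kept
            System.out.println(line.endsWith("\t\t"));        // Prints true
            System.out.println(fieldCount(line));             // Prints 3
            System.out.println(line.split("\t", -1).length);  // Prints 5
        }
    }
}
```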