Additional spaces in String after reading a text file into a String using FileInputStream
I'm trying to read a text file into a String variable. The text file has multiple lines. When I print the String to test the "read-in" code, there is an additional space between every character. As I am using the String to generate character bigrams, the spaces make the sample text useless. The code is:
try {
    FileInputStream fstream = new FileInputStream(textfile);
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    //Read corpus file line-by-line, concatenating each line to the String "corpus"
    while ((strLine = br.readLine()) != null) {
        corpus = (corpus.concat(strLine));
    }
    in.close(); //Close the input stream
}
catch (Exception e) { //Catch exception if any
    System.err.println("Error test check: " + e.getMessage());
}
I'd be grateful for any advice.
Thanks.
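Your text file is likely to be UTF-16 ("Unicode" in Notepad) encoded. UTF-16 takes two or four bytes to represent each character, and for most Western text one byte of each pair is zero. Your InputStreamReader is created without a charset, so it falls back on the platform default encoding; if that is a single-byte encoding, every zero byte is decoded as a separate non-printable character, which looks like a space when printed.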
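To see the effect in isolation, here is a small sketch that encodes a string as UTF-16 and then decodes the bytes with a single-byte charset. The class and method names are made up for illustration, and ISO-8859-1 is only an assumed stand-in for whatever your platform default happens to be.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class Utf16Demo {
    // Encodes the text as UTF-16 (little-endian, no BOM), then decodes
    // those bytes as ISO-8859-1, where every byte becomes one char.
    // ISO-8859-1 is an assumption standing in for a single-byte default.
    static String misdecode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For plain ASCII input the result is twice as long as the original, with a NUL character (\u0000) after each letter. Those NULs are the "spaces" showing up in your printed output.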
You can use the second argument of InputStreamReader to specify the encoding.
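A minimal sketch of that, assuming the file really is UTF-16 with a byte-order mark (which is what Notepad's "Unicode" option writes). CorpusReader and readCorpus are hypothetical names; the DataInputStream from the question is dropped because it is not needed for reading text, and a StringBuilder replaces the repeated concat.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class CorpusReader {
    // Reads the file line by line, decoding bytes with the given charset,
    // and concatenates the lines (line terminators are dropped, as in the
    // question's code).
    static String readCorpus(String path, Charset charset) throws IOException {
        StringBuilder corpus = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                corpus.append(line);
            }
        }
        return corpus.toString();
    }
}
```

You would call it as readCorpus(textfile, StandardCharsets.UTF_16). That charset uses the byte-order mark to pick the byte order and falls back to big-endian when there is none, so if your file has no BOM you may need StandardCharsets.UTF_16LE or UTF_16BE instead.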
Alternatively, convert the text file itself to the encoding your code expects (iconv on Unix, or the Save As... dialog in Notepad on Windows).