
Additional spaces in String having read text file to String using FileInputStream

I'm trying to read a text file into a String variable. The text file has multiple lines. When I print the String to test the read-in code, there is an additional space between every character. Since I am using the String to generate character bigrams, the spaces make the sample text useless. The code is:

try {
  FileInputStream fstream = new FileInputStream(textfile);   
  DataInputStream in = new DataInputStream(fstream);     
  BufferedReader br = new BufferedReader(new InputStreamReader(in));

  //Read corpus file line-by-line, concatenating each line to the String "corpus"
  while ((strLine = br.readLine()) != null) {
    corpus = (corpus.concat(strLine));    
  }

  in.close();    //Close the input stream  
}
catch (Exception e) { //Catch exception if any
  System.err.println("Error test check: " + e.getMessage());
}

I'd be grateful for any advice.

Thanks.


Your text file is most likely UTF-16 (Unicode) encoded. UTF-16 uses two or four bytes to represent each character. For most Western text, every other byte is zero; when the file is decoded with a single-byte encoding, those zero bytes become non-printable characters that look like spaces.
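
To make the symptom concrete, here is a small sketch that encodes a string as UTF-16 and then decodes the bytes with a single-byte charset. ISO-8859-1 stands in for whatever default encoding the original program picked up, and the class and method names (Utf16Symptom, misdecode) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Symptom {
    // Encodes text as UTF-16 (little-endian, no byte-order mark) and then
    // decodes those bytes as ISO-8859-1, one char per byte.
    // Assumption: ISO-8859-1 stands in for the platform default charset.
    static String misdecode(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("abc");
        // Each ASCII letter is followed by a NUL character (code 0),
        // which many consoles render as a blank.
        for (char c : garbled.toCharArray()) {
            System.out.println((int) c);
        }
    }
}
```

For "abc" the decoded string has six characters instead of three, with a NUL after each letter.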

You can use the second argument of InputStreamReader to specify the encoding.
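
A minimal sketch of that approach, assuming the file really is UTF-16. The class and method names (CorpusReader, readCorpus) are hypothetical. It also drops the DataInputStream, which is not needed for reading text, and uses a StringBuilder rather than repeated concat.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CorpusReader {
    // Reads the whole file into one String, decoding the bytes as UTF-16.
    // Assumption: the file is UTF-16. Java's "UTF-16" decoder uses the
    // byte-order mark to pick endianness and falls back to big-endian
    // when no mark is present.
    static String readCorpus(String path) throws IOException {
        StringBuilder corpus = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path),
                                      StandardCharsets.UTF_16))) {
            String line;
            while ((line = br.readLine()) != null) {
                // As in the question's code, line terminators are dropped.
                corpus.append(line);
            }
        }
        return corpus.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(readCorpus(args[0]));
        }
    }
}
```

If the file turns out to use a different encoding, pass the matching charset instead of StandardCharsets.UTF_16.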

Alternatively, convert the text file to a different encoding (iconv on Unix, or the Save As... dialog in Notepad on Windows).
