
How to compile a Java source file which is encoded as "UTF-8"?

I saved my Java source file with its encoding set to UTF-8 (using Notepad; by default, Notepad's encoding is ANSI) and then tried to compile it using:

javac -encoding "UTF-8" One.java

but it gave this error message:

One.java:1: illegal character: \65279

?public class One {

^
1 error

Is there any other way I can compile this?

Here is the source:

public class One {
    public static void main( String[] args ){
        System.out.println("HI");
    }
} 


Your file is being read as UTF-8, otherwise a character with value "65279" could never appear. javac expects your source code to be in the platform default encoding, according to the javac documentation:

If -encoding is not specified, the platform default converter is used.

Decimal 65279 is hex FEFF, which is the Unicode Byte Order Mark (BOM). It's unnecessary in UTF-8, because UTF-8 is always encoded as an octet stream and doesn't have endianness issues.
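
To make the connection between the BOM bytes and the number in the error message concrete, here is a minimal sketch. The class name `BomDemo` and its helper are made up for illustration; it assumes the file starts with the three bytes EF BB BF, which is how U+FEFF is encoded in UTF-8. Java's UTF-8 decoder keeps that character rather than dropping it.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // The UTF-8 encoding of U+FEFF, as written at the start of a file saved with a BOM.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decodes the bytes as UTF-8 and returns the code point of the first character.
    static int firstCodePoint(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return text.codePointAt(0);
    }

    public static void main(String[] args) {
        int cp = firstCodePoint(UTF8_BOM);
        System.out.println(cp);                      // prints 65279
        System.out.println(Integer.toHexString(cp)); // prints feff
    }
}
```

This is the same value javac reports as the illegal character on line 1.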

Notepad likes to stick in BOMs even when they're not necessary, but some programs don't like finding them. As others have pointed out, Notepad is not a very good text editor. Switching to a different text editor will almost certainly solve your problem.


Open the file in Notepad++ and select Encoding -> Convert to UTF-8 without BOM.


This isn't a problem with your text editor; it's a problem with javac! The Unicode spec says the BOM is optional in UTF-8; it doesn't say it's forbidden. If a BOM can be there, then javac HAS to handle it, but it doesn't. Actually, a BOM in UTF-8 files IS useful for distinguishing an ANSI-encoded file from a Unicode-encoded file.

The proposed solution of removing the BOM is only a workaround and not the proper solution.

This bug report indicates that this "problem" will never be fixed: https://web.archive.org/web/20160506002035/http://bugs.java.com/view_bug.do?bug_id=4508058

Since this thread is in the top 2 Google results for the "javac BOM" search, I'm leaving this here for future readers.


Try javac -encoding UTF8 One.java

Without the quotes and it's UTF8, no dash.

See this forum thread for more links


See below. As an example, consider a program that uses Telugu words as identifiers.

Program (UnicodeEx.java)

class UnicodeEx {  
    public static void main(String[] args) {   
        double ఎత్తు = 10;  
        double వెడల్పు = 25;   
        double దీర్ఘ_చతురస్ర_వైశాల్యం;  
        System.out.println("The Value of Height = "+ఎత్తు+" and Width = "+వెడల్పు+"\n");  
        దీర్ఘ_చతురస్ర_వైశాల్యం = ఎత్తు * వెడల్పు;  
        System.out.println("Area of Rectangle = "+దీర్ఘ_చతురస్ర_వైశాల్యం);  
    }  
}

Save this program as "UnicodeEx.java" and change the encoding to "unicode" when saving.

**How to Compile**

javac -encoding "unicode" UnicodeEx.java

**How to Execute**

java UnicodeEx

The Value of Height = 10.0 and Width = 25.0

Area of Rectangle = 250.0


I know this is a very old thread, but I was experiencing a similar problem with PHP instead of Java and Google took me here. I was writing PHP in Notepad++ (not plain Notepad) and noticed that an extra blank line appeared every time I called an include file. Firebug showed that there was a 65279 character in those extra lines.

Both the main PHP file and the included files were encoded in UTF-8. However, Notepad++ also has an option to encode as "UTF-8 without BOM". This solved my problem.

Bottom line: an editor saving as UTF-8 may insert this extra BOM character at the start of each file unless you instruct it to use UTF-8 without BOM.


Works fine here, even edited in Notepad. Moral of the story is, don't use Notepad. There's likely an unprintable character in there that Notepad is either inserting or happily hiding from you.


I had the same problem. To solve it, I opened the file in a hex editor and found three "invisible" bytes at the beginning of the file. I removed them, and compilation worked.
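
If you would rather not edit bytes by hand, the same fix can be sketched in Java. This is only an illustration: the class name `StripBom` and its helper are hypothetical, and it assumes the three invisible bytes are the UTF-8 BOM (EF BB BF). It rewrites the file in place, so try it on a copy first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class StripBom {
    // Returns the content without a leading UTF-8 BOM, or the same array if there is none.
    static byte[] stripBom(byte[] content) {
        boolean hasBom = content.length >= 3
                && content[0] == (byte) 0xEF
                && content[1] == (byte) 0xBB
                && content[2] == (byte) 0xBF;
        return hasBom ? Arrays.copyOfRange(content, 3, content.length) : content;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripBom <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        byte[] original = Files.readAllBytes(file);
        byte[] cleaned = stripBom(original);
        // Only rewrite the file when a BOM was actually removed.
        if (cleaned.length != original.length) {
            Files.write(file, cleaned);
        }
    }
}
```

After running it on One.java, `javac -encoding UTF-8 One.java` should no longer see the U+FEFF character at the start of the file.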


  • Open your file with WordPad or any other editor except Notepad.

  • Select Save As type as Text Document - MS-DOS Format

  • Reopen the Project


To extend the existing answers with a solution for Linux users:

To remove the BOM on all .java files at once, go to your source directory and execute

find -iregex '.*\.java' -type f -print0 | xargs -0 dos2unix

Requires find, xargs and dos2unix to be installed, which should be included in most distributions. The first statement finds all .java files in the current directory recursively, the second one converts each of them with the dos2unix tool, which is intended to convert line endings but also removes the BOM.

The line-ending conversion should have no effect, because on Linux your files should already use \n line endings if your version control is configured correctly. Be warned, though, that dos2unix does convert line endings as well, in case you have one of those rare situations where that is not intended.


In IntelliJ IDEA (Settings > Editor > File Encodings), the project encoding was "windows-1256", so I used the following code to convert static strings to UTF-8:

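// Re-decodes a string that was wrongly read as windows-1256 back into UTF-8.
// Requires: import java.io.UnsupportedEncodingException;
protected String persianString(String persianString) throws UnsupportedEncodingException {
    return new String(persianString.getBytes("windows-1256"), "UTF-8");
}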

Now it works! Depending on your file encoding, you should change "windows-1256" to the appropriate one.
