Fast alternative to java.nio.charset.Charset.decode(..)/encode(..)

2022-12-16 18:44 Q&A author:

Does anybody know a faster way to do what java.nio.charset.Charset.decode(..)/encode(..) does?

It's currently one of the bottlenecks of a technology that I'm using.

[EDIT] Specifically, in my application, I changed one segment from a Java solution to a JNI solution (because there was a C++ technology that was more suitable for my needs than the Java technology I was using).

This change brought about a significant decrease in speed (and a significant increase in CPU and memory usage).

Looking deeper into the JNI solution that I used, the Java application communicates with the C++ application via byte[]. These byte[] are produced by Charset.encode(..) on the Java side and passed to the C++ side. Then, when the C++ side responds with a byte[], it gets decoded on the Java side via Charset.decode(..).

Running this against a profiler, I see that Charset.decode(..) and Charset.encode(..) both took a significant share of the whole execution time of the JNI solution (I profiled only the JNI solution because it's something I could whip up quite quickly. I'll profile the whole application at a later date once I free up my schedule :-) ).

Upon reading further about my problem, it seems that this is a known problem with Charset.encode(..) and decode(..) and that it's being addressed in Java 7. However, moving to Java 7 is not an option for me (for now) due to some constraints.

That is why I'm asking here whether somebody knows a Java 5 solution or alternative to this (sorry, I should have mentioned sooner that this is for Java 5). :-)

The javadoc for encode() and decode() makes it clear that these are convenience methods. For example, for encode():

Convenience method that encodes Unicode characters into bytes in this charset.

An invocation of this method upon a charset cs returns the same result as the expression

 cs.newEncoder()
   .onMalformedInput(CodingErrorAction.REPLACE)
   .onUnmappableCharacter(CodingErrorAction.REPLACE)
   .encode(bb);

except that it is potentially more efficient because it can cache encoders between successive invocations.

The language is a bit vague there, but you might get a performance boost by not using these convenience methods. Create and configure the encoder once, and then re-use it:

 CharsetEncoder encoder = cs.newEncoder()
   .onMalformedInput(CodingErrorAction.REPLACE)
   .onUnmappableCharacter(CodingErrorAction.REPLACE);

 encoder.encode(...);
 encoder.encode(...);
 encoder.encode(...);
 encoder.encode(...);
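As a rough illustration of the reuse idea above, here is a minimal sketch that sticks to pre-Java 7 APIs. The class name ReusableEncoder and its encode(String) method are made up for this example and are not part of the JDK. It is not thread-safe, so it assumes one instance per thread.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: creates and configures the encoder once, then reuses it.
// Not thread-safe; use one instance per thread.
class ReusableEncoder {
    private final CharsetEncoder encoder;

    ReusableEncoder(Charset cs) {
        this.encoder = cs.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    byte[] encode(String s) {
        try {
            // encode(CharBuffer) is a complete operation: it resets the encoder,
            // encodes all input and flushes, so repeated calls are safe.
            ByteBuffer bb = encoder.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown unchecked to keep callers simple.
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this variant still allocates a new ByteBuffer and byte[] per call; it only saves the cost of creating and configuring the encoder each time.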

It always pays to read the javadoc, even if you think you already know the answer.

First part - it is a bad idea in general to pass arrays into JNI code. Because the GC can move objects, the JVM may have to copy arrays. In the worst case the array will be copied twice - on the way into the JNI code and on the way back :)

That is why the Buffer class hierarchy was introduced. And of course the Java dev team created a nice way to encode/decode chars:

Charset#newDecoder returns a CharsetDecoder, which can be used to convert a ByteBuffer to a CharBuffer according to a Charset. There are two main method versions:

CoderResult decode(ByteBuffer in, CharBuffer out, boolean endOfInput)
CharBuffer decode(ByteBuffer in)

For maximum performance you need the first one. It has no hidden memory allocations inside.
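A sketch of how the three-argument decode could be driven with a reusable output buffer is shown below. BufferDecoder is a hypothetical name, and the grow-on-overflow policy is just one possible choice. Once the output buffer is large enough, repeated calls allocate nothing new in this helper.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: one decoder and one output buffer, both reused across calls.
// Not thread-safe.
class BufferDecoder {
    private final CharsetDecoder decoder;
    private CharBuffer out;

    BufferDecoder(Charset cs, int initialChars) {
        decoder = cs.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        out = CharBuffer.allocate(initialChars);
    }

    // Decodes all remaining bytes of 'in'. The returned buffer is overwritten
    // by the next call, so copy its contents if they must be kept.
    CharBuffer decode(ByteBuffer in) {
        decoder.reset();
        out.clear();
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isOverflow()) {
                grow();          // output full: enlarge and continue
                continue;
            }
            break;               // underflow with endOfInput=true: input consumed
        }
        while (decoder.flush(out).isOverflow()) {
            grow();
        }
        out.flip();
        return out;
    }

    // Doubles the output buffer, keeping what has been decoded so far.
    private void grow() {
        CharBuffer bigger = CharBuffer.allocate(Math.max(16, out.capacity() * 2));
        out.flip();
        bigger.put(out);
        out = bigger;
    }
}
```

The input could just as well be a direct ByteBuffer filled by native code; the helper only relies on the ByteBuffer interface.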

Note that an encoder/decoder can maintain internal state, so be careful (for example, if you decode from a 2-byte encoding and the input buffer ends with only half of a char...). Also, encoders/decoders are not thread-safe.
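To make the state and thread-safety points concrete, here is a small sketch. ChunkedDecode and decodeTwoChunks are names invented for this example, and UTF-8 is assumed as the charset. It keeps one decoder per thread in a ThreadLocal and decodes input that arrives in two chunks, where a multi-byte character may be split across the chunk boundary.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class ChunkedDecode {
    // One decoder per thread, because CharsetDecoder is not thread-safe.
    private static final ThreadLocal<CharsetDecoder> DECODER =
        new ThreadLocal<CharsetDecoder>() {
            @Override
            protected CharsetDecoder initialValue() {
                return Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            }
        };

    static String decodeTwoChunks(byte[] first, byte[] second) {
        CharsetDecoder decoder = DECODER.get();
        decoder.reset();                       // discard state from earlier use
        int total = first.length + second.length;
        ByteBuffer in = ByteBuffer.allocate(total);
        // A UTF-8 decoder produces at most one char per input byte.
        CharBuffer out = CharBuffer.allocate(total);

        in.put(first);
        in.flip();
        // endOfInput=false: an incomplete trailing sequence is left unread in 'in'
        // instead of being treated as malformed.
        decoder.decode(in, out, false);
        in.compact();                          // keep the unread bytes for the next round

        in.put(second);
        in.flip();
        decoder.decode(in, out, true);         // last chunk
        decoder.flush(out);

        out.flip();
        return out.toString();
    }
}
```

If the first call passed endOfInput=true instead, the dangling lead byte would be handled as malformed input and replaced, which is the kind of mistake the warning above is about.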

There are very few reasons to "squeeze" a string into a byte array. I would recommend writing the C functions to take UTF-16 strings as parameters. This way there is no need for any conversion.
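One possible Java-side shape for this is sketched below. It assumes the native code is willing to read UTF-16 code units in native byte order from a direct buffer (for instance via the JNI function GetDirectBufferAddress). Utf16Transfer and its methods are hypothetical names, and the native call itself is left out. No charset encoder is involved; the chars are copied as they are.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Utf16Transfer {
    // Copies the string's UTF-16 code units into a direct buffer in native byte order.
    static ByteBuffer toDirectUtf16(String s) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(s.length() * 2)
                                     .order(ByteOrder.nativeOrder());
        // The char view writes through to 'bytes' without moving its position.
        bytes.asCharBuffer().put(s);
        return bytes;                // position 0, limit = 2 * s.length()
    }

    // Reads the remaining bytes of the buffer back as UTF-16 code units.
    static String fromDirectUtf16(ByteBuffer bytes) {
        return bytes.asCharBuffer().toString();
    }
}
```

Passing a java.lang.String straight to the native method and reading it with GetStringChars or GetStringRegion is the other obvious route; which one is cheaper depends on the JVM and is worth profiling.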
