Optimizing JPEG quantization table for grayscale text document images

2023-03-25 17:59

Signal Processing: Image Compression:

I want to store full-color text images in JPEG or TIFF-JPEG format. These images contain text documents with some color graphics. Even at very high JPEG quality levels, there are still a lot of artifacts and degradation in the text.

I have total control of the JPEG encoding parameters, including subsampling ratios and quantization matrix.

My question is:

Can I optimize those parameters for text documents? (Beyond the quality level)
Can I apply different parameter settings for different parts of the image?
Would it help if I manually truncate (quantize) the coefficients for different parts of the images, before encoding?

(Will attach a sample image later, since I can't access imgur at the office.)

Have you considered using PDF as the output? With PDF, you can do dynamic thresholding on the black and white text to compress it as 1-bpp CCITT G4. You can also capture the color objects on the page and compress them with FLATE or JPEG. The PDF page can be a composite of those 2 types of objects. You'll get the best possible quality and much better compression.
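To make the thresholding step concrete, here is a minimal sketch in plain Python. It uses a single global Otsu threshold, which is simpler than the dynamic (locally adaptive) thresholding mentioned above, and it only produces the packed 1-bpp rows; the CCITT G4 encoding and PDF assembly are left to whatever library you use. The function names and the tiny sample page are made up for illustration.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes between-class variance.

    Pixels <= threshold are treated as dark (text).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of their values
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def to_bilevel_rows(gray_rows, threshold):
    """Pack each row at 1 bit per pixel, MSB first, with 1 = dark (text)."""
    packed = []
    for row in gray_rows:
        out = bytearray((len(row) + 7) // 8)
        for x, p in enumerate(row):
            if p <= threshold:
                out[x // 8] |= 0x80 >> (x % 8)
        packed.append(bytes(out))
    return packed


# Made-up 8x2 grayscale "scan": light paper with a few dark strokes.
page = [
    [240, 235, 30, 25, 238, 242, 28, 236],
    [238, 241, 22, 35, 240, 239, 31, 244],
]
t = otsu_threshold([p for row in page for p in row])
bits = to_bilevel_rows(page, t)
```

On real scans with uneven lighting a global threshold tends to fail, which is why a locally adaptive method is the better fit for the text layer.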

I second BitBank's suggestion of using PDF to compress different content in different ways - I see this sometimes called 'MRC' - Mixed Raster Content. Lots of literature.

You do not say whether your images are synthetic or scanned. For synthetic images, my personal experience is that even LZW (in TIFF) can do a remarkable job, especially if you are willing to do some (lossy) preprocessing to homogenize the sample values. That is, if you can quantize enough similar values so they become equal.
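As a rough illustration of that preprocessing idea, the sketch below snaps similar sample values to a common level and compares the losslessly compressed size before and after. Two assumptions to note: it uses zlib (Deflate) from the Python standard library as a stand-in for TIFF's LZW, since both benefit from repeated values, and the sample data and the `homogenize` helper are invented for the example.

```python
import random
import zlib


def homogenize(samples, step=16):
    """Snap each 8-bit sample to the centre of its `step`-wide bin (lossy)."""
    half = step // 2
    return bytes(min(255, (s // step) * step + half) for s in samples)


random.seed(0)
# Invented data: a near-white background with slight noise and one dark band.
noisy = bytes(
    random.randint(20, 30) if 1000 <= i < 1200 else random.randint(242, 254)
    for i in range(4096)
)
flat = homogenize(noisy, step=16)

size_before = len(zlib.compress(noisy, 9))
size_after = len(zlib.compress(flat, 9))
print(size_before, size_after)
```

Here every background sample lands on one value and every dark sample on another, so the compressed size drops sharply. The price is an error of up to half a step per sample, so the step has to be chosen against how much tonal detail you can afford to lose.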

But if your images are scanned, it is very hard to preprocess to a clean enough image that LZW or any other lossless compression can find traction. So that leaves JPEG, about which I would say almost the opposite of ruslik: the lossiness of JPEG is highly adjustable, both globally and in the frequency domain. Of course it is possible to adjust the quantization tables to selectively improve text quality. I'm not an expert, but the starting point I happen to remember is work by Giordano Bruno Beretta & co. at HP Labs, e.g. "Method for selecting JPEG quantization tables for low bandwidth applications".
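To show what "adjustable globally and in the frequency domain" can look like, here is a small sketch. `BASE_LUMA` is the example luminance table from Annex K of the JPEG standard, and `ijg_scale` follows the quality scaling used by libjpeg. `cap_high_frequencies` is a hypothetical tweak of my own for illustration, not the HP Labs method: it limits how coarsely any coefficient may be quantized, which mostly affects the high frequencies that carry sharp text edges.

```python
# Example luminance quantization table from JPEG Annex K (row-major order).
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_scale(base, quality):
    """Global adjustment: scale a base table by a 1..100 quality setting."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Clamp to 1..255 so the table stays valid for baseline JPEG.
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base]


def cap_high_frequencies(table, max_step):
    """Hypothetical per-frequency tweak: no quantization step above max_step."""
    return [min(q, max_step) for q in table]


table_q75 = ijg_scale(BASE_LUMA, 75)
text_table = cap_high_frequencies(table_q75, max_step=20)

for row in range(8):
    print(text_table[row * 8:row * 8 + 8])
```

If your encoder accepts custom tables (Pillow's JPEG writer, for instance, documents `qtables` and `subsampling` save options), a table like this can be passed in together with 4:4:4 sampling. Check which coefficient order the encoder expects, and judge the result on your own documents, because the cap value here is arbitrary.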

Standard JPEG is lossy, and there's nothing you can do about it. The information that is lost should go unnoticed on a natural (smooth) image.

My point is that for an artificial image you should use a lossless codec. Not the lossless JPEG, but something that supports at least RLE. For example, PNG or JPEG-LS will have much better results on such images.
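A toy run-length encoder shows why this matters for artificial images. The two rows below are invented: one mimics a synthetic image with flat regions, the other mimics scan noise where neighbouring pixels rarely match. This is only a sketch of the principle, not how PNG or JPEG-LS are actually implemented.

```python
def rle_encode(row):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


# Synthetic row: flat white, a black stroke, flat white again.
synthetic_row = [255] * 40 + [0] * 8 + [255] * 16
# Simulated scan row: near-white values that differ from pixel to pixel.
scanned_row = [250 + (i * 7) % 5 for i in range(64)]

print(len(rle_encode(synthetic_row)), len(rle_encode(scanned_row)))
```

The synthetic row collapses to a handful of runs, while the noisy row needs one run per pixel, which is the same reason lossless codecs struggle on scans.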
