Weird characters when filling PDF with PDFTk

2023-03-06 15:44

I'm using PHP with PDFTk on Ubuntu. When filling a PDF with data, I get weird characters for these accented letters: á ó í. I'm using UTF-8 encoding: I checked with echo mb_check_encoding($var, 'UTF-8'), which outputs 1 (TRUE). Any idea what I can do?

I also tried converting to ISO-8859-1 with utf8_decode(), but still no luck.

Thanks

You're right, utf8_decode() will work for characters that can be encoded in ISO-8859-1 (Latin-1), i.e. U+0000–U+00FF.

However, it won't work for characters outside that range.
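
A quick way to tell whether a given value will survive utf8_decode() is to round-trip it through ISO-8859-1. This is only a minimal sketch: the function name fitsLatin1 is made up for illustration, the script is assumed to be saved as UTF-8, and it uses mb_convert_encoding() instead of utf8_decode(), which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: true if every character of a UTF-8 string
// exists in ISO-8859-1, i.e. utf8_decode() would not lose anything.
function fitsLatin1(string $utf8): bool
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    // Characters outside Latin-1 are replaced by a substitute character,
    // so the round trip no longer matches the input.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $utf8;
}

var_dump(fitsLatin1('áóí')); // accented letters that exist in Latin-1
var_dump(fitsLatin1('ő'));   // U+0151, which is not in Latin-1
```

If the check fails for a value, one of the UTF-16 forms is needed for that field.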

You can always encode characters using UTF-16BE, though. You can do this per field, e.g. to encode the word "özil":

<<
/V (þÿ^@ö^@z^@i^@l)
/T (name)
>>

(Here the "^@" indicates a NUL character (U+0000). This is how it looks in my editor (vim), if the file is encoded in Windows-1252 (latin1).)

Note that you need to start the string with a byte order mark (the bytes 0xFE 0xFF, which will appear as "þÿ" if your file is encoded in Windows-1252) and you'll need to encode the entire string (between the two parentheses) in UTF-16BE.

If you're generating the FDF in a PHP script you can do something like this:

<<
/V (<?php echo chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), mb_convert_encoding("özil", 'UTF-16BE', 'UTF-8')); ?>)
/T (name)
>>
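
The same idea can be wrapped in a function so that every field value goes through one code path. The following is a sketch, not a tested pdftk recipe: the name fdfUtf16String is hypothetical, and the input is assumed to be UTF-8. It also escapes carriage returns and line feeds, because a raw CR byte inside a PDF literal string is read back as LF, which would corrupt a character such as č (U+010D, bytes 0x01 0x0D).

```php
<?php
// Hypothetical helper: returns a PDF literal string, parentheses included,
// holding $utf8 as UTF-16BE with a leading byte order mark.
function fdfUtf16String(string $utf8): string
{
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // Escape at byte level, after encoding: the bytes 0x5C, 0x28 and 0x29
    // can occur as one half of an unrelated UTF-16 code unit.
    // strtr() makes a single pass, so inserted backslashes are not re-escaped.
    $escaped = strtr($bytes, [
        '\\' => '\\\\',
        '('  => '\\(',
        ')'  => '\\)',
        "\r" => '\\r', // a raw CR in a literal string would be read as LF
        "\n" => '\\n',
    ]);
    return '(' . $escaped . ')';
}

echo "<<\n/V " . fdfUtf16String('özil') . "\n/T (name)\n>>\n";
```

The output contains NUL bytes, so the FDF file has to be written in binary-safe fashion, without any later charset conversion.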

You can also write out the hex codes like this (i.e. enclosed in angle brackets rather than parentheses):

<<
/V <FEFF00F6007A0069006C>
/T (name)
>>

This has exactly the same result (the string "özil"). It's less efficient in terms of characters, but it actually seems to be more reliable in pdftk, which has some bugs I've found (in version 2.02).
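
Generating the hex form in PHP needs no escaping at all, since the output consists of hex digits only. A minimal sketch, assuming UTF-8 input; the function name fdfHexString is invented for this example:

```php
<?php
// Hypothetical helper: returns a PDF hex string, angle brackets included,
// holding $utf8 as UTF-16BE with a leading byte order mark.
function fdfHexString(string $utf8): string
{
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    return '<' . strtoupper(bin2hex($bytes)) . '>';
}

echo fdfHexString('özil'), "\n"; // <FEFF00F6007A0069006C>
```

Because the result is plain ASCII, the FDF file can be edited or logged without worrying about NUL bytes or the editor's encoding.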

Finally, for characters up to U+00FF you can also write the character's code as an octal escape (\ddd). For example, ö has code point U+00F6, which is 366 in octal, so you can write:

<<
/V (\366zil)
/T (name)
>>

However, this only works up to U+00FF (octal 377). Beyond that, you'd have to use UTF-16.
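
The octal form can be produced byte by byte after converting to ISO-8859-1. This is a sketch under two assumptions: the value really fits in U+0000 to U+00FF, and the reader treats those bytes like Latin-1, which holds for the accented letters discussed here. The function name fdfOctalString is hypothetical.

```php
<?php
// Hypothetical helper: returns a PDF literal string in which every byte
// outside printable ASCII, plus backslash and parentheses, is written
// as a three-digit octal escape.
function fdfOctalString(string $utf8): string
{
    // Assumes the text fits in ISO-8859-1 (U+0000 to U+00FF).
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $out = '';
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        $byte = $latin1[$i];
        $code = ord($byte);
        if ($code < 0x20 || $code > 0x7E || strpos('\\()', $byte) !== false) {
            $out .= sprintf('\\%03o', $code);
        } else {
            $out .= $byte;
        }
    }
    return '(' . $out . ')';
}

echo fdfOctalString('özil'), "\n"; // (\366zil)
```

Like the hex form, the result is pure ASCII, at the cost of four characters per escaped byte.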

The PDF standard allows you to set the encoding to UTF-8 for the whole FDF document. I tried this and it didn't work with pdftk, but in theory it would be done like this:

%FDF-1.2
1 0 obj
<<
/Version /1.3
/Encoding /utf_8
/FDF

(You would presumably have to set the FDF version to 1.3 (or more) in the header too, according to the standard.)

You can also do this at the field level:

<<
/V (özil)
/T (name)
/Encoding /utf_8
>>

But as I said, I didn't manage to get any of this to work. pdftk just seems to ignore it.

Solved with utf8_decode(). I guess there were some caching problems and the old characters were still showing.
