trouble using xhtml2pdf with unicode

2023-01-22 06:01 问答作者：

I've been trying to convert Hebrew html files without success; the Hebrew characters show up in 开发者_运维技巧the output PDF as black rectangles regardless of any encoding I tried.

I tried some unicode test files included in the pisa distribution: pisa-3.0.33\test\test-unicode-all.html and \test-bidirectional-text.html . I ran xhtml2pdf from the command line both with and without --encoding utf-8. Same result: none of the non-Latin characters made it through.

Is this a fonts problem*? If the unicode test file works for you, was there anything you did to set it up?

*FWIW, at least some of these languages, including Hebrew, should work with Arial.

EDIT: Alternatively, if someone has pisa set up and could try converting the unicode test file above, I would be very grateful.

Inserting following code into html helped me

<style>
@page {
size: a4;
margin: 0.5cm;
}

@font-face {
font-family: "Verdana";
src: url("verdana.ttf");
}

html {
font-family: Verdana;
font-size: 11pt;
}

</style>

in url instead of "verdana.ttf" you should put absolute path to font in your os

If anyone in the future tries, like me, to figure out how to PROPERLY create a PDF file that contains Hebrew using xhtml2pdf, here's what worked for me:

First thing: including the fonts settings as described here by @eviltrue in my HTML. This can be any font as long as it supports Hebrew characters, otherwise any Hebrew characters in the input HTML would simply appear as black rectangles in the PDF.
At the time of writing this answer, while it is possible to output Hebrew characters to PDF in xhtml2pdf, Hebrew characters are outputted in revers order, i.e. שלום כיתה א
would be א התיכ םולש.

At this point I was stuck, but then I stumbled upon this SO asnwer: https://stackoverflow.com/a/15449145/1918837

After installing the python-bidi package, here is an example of a complete solution (used in a python app):

from bidi import algorithm as bidialg
from xhtml2pdf import pisa

HTMLINPUT = """
            <!DOCTYPE html>
            <html>
            <head>
               <meta http-equiv="content-type" content="text/html; charset=utf-8">
               <style>
                  @page {
                      size: a4;
                      margin: 1cm;
                  }

                  @font-face {
                      font-family: DejaVu;
                      src: url(my_fonts_dir/DejaVuSans.ttf);
                  }

                  html {
                      font-family: DejaVu;
                      font-size: 11pt;
                  }
               </style>
            </head>
            <body>
               <div>Something in English - משהו בעברית</div>
            </body>
            </html>
            """

pdf = pisa.CreatePDF(bidialg.get_display(HTMLINPUT, base_dir="L"), outpufile)

# I'm using base_dir="L" so that "< >" signs in HTML tags wouldn't be
flipped by the bidi algorithm

The nice thing about the bidi algorithm is that you can have mixed RTL and LTR languages in the same line (like in the HTML example above) and still have a correctly formatted result.

EDIT: The best way to go now is definitely using wkhtmltopdf

继续阅读：hebrew pdf pisa unicode

trouble using xhtml2pdf with unicode

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？