iTextSharp HTML to PDF preserving spaces

2023-04-10 00:03 问答作者：

I am using the FreeTextBox.dll to get user input, and storing that information in HTML format in the database. A samle of the user's input is the below:

133 Peachtree St NE

Atlanta, GA 30303

404-652-7777

Cindy Cooley

www.somecompany.com

Product Stewardship Mgr

9/9/2011

Deidre's Company

123 Test St

Atlanta, GA 30303

Test test.

I want the HTMLWorker to perserve the white spaces the users enters, but it strips it out. Is there a way to perserve the user's white space? Below is an example of how I am creating my PDF document.

Public Shared Sub CreatePreviewPDF(ByVal vsHTML As String, ByVal vsFileName As String)

        Dim output As New MemoryStream()
        Dim oDocument As New Document(PageSize.LETTER)
        Dim writer As PdfWriter = PdfWriter.GetInstance(oDo开发者_StackOverflowcument, output)
        Dim oFont As New Font(Font.FontFamily.TIMES_ROMAN, 8, Font.NORMAL, BaseColor.BLACK)

        Using output
            Using writer
                Using oDocument
                    oDocument.Open()
                    Using sr As New StringReader(vsHTML)
                        Using worker As New html.simpleparser.HTMLWorker(oDocument)

                            worker.StartDocument()
                            worker.SetInsidePRE(True)
                            worker.Parse(sr)
                            worker.EndDocument()
                            worker.Close()
                            oDocument.Close()

                        End Using
                    End Using

                    HttpContext.Current.Response.ContentType = "application/pdf"
                    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment;filename={0}.pdf", vsFileName))
                    HttpContext.Current.Response.BinaryWrite(output.ToArray())
                    HttpContext.Current.Response.End()

                End Using
            End Using
            output.Close()
        End Using


    End Sub

There's a glitch in iText and iTextSharp but you can fix it pretty easily if you don't mind downloading the source and recompiling it. You need to make a change to two files. Any changes I've made are commented inline in the code. Line numbers are based on the 5.1.2.0 code rev 240

The first is in iTextSharp.text.html.HtmlUtilities.cs. Look for the function EliminateWhiteSpace at line 249 and change it to:

    public static String EliminateWhiteSpace(String content) {
        // multiple spaces are reduced to one,
        // newlines are treated as spaces,
        // tabs, carriage returns are ignored.
        StringBuilder buf = new StringBuilder();
        int len = content.Length;
        char character;
        bool newline = false;
        bool space = false;//Detect whether we have written at least one space already
        for (int i = 0; i < len; i++) {
            switch (character = content[i]) {
            case ' ':
                if (!newline && !space) {//If we are not at a new line AND ALSO did not just append a space
                    buf.Append(character);
                    space = true;  //flag that we just wrote a space
                }
                break;
            case '\n':
                if (i > 0) {
                    newline = true;
                    buf.Append(' ');
                }
                break;
            case '\r':
                break;
            case '\t':
                break;
            default:
                newline = false;
                space = false;  //reset flag
                buf.Append(character);
                break;
            }
        }
        return buf.ToString();
    }

The second change is in iTextSharp.text.xml.simpleparser.SimpleXMLParser.cs. In the function Go at line 185 change line 248 to:

if (html /*&& nowhite*/) {//removed the nowhite check from here because that should be handled by the HTML parser later, not the XML parser

Thanks for the help everyone. I was able to find a small work around by doing the following:

vsHTML.Replace("  ", "&nbsp;&nbsp;").Replace(Chr(9), "&nbsp;&nbsp;&nbsp;&nbsp;").Replace(Chr(160), "&nbsp;").Replace(vbCrLf, "<br />")

The actual code does not display properly but, the first replace is replacing white spaces with  , Chr(9) with 5  , and Chr(160) with  .

I would recommend using wkhtmltopdf instead of iText. wkhtmltopdf will output the html exactly as rendered by webkit (Google Chrome, Safari) instead of iText's conversion. It is just a binary that you can call. That being said, I might check the html to ensure that there are paragraphs and/or line breaks in the user input. They might be stripped out before the conversion.

继续阅读：itext pdf

iTextSharp HTML to PDF preserving spaces

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？