Getting an error while converting HTML content to a Word document

2023-04-11 10:31

Hello everyone, I am using Html Agility Pack and Open XML to convert my HTML content into Word file content.

<div>
<div id="container">
<div>
<div>
<!--content starts here//-->
<form name="questions" method="post">
<img src="../../content/0/Static UPload/Divya_3LevelLeftMenu_Operating System v8.0 English/unit9/lesson27/../../images/less_title_27.jpg" width="750" height="75">
<div id="title">Exercise
<table border="0" cellspacing="20" cellpadding="0">
  <tr>
    <td><b> Student's Name:&nbsp;</b><br>
      <input type="text" name="b1" size="45"></td>
    <td><b>Class:</b><br>
      <input type="text" name="b2" size="45"></td>
  </tr>
</table>
<td width="176" align="left">&nbsp;</td>
    <tr><td width="779" align="left">&nbsp;</td>
    </tr>
       <ol>
      <li>Describe the purpose of Windows Update. 
      <p align="left"><textarea name="a1" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
      </li>
    </ol>

    <ol start="2">
      <li>Explain why using Windows Update is critical to maintaining an operating system.
        <p align="left"><textarea name="a2" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
      </li>
    </ol>
    <ol start="3">
      <li>Summarize the process used to access and install Windows Updates.  
        <p align="left"><textarea name="a3" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
      </li>
    </ol>
    <ol start="4">
      <li>Compare and contrast using Windows Update and using a Windows Service Pack. 
        <p align="left"><textarea name="a4" rows="10" wrap="VIRTUAL" cols="55"></textarea></p>
      </li>
    </ol>
    <center><p><b>Note: You must print your completed exercise
    to submit to your instructor.</b><br>
    <b class="style1"><u>Do Not</u></b> close this window without printing your exercise or your answers will be lost.<br><br>
            <input onclick="reLoadMe(document.questions) " type="button" value="Print Preview">
      </p>
    </center>
</form>
    <div align="center"><a href="#top"><img src="../../content/0/Static UPload/Divya_3LevelLeftMenu_Operating System v8.0 English/unit9/lesson27/../../images/back_to_top.jpg" alt="" width="40" height="21" border="0"></a>

</div></div></div></div></div></div>

This is the HTML content I am trying to convert, but I get the following error while parsing it.

   at NotesFor.HtmlToOpenXml.TableContext.get_CurrentTable()
   at NotesFor.HtmlToOpenXml.HtmlConverter.ProcessTableColumn(HtmlEnumerator en)
   at NotesFor.HtmlToOpenXml.HtmlConverter.ProcessHtmlChunks(HtmlEnumerator en, String endTag)
   at NotesFor.HtmlToOpenXml.HtmlConverter.Parse(String html)
   at WebApplication3.WebForm3.Button1_Click(Object sender, EventArgs e) in C:\Users\USER\Documents\Visual Studio 2008\Projects\Piyush_training\WebApplication3\WebForm3.aspx.cs:line 102

My code is as follows.

    using System;
    using System.IO;
    using System.Linq;
    using System.Text;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Drawing;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    using wp = DocumentFormat.OpenXml.Drawing.Wordprocessing;
    using HtmlAgilityPack;
    using NotesFor.HtmlToOpenXml;

    protected void Button1_Click(object sender, EventArgs e)
    {
        const string filename = "C:/Temp/test.docx";
        Response.ContentEncoding = System.Text.Encoding.UTF7;
        System.Text.StringBuilder SB = new System.Text.StringBuilder();
        System.IO.StringWriter SW = new System.IO.StringWriter();

            string pagecontent = ""; // placeholder: assign the HTML content shown above
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            // Parser options only take effect if they are set before LoadHtml is called.
            doc.OptionCheckSyntax = true;
            doc.OptionAutoCloseOnEnd = true;
            doc.OptionFixNestedTags = true;
            doc.LoadHtml(pagecontent);
            int errorCount = doc.ParseErrors.Count();
            string output = "";

            doc.Save(SW);
            System.Web.UI.HtmlTextWriter htmlTW = new System.Web.UI.HtmlTextWriter(SW);
            string strBody = "<html>" + "<body>" + "<div><b>" + htmlTW.InnerWriter.ToString() + "</b></div>" + "</body>" + "</html>";

            string html = strBody;

            try
            {
                using (MemoryStream generatedDocument = new MemoryStream())
                {
                    using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
                    {
                        MainDocumentPart mainPart = package.MainDocumentPart;
                        if (mainPart == null)
                        {
                            mainPart = package.AddMainDocumentPart();
                            new Document(new Body()).Save(mainPart);
                        }

                        HtmlConverter converter = new HtmlConverter(mainPart);
                        converter.ExcludeLinkAnchor = true;
                        converter.RefreshStyles();
                        converter.ImageProcessing = ImageProcessing.AutomaticDownload;
                        Body body = mainPart.Document.Body;
                        converter.ConsiderDivAsParagraph = false;

                        var paragraphs = converter.Parse(html);
                        for (int i = 0; i < paragraphs.Count; i++)
                        {
                            body.Append(paragraphs[i]);
                        }

                        mainPart.Document.Save();
                    }

                    File.WriteAllBytes(filename, generatedDocument.ToArray());
                }

                System.Diagnostics.Process.Start(filename);
            }
            catch (Exception ex)
            {
                Response.Write(ex.ToString());
            }
        }

You might want to try a different approach for assembling your Word document from HTML. Depending on your requirements, you can take one of two approaches:

Assemble the document using the Open XML SDK, as you have done, or
Use the altChunk method

altChunk is a special feature of Open XML word-processing markup that lets you embed an entire Open XML document or an HTML page at a specific location in a document.

Eric White has a number of blog posts describing this process. Below is an extract from his article; it imports a second .docx file, and for HTML content the part type would be AlternativeFormatImportPartType.Html instead:

Using V2 of the Open XML SDK:

using (WordprocessingDocument myDoc = WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);

    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document
        .Body
        .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}

The whole article along with sample code (at the bottom): How to Use altChunk for Document Assembly

Use this to get content with images working.

To use the altChunk method this way you need an existing file. Create the file dynamically with some content first: the code below inserts the altChunk after the last paragraph, so the body must already contain at least one.

Create a .docx file with a small amount of content.
Append the HTML content.

try
{
    var domainNameURL = "http://yoursite.com/"; // must be an absolute URL (with scheme), or new Uri(...) below throws
    var strBody = "<html>" + "<body>" + "<div> Word File </div>" + "</body>" + "</html>";
    using (MemoryStream generatedDocument = new MemoryStream())
    {
        using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.ExcludeLinkAnchor = true;
            converter.RefreshStyles();
            converter.ImageProcessing = ImageProcessing.AutomaticDownload;
            converter.BaseImageUrl = new Uri(domainNameURL + "Images/");

            Body body = mainPart.Document.Body;
            converter.ConsiderDivAsParagraph = false;

            var paragraphs = converter.Parse(strBody);
            for (int i = 0; i < paragraphs.Count; i++)
            {
                body.Append(paragraphs[i]);
            }

            mainPart.Document.Save();
        }

        File.WriteAllBytes(filename, generatedDocument.ToArray());
    }

    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
        string altChunkId = "AltChunkId1";
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId);

        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        using (StreamWriter stringStream = new StreamWriter(chunkStream))
            stringStream.Write(html);
        XElement altChunk = new XElement(w + "altChunk",
            new XAttribute(r + "id", altChunkId));
        XDocument mainDocumentXDoc = GetXDocument(myDoc);
        mainDocumentXDoc.Root
            .Element(w + "body")
            .Elements(w + "p")
            .Last()
            .AddAfterSelf(altChunk);
        SaveXDocument(myDoc, mainDocumentXDoc);
    }
    System.Diagnostics.Process.Start(filename);
}
catch (Exception ex)
{
    Response.Write(ex.ToString());
}
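
The snippet above calls GetXDocument and SaveXDocument without defining them. A minimal sketch of what they could look like follows, assuming they only need to read and rewrite the main document part. The class name MainPartXml is hypothetical; put the methods in the same class as the calling code, or call them as MainPartXml.GetXDocument(...). The snippet above also expects filename and html to be defined by the caller, as in the question.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper class (name chosen for illustration only).
public static class MainPartXml
{
    // Loads the main document part (word/document.xml) into an XDocument.
    public static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        using (Stream partStream = myDoc.MainDocumentPart.GetStream())
        using (XmlReader reader = XmlReader.Create(partStream))
        {
            return XDocument.Load(reader);
        }
    }

    // Overwrites the main document part with the modified XDocument.
    public static void SaveXDocument(WordprocessingDocument myDoc, XDocument mainDocumentXDoc)
    {
        // FileMode.Create truncates the part so no stale XML is left behind.
        using (Stream partStream = myDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
        using (XmlWriter writer = XmlWriter.Create(partStream))
        {
            mainDocumentXDoc.Save(writer);
        }
    }
}
```

Working on the raw XML this way bypasses the SDK's typed object model, so avoid mixing it with changes made through mainPart.Document on the same open package.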

I used this function to convert a very large HTML document (with inline images) to Word, after reading the previous answers and the one here: https://stackoverflow.com/a/18152334/1863970

public static byte[] HtmlToWord(string html)
{
    using (var generatedDocument = new MemoryStream())
    {
        using (var package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            string altChunkId = "myId";

            // Create the alternative format import part.
            var formatImportPart = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);

            // Feed the HTML data into the format import part (chunk).
            using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body>" + html + "</body></html>")))
            {
                formatImportPart.FeedData(memoryStream);
            }
            var altChunk = new AltChunk();
            altChunk.Id = altChunkId;

            mainPart.Document.Body.Append(altChunk);

            mainPart.Document.Save();
        }

        return generatedDocument.ToArray();
    }
}
