Easiest way of porting html table data to readable document

2023-04-04 17:29 Q&A author:

Ok,

For the past 6 months I've been struggling to build a system that accepts user input in the form of big sexy textareas (with loads of support for tables, lists, etc.). It pretty much lets the user enter data as if they were using Word. However, when it comes to exporting all this data, I haven't been able to find a working solution...

My first step was to find reporting software that supported raw HTML from the data source and rendered it as normal HTML. That worked perfectly, except that the keep-together function is awful: either the data (tables, lists, etc.) is split in half, which I don't want, or the report always skips to the next page to avoid this, ending up with 15+ empty pages in the final document.

So I'm looking for some kind of tip or direction on the best way to export my data into a readable document (preferably PDF or Word).

What I have is the following data breakdown, where the data is often raw HTML:

-Period

--Unit

---Group

----Question

-----Data

What would be the best choice: rendering the HTML to PDF, or to RTF? I need tips :(

Also, the data is sometimes 2-3 pages long, with a mix of tables, lists and plain text.

I would suggest that you try to keep this in the browser, and add a print stylesheet to the HTML to make it render one way on the screen and another way on paper. Adding a print stylesheet to your HTML is as easy as this:

<link rel="stylesheet" media="print" href="print.css">
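As a minimal sketch of the print-stylesheet idea, assuming the stored answers are HTML fragments that can be concatenated into one page, the helper below builds such a page with the print rules inlined in a style element with media="print" instead of a separate print.css file. The names PrintablePage and Wrap are hypothetical. The page-break-inside and page-break-after properties are requests rather than guarantees: support varies between browsers, and a table taller than one page will still be split.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: wraps stored HTML fragments in one page whose
// print-only styles ask the browser to keep tables and lists together.
public static class PrintablePage
{
    // Applied only when printing (or printing to PDF); the screen view is unchanged.
    private const string PrintCss =
        "table, ul, ol, tr { page-break-inside: avoid; }\n" +
        "h1, h2, h3, h4 { page-break-after: avoid; }\n";

    public static string Wrap(string title, IEnumerable<string> htmlFragments)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n");
        // The title is plain text, so it is HTML-encoded.
        sb.Append("<title>").Append(WebUtility.HtmlEncode(title)).Append("</title>\n");
        sb.Append("<style media=\"print\">\n").Append(PrintCss).Append("</style>\n");
        sb.Append("</head>\n<body>\n");
        foreach (var fragment in htmlFragments)
        {
            // Fragments are the user's raw HTML and are emitted unescaped on purpose.
            sb.Append("<div class=\"entry\">").Append(fragment).Append("</div>\n");
        }
        sb.Append("</body>\n</html>\n");
        return sb.ToString();
    }
}
```

Serving the returned string as an ordinary page leaves the screen layout alone; only the browser's print (or "print to PDF") output picks up the rules.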

You should be able to parse the input with something like Html Agility Pack and transform it (e.g. with XSLT) into whatever output format you want.
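A minimal sketch of the "transform it with XSLT" step, using only System.Xml.Xsl from the .NET base library. It assumes the user's HTML has already been cleaned into well-formed markup with no XHTML namespace; that cleanup is the job Html Agility Pack (or a similar parser) would do and is not shown. The class HtmlTableExporter and its stylesheet, which flattens each table row into one line of text, are illustrative only; a real stylesheet would target whatever export format you choose.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical example: every <tr> in well-formed, namespace-free markup
// becomes one plain-text line, with cells separated by " | ".
public static class HtmlTableExporter
{
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""text""/>
  <xsl:template match=""/"">
    <xsl:for-each select=""//tr"">
      <xsl:for-each select=""td|th"">
        <xsl:if test=""position() &gt; 1""> | </xsl:if>
        <xsl:value-of select=""normalize-space(.)""/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    public static string TablesToText(string wellFormedHtml)
    {
        // Compile the stylesheet (in real code, cache this instance).
        var xslt = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xsltReader);
        }

        // Run the transform and capture the text output.
        using (var input = XmlReader.Create(new StringReader(wellFormedHtml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Swapping the stylesheet for one that emits WordprocessingML or XSL-FO is where the "whatever output format you want" part comes in; the C# side stays the same.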

Another option is to write HTML to the browser, but with the Content-Type set to a Microsoft Word-specific type (there are several to choose from, depending on the version of Word you're targeting). This should make the browser ask whether the user wants to open the page with Microsoft Word. With Word 2007 and newer you can also write Office Open XML (.docx) directly, since that format is XML-based (a ZIP package of XML parts).

The content types you can use are:

application/msword

For binary Microsoft Word (.doc) files, but it should also work for HTML.

application/vnd.openxmlformats-officedocument.wordprocessingml.document

For the "Office Open XML" (.docx) format used by Word 2007 and newer.

Another solution is to run a converter on the server, started via System.Diagnostics.Process, that renders the page and saves it as a PDF document.

You could use wkhtmltopdf, an open source console program that converts HTML to PDF (the same package also ships wkhtmltoimage for rendering to an image).

The Windows installer can be obtained from wkhtmltox-0.10.0_rc2 Windows Installer (i386).

After installing wkhtmltopdf, you can copy the files from the installation folder into your solution. The code below assumes a wkhtmltopdf folder in the site root containing wkhtmltopdf.exe, with a pdf subfolder inside it.

The converted PDFs will be saved to the pdf folder.

And here is the code for doing the conversion:

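// Requires: using System; using System.Diagnostics;
var wkhtmltopdfLocation = Server.MapPath("~/wkhtmltopdf/wkhtmltopdf.exe");
var htmlUrl = "http://stackoverflow.com/q/7384558/750216";
var pdfSaveLocation = Server.MapPath("~/wkhtmltopdf/pdf/question.pdf");

using (var process = new Process())
{
    process.StartInfo.UseShellExecute = false;
    process.StartInfo.CreateNoWindow = true;
    process.StartInfo.FileName = wkhtmltopdfLocation;
    // Quote both arguments so URLs or paths containing spaces are passed intact.
    process.StartInfo.Arguments = "\"" + htmlUrl + "\" \"" + pdfSaveLocation + "\"";
    process.Start();
    process.WaitForExit();

    // wkhtmltopdf returns a non-zero exit code when it reports an error.
    if (process.ExitCode != 0)
        throw new InvalidOperationException("wkhtmltopdf exited with code " + process.ExitCode);
}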

htmlUrl is the location of the page you need to convert to PDF. Here it is set to this Stack Overflow page. :)

It's a general question, but two things come to mind: the Visitor Pattern and changing the MIME type.

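Visitor Pattern: you can keep the data model separate from the rendering and have two separate rendering techniques (one for the screen, one for the exported document). The details would be up to your implementation.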
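The Visitor Pattern idea can be sketched as follows, assuming the Period > Unit > Group > Question > Data breakdown from the question has been loaded into memory. All class names here (Period, Unit, Group, Question, IReportVisitor, HtmlReportVisitor) are hypothetical. The model classes only know how to accept a visitor; each export format gets its own visitor, so adding a plain-text or Word renderer later does not touch the model.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical model of the Period > Unit > Group > Question > Data hierarchy.
public interface IReportVisitor
{
    void VisitPeriod(Period period);
    void VisitUnit(Unit unit);
    void VisitGroup(Group group);
    void VisitQuestion(Question question);
}

public abstract class ReportNode
{
    public string Title { get; set; } = "";
    public abstract void Accept(IReportVisitor visitor);
}

public class Period : ReportNode
{
    public List<Unit> Units { get; } = new List<Unit>();
    public override void Accept(IReportVisitor visitor) => visitor.VisitPeriod(this);
}

public class Unit : ReportNode
{
    public List<Group> Groups { get; } = new List<Group>();
    public override void Accept(IReportVisitor visitor) => visitor.VisitUnit(this);
}

public class Group : ReportNode
{
    public List<Question> Questions { get; } = new List<Question>();
    public override void Accept(IReportVisitor visitor) => visitor.VisitGroup(this);
}

public class Question : ReportNode
{
    public string DataHtml { get; set; } = ""; // the user's raw HTML answer
    public override void Accept(IReportVisitor visitor) => visitor.VisitQuestion(this);
}

// One renderer; a second visitor (plain text, RTF, ...) can walk the same model.
public class HtmlReportVisitor : IReportVisitor
{
    private readonly StringBuilder _html = new StringBuilder();
    public string Result => _html.ToString();

    public void VisitPeriod(Period period)
    {
        Heading("h1", period.Title);
        foreach (var unit in period.Units) unit.Accept(this);
    }

    public void VisitUnit(Unit unit)
    {
        Heading("h2", unit.Title);
        foreach (var group in unit.Groups) group.Accept(this);
    }

    public void VisitGroup(Group group)
    {
        Heading("h3", group.Title);
        foreach (var question in group.Questions) question.Accept(this);
    }

    public void VisitQuestion(Question question)
    {
        Heading("h4", question.Title);
        // Raw HTML is inserted unescaped so the user's tables and lists survive.
        _html.Append("<div class=\"data\">").Append(question.DataHtml).Append("</div>\n");
    }

    // Titles are plain text, so they are HTML-encoded.
    private void Heading(string tag, string text) =>
        _html.Append('<').Append(tag).Append('>')
             .Append(WebUtility.HtmlEncode(text))
             .Append("</").Append(tag).Append(">\n");
}
```

Calling period.Accept(new HtmlReportVisitor()) and reading Result gives one HTML document for the whole period, which could then go to the browser, a print stylesheet, or a converter such as wkhtmltopdf.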

MIME Type: when the request is made, write the data out to the Response with a Word content type, etc.

// Assumes `filename` holds the desired file name (without extension).
var response = HttpContext.Current.Response;
response.Clear();
// Keep the declared charset and the actual encoding in agreement.
response.Charset = "utf-8";
response.ContentEncoding = System.Text.Encoding.UTF8;
response.AddHeader("content-disposition", string.Format("attachment; filename={0}.doc", filename));
response.ContentType = "application/msword";
// "\r\n" is a real line break; the string "/n" would be written literally.
response.Write("-Period\r\n");
response.Write("--Unit\r\n");
response.Write("---Group\r\n");
response.Write("----Question\r\n");
response.Write("-----Data\r\n");
response.End();

Here is another option: take screenshots (although this doesn't handle scrolling, I think you should be able to build that in). This example can be expanded to meet the needs of your business, although it is something of a hack. You pass it a URL and it generates an image.

Call it like this:

// Requires: using System.Drawing.Imaging; (for ImageFormat)
protected void Page_Load(object sender, EventArgs e)
{
    int screenWidth = Convert.ToInt32(Request["ScreenWidth"]);
    int screenHeight = Convert.ToInt32(Request["ScreenHeight"]);
    string url = Request["Url"];
    string bitmapName = Request["BitmapName"]; // validate before using it in a path

    WebURLToImage webUrlToImage = new WebURLToImage()
    {
        Url = url,
        BrowserHeight = screenHeight,
        BrowserWidth = screenWidth,
        ImageHeight = 0, // 0 = keep the full browser size (no thumbnail)
        ImageWidth = 0
    };

    webUrlToImage.GenerateBitmapForUrl();
    // Pass the format explicitly so the saved file really is a BMP.
    webUrlToImage.GeneratedImage.Save(Server.MapPath("~/Images/" + bitmapName + ".bmp"), ImageFormat.Bmp);
}

The class that generates an image from a web page:

using System;
using System.Drawing;
using System.Windows.Forms;
using System.Threading;
using System.IO;

public class WebURLToImage
{
    public string Url { get; set; }
    public Bitmap GeneratedImage { get; private set; }
    public int ImageWidth { get; set; }
    public int ImageHeight { get; set; }
    public int BrowserWidth { get; set; }
    public int BrowserHeight { get; set; }

    public Bitmap GenerateBitmapForUrl()
    {
        ThreadStart threadStart = new ThreadStart(ImageGenerator);
        Thread thread = new Thread(threadStart);

        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return GeneratedImage;
    }

    private void ImageGenerator()
    {
        WebBrowser webBrowser = new WebBrowser();
        webBrowser.ScrollBarsEnabled = false;
        // Subscribe before navigating so the event cannot be missed.
        webBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
        webBrowser.Navigate(Url);

        // Pump messages on this STA thread until the page has loaded.
        while (webBrowser.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();
        webBrowser.Dispose();
    }

    void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser webBrowser = (WebBrowser)sender;
        webBrowser.ClientSize = new Size(BrowserWidth, BrowserHeight);
        webBrowser.ScrollBarsEnabled = false;
        GeneratedImage = new Bitmap(webBrowser.Bounds.Width, webBrowser.Bounds.Height);
        webBrowser.BringToFront();

        webBrowser.DrawToBitmap(GeneratedImage, webBrowser.Bounds);

        // Optionally shrink to a thumbnail when both target dimensions are set.
        if (ImageHeight != 0 && ImageWidth != 0)
            GeneratedImage = (Bitmap)GeneratedImage.GetThumbnailImage(ImageWidth, ImageHeight, null, IntPtr.Zero);
    }
}
