Convert HTML to a string

2023-02-22 12:01 问答作者：

I have a string writer function which captures a HTMl and returns as a string. for example

"\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" >\r\n<head>\r\n <link rel=\"Stylesheet\" href=\"../../Content/style.css\" type=\"text/css\" />\r\n <title>Cover Page</title>\r\n <style type=\"text/css\">\r\n html, body\r\n {\r\n\t font-family: Arial, Helvetica, sans-serif;\r\n\t font-size: 13pt;\r\n\t padding: 0px;\r\n\t margin: 0px;\r\n\t background-color: #FFFFFF;\r\n\t color: 开发者_Python百科black;\r\n\t width: 680px;\r\n }\r\n </style>\r\n</head>\r\n<body>\r\n <div>\r\n Ssotest Ssotest, \r\n </div> \r\n</body>\r\n</html>\r\n"

when I pass this to a PDF generating tool it throws an error.But when I copy the output of the String writer ( the same HTML string above") from the Locals window in VS2010 and hardcode it like

 string test ="\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" >\r\n<head>\r\n    <link rel=\"Stylesheet\" href=\"../../Content/style.css\" type=\"text/css\" />\r\n    <title>Cover Page</title>\r\n    <style type=\"text/css\">\r\n        html, body\r\n        {\r\n\t        font-family: Arial, Helvetica, sans-serif;\r\n\t        font-size: 13pt;\r\n\t        padding: 0px;\r\n\t        margin: 0px;\r\n\t        background-color: #FFFFFF;\r\n\t        color: black;\r\n\t        width: 680px;\r\n        }\r\n    </style>\r\n</head>\r\n<body>\r\n    <div>\r\n        Ssotest Ssotest, \r\n    </div> \r\n</body>\r\n</html>\r\n"

and pass to the tool it works fine. In the both cases the string is same. I wonder what makes the difference? Is that something gets converted when I copy the text and hardcode?? Any suggestions??

Just a update.. I used this code to format

 public class ReplaceString
        {
            static readonly IDictionary<string, string> m_replaceDict
                = new Dictionary<string, string>();

            const string ms_regexEscapes = @"[\a\b\f\n\r\t\v\\""]";

            public static string StringLiteral(string i_string)
            {
                return Regex.Replace(i_string, ms_regexEscapes, match);
            }

            public static string CharLiteral(char c)
            {
                return c == '\'' ? @"'\''" : string.Format("'{0}'", c);
            }

            private static string match(Match m)
            {
                string match = m.ToString();
                if (m_replaceDict.ContainsKey(match))
                {
                    return m_replaceDict[match];
                }

                throw new NotSupportedException();
            }

            static ReplaceString()
            {
                m_replaceDict.Add("\a", @"\a");
                m_replaceDict.Add("\b", @"\b");
                m_replaceDict.Add("\f", @"\f");
                m_replaceDict.Add("\n", @"\n");
                m_replaceDict.Add("\r", @"\r");
                m_replaceDict.Add("\t", @"\t");
                m_replaceDict.Add("\v", @"\v");

                m_replaceDict.Add("\\", @"\\");
                m_replaceDict.Add("\0", @"\0");

                //The SO parser gets fooled by the verbatim version 
                //of the string to replace - @"\"""
                //so use the 'regular' version
                m_replaceDict.Add("\"", "\\\"");
            }

            static void Main(string[] args)
            {

                string s = "here's a \"\n\tstring\" to test";
                Console.WriteLine(ReplaceString.StringLiteral(s));
                Console.WriteLine(ReplaceString.CharLiteral('c'));
                Console.WriteLine(ReplaceString.CharLiteral('\''));

            }
        }

but the string gets returned like

\\r\\n\\r\\n<!DOCTYPE html PUBLIC \\\"-//W3C//DTD XHTML 1.0 Transitional//EN\\\" \\\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\\\">\\r\\n\\r\\n<html xmlns=\\\"http://www.w3.org/1999/xhtml\\\" >\\r\\n<head>\\r\\n    <link rel=\\\"Stylesheet\\\...."

which dosent make sense.. the code of PDF generator I am using

string test="\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" >\r\n<head>\r\n <link rel=\"Stylesheet\" href=\"../../Content/style.css\" type=\"text/css\" />\r\n <title>Cover Page</title>\r\n <style type=\"text/css\">\r\n html, body\r\n {\r\n\t font-family: Arial, Helvetica, sans-serif;\r\n\t font-size: 13pt;\r\n\t padding: 0px;\r\n\t margin: 0px;\r\n\t background-color: #FFFFFF;\r\n\t color: black;\r\n\t width: 680px;\r\n }\r\n </style>\r\n</head>\r\n<body>\r\n <div>\r\n Ssotest Ssotest, \r\n </div> \r\n</body>\r\n</html>\r\n"

           FileStreamResponseContext response = new FileStreamResponseContext();
 Document doc = new Document();
            doc.DocumentInformation.CreationDate = DateTime.Now;
            doc.DocumentInformation.Title = "Test Plan";
            doc.DocumentInformation.Subject = "Test Plan";
            doc.CompressionLevel = CompressionLevel.NormalCompression;
            doc.Margins = new Margins(0, 0, 0, 0);
            doc.Security.CanPrint = true;
            doc.ViewerPreferences.HideToolbar = false;
            doc.ViewerPreferences.FitWindow = false;

string baseUrl = String.Format("http://localhost{0}/", Request.Url.Port == 80?"":":" + Request.Url.Port.ToString());

PdfPage docTestPlan = doc.AddPage(PageSize.Letter, new Margins(0, 0, 0, 0), PageOrientation.Portrait);
// passing the string test returned from the string writer

   HtmlToPdfElement htmlToPdf = new HtmlToPdfElement(test, baseUrl);
            htmlToPdf.FitWidth = false;
            docTestPlan.AddElement(htmlToPdf);



            /******************************************
             * put doc in a memory stream for return */
            response.FileDataStream = new MemoryStream();
            doc.Save(response.FileDataStream);
            doc.Close();
            response.FileDataStream.Position = 0;

            return new FileStreamResult(response.FileDataStream, "application/pdf");

Not sure about the software you are using, but to me it looks like you are putting the string into quotes in some programmatic context where things are being escaped.

So when the tool gets your original copied input, it sees: \"

But when it gets it from the programming context, it just sees: "

After a long struggle I found the reason. The error is caused because when the string is passed from the programming context it keeps on adding to System.Web.HttpResponseBase Response object. when I pass the string directly by hard coding it is not messing again with System.Web.HttpResponseBase Response object. So the final solution is to add a piece of code Response.clear(); which clears all the previous Response objects. Now its working fine. Thanks all for your suggestions. cheers!!

I'm not sure your passing the value correctly to the PDF API. Can you update with that code too.

EDIT: Shouldn't you be returning the doc itself? That will have the header info not the stream wont it?

HTML STRING EXAMPLE:

StringWriter sw = new StringWriter();
        Server.Execute("PageToConvert.aspx", sw);
        string htmlCodeToConvert = sw.GetStringBuilder().ToString();

protected void btnExport_Click(object sender, EventArgs e) { HtmlForm form = new HtmlForm(); // form.Controls.Add(GridView1); StringWriter sw = new StringWriter(); HtmlTextWriter hTextWriter = new HtmlTextWriter(sw); //form.Controls[0].RenderControl(hTextWriter); string htmlDisplayText = @"

<html>
<body bgcolor="red">
<h4>Dear bishnu2</h4>

your address pdp is

An early version of the patterns was workshopped at PLoP After several internal workshops and updates, a later version was

workshopped at PLoP The patterns are now mature enough that I teach a class based on the patterns at AG Communication

</body></html>

"; // string htmlDisplayText = sw.ToString(); Document Doc = new Document();

    //PdfWriter.GetInstance
    //(Doc, new FileStream(Request.PhysicalApplicationPath 
    //+ "\\AmitJain.pdf", FileMode.Create));

    PdfWriter.GetInstance(Doc, new FileStream(Environment.GetFolderPath
    (Environment.SpecialFolder.Desktop)+ "\\AmitJain.pdf", FileMode.Create));
    Doc.Open();



    System.Xml.XmlTextReader xmlReader = 
    new System.Xml.XmlTextReader(new StringReader(htmlDisplayText));
    HtmlParser.Parse(Doc, xmlReader);

    Doc.Close();
    string Path = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)+ "\\AmitJain.pdf";



    ShowPdf(Path);


}

private void ShowPdf(string strS)
{
    Response.ClearContent();
    Response.ClearHeaders();
    Response.ContentType = "application/pdf";
    Response.AddHeader("Content-Disposition","attachment; filename=" + strS);
    Response.TransmitFile(strS);
    Response.End();
    //Response.WriteFile(strS);
    Response.Flush();
    Response.Clear();

}

继续阅读：asp.net-mvc c#-4.0

Convert HTML to a string

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？