Convert HTML to a string

I have a string writer function that captures HTML and returns it as a string. For example:

"\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" >\r\n<head>\r\n <link rel=\"Stylesheet\" href=\"../../Content/style.css\" type=\"text/css\" />\r\n <title>Cover Page</title>\r\n <style type=\"text/css\">\r\n html, body\r\n {\r\n\t font-family: Arial, Helvetica, sans-serif;\r\n\t font-size: 13pt;\r\n\t padding: 0px;\r\n\t margin: 0px;\r\n\t background-color: #FFFFFF;\r\n\t color: black;\r\n\t width: 680px;\r\n }\r\n </style>\r\n</head>\r\n<body>\r\n <div>\r\n Ssotest Ssotest, \r\n </div> \r\n</body>\r\n</html>\r\n"

When I pass this to a PDF generating tool, it throws an error. But when I copy the output of the string writer (the same HTML string as above) from the Locals window in VS2010 and hardcode it like this:

 string test = "\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" >\r\n<head>\r\n    <link rel=\"Stylesheet\" href=\"../../Content/style.css\" type=\"text/css\" />\r\n    <title>Cover Page</title>\r\n    <style type=\"text/css\">\r\n        html, body\r\n        {\r\n\t        font-family: Arial, Helvetica, sans-serif;\r\n\t        font-size: 13pt;\r\n\t        padding: 0px;\r\n\t        margin: 0px;\r\n\t        background-color: #FFFFFF;\r\n\t        color: black;\r\n\t        width: 680px;\r\n        }\r\n    </style>\r\n</head>\r\n<body>\r\n    <div>\r\n        Ssotest Ssotest, \r\n    </div> \r\n</body>\r\n</html>\r\n";

and pass it to the tool, it works fine. In both cases the string is the same, so what makes the difference? Does something get converted when I copy the text and hardcode it? Any suggestions?

Just an update: I used this code to format the string

 using System;
 using System.Collections.Generic;
 using System.Text.RegularExpressions;

 public class ReplaceString
        {
            static readonly IDictionary<string, string> m_replaceDict
                = new Dictionary<string, string>();

            const string ms_regexEscapes = @"[\a\b\f\n\r\t\v\\""]";

            public static string StringLiteral(string i_string)
            {
                return Regex.Replace(i_string, ms_regexEscapes, match);
            }

            public static string CharLiteral(char c)
            {
                return c == '\'' ? @"'\''" : string.Format("'{0}'", c);
            }

            private static string match(Match m)
            {
                string match = m.ToString();
                if (m_replaceDict.ContainsKey(match))
                {
                    return m_replaceDict[match];
                }

                throw new NotSupportedException();
            }

            static ReplaceString()
            {
                m_replaceDict.Add("\a", @"\a");
                m_replaceDict.Add("\b", @"\b");
                m_replaceDict.Add("\f", @"\f");
                m_replaceDict.Add("\n", @"\n");
                m_replaceDict.Add("\r", @"\r");
                m_replaceDict.Add("\t", @"\t");
                m_replaceDict.Add("\v", @"\v");

                m_replaceDict.Add("\\", @"\\");
                m_replaceDict.Add("\0", @"\0");

                //The SO parser gets fooled by the verbatim version 
                //of the string to replace - @"\"""
                //so use the 'regular' version
                m_replaceDict.Add("\"", "\\\"");
            }

            static void Main(string[] args)
            {

                string s = "here's a \"\n\tstring\" to test";
                Console.WriteLine(ReplaceString.StringLiteral(s));
                Console.WriteLine(ReplaceString.CharLiteral('c'));
                Console.WriteLine(ReplaceString.CharLiteral('\''));

            }
        }

but the string comes back like this

\\r\\n\\r\\n<!DOCTYPE html PUBLIC \\\"-//W3C//DTD XHTML 1.0 Transitional//EN\\\" \\\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\\\">\\r\\n\\r\\n<html xmlns=\\\"http://www.w3.org/1999/xhtml\\\" >\\r\\n<head>\\r\\n    <link rel=\\\"Stylesheet\\\...."

which doesn't make sense. Here is the PDF generator code I am using:

string test = "\r\n\r\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\" >\r\n<head>\r\n <link rel=\"Stylesheet\" href=\"../../Content/style.css\" type=\"text/css\" />\r\n <title>Cover Page</title>\r\n <style type=\"text/css\">\r\n html, body\r\n {\r\n\t font-family: Arial, Helvetica, sans-serif;\r\n\t font-size: 13pt;\r\n\t padding: 0px;\r\n\t margin: 0px;\r\n\t background-color: #FFFFFF;\r\n\t color: black;\r\n\t width: 680px;\r\n }\r\n </style>\r\n</head>\r\n<body>\r\n <div>\r\n Ssotest Ssotest, \r\n </div> \r\n</body>\r\n</html>\r\n";

    FileStreamResponseContext response = new FileStreamResponseContext();
    Document doc = new Document();
    doc.DocumentInformation.CreationDate = DateTime.Now;
    doc.DocumentInformation.Title = "Test Plan";
    doc.DocumentInformation.Subject = "Test Plan";
    doc.CompressionLevel = CompressionLevel.NormalCompression;
    doc.Margins = new Margins(0, 0, 0, 0);
    doc.Security.CanPrint = true;
    doc.ViewerPreferences.HideToolbar = false;
    doc.ViewerPreferences.FitWindow = false;

    string baseUrl = String.Format("http://localhost{0}/", Request.Url.Port == 80 ? "" : ":" + Request.Url.Port.ToString());

    PdfPage docTestPlan = doc.AddPage(PageSize.Letter, new Margins(0, 0, 0, 0), PageOrientation.Portrait);

    // passing the string test returned from the string writer
    HtmlToPdfElement htmlToPdf = new HtmlToPdfElement(test, baseUrl);
    htmlToPdf.FitWidth = false;
    docTestPlan.AddElement(htmlToPdf);

    /******************************************
     * put doc in a memory stream for return */
    response.FileDataStream = new MemoryStream();
    doc.Save(response.FileDataStream);
    doc.Close();
    response.FileDataStream.Position = 0;

    return new FileStreamResult(response.FileDataStream, "application/pdf");


Not sure about the software you are using, but to me it looks like you are putting the string into quotes in some programmatic context where things are being escaped.

So when the tool gets your original copied input, it sees: \"

But when it gets it from the programming context, it just sees: "
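
One way to check this is to compare the two strings in code instead of by eye. The Locals window shows a string in its C#-escaped form, so \" on screen is a single " character in memory. Below is a minimal sketch, not tied to any PDF library; the class name StringDiff, the method FirstDifference, and the two sample values are made up for illustration and stand in for the hardcoded literal and the string writer output.

```csharp
using System;

public static class StringDiff
{
    // Returns the index of the first position where a and b differ,
    // or -1 when the two strings are identical.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i])
            {
                return i;
            }
        }
        // Same prefix: either identical, or one string is longer.
        return a.Length == b.Length ? -1 : shared;
    }

    public static void Main()
    {
        // Hypothetical stand-ins for the hardcoded literal and the captured output.
        string hardcoded = "<title>Cover Page</title>\r\n";
        string captured = "<title>Cover Page</title>\n";

        int index = FirstDifference(hardcoded, captured);
        if (index < 0)
        {
            Console.WriteLine("Strings are identical.");
        }
        else
        {
            Console.WriteLine("First difference at index " + index);
        }
    }
}
```

If this reports that the strings are identical, the difference is probably not in the string at all but somewhere else in the request.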


After a long struggle I found the reason. When the string comes from the programming context, its content keeps being added to the System.Web.HttpResponseBase Response object. When I pass the string directly by hardcoding it, the Response object is not touched again. So the final solution is to add Response.Clear(); which clears any content already buffered in the Response. Now it's working fine. Thanks all for your suggestions. Cheers!


I'm not sure you're passing the value correctly to the PDF API. Can you update the question with that code too?

EDIT: Shouldn't you be returning the doc itself? That will have the header info, not the stream, won't it?

HTML STRING EXAMPLE:

    StringWriter sw = new StringWriter();
    Server.Execute("PageToConvert.aspx", sw);
    string htmlCodeToConvert = sw.GetStringBuilder().ToString();


protected void btnExport_Click(object sender, EventArgs e)
{
    HtmlForm form = new HtmlForm();
    // form.Controls.Add(GridView1);
    StringWriter sw = new StringWriter();
    HtmlTextWriter hTextWriter = new HtmlTextWriter(sw);
    //form.Controls[0].RenderControl(hTextWriter);
    // Inside a verbatim string, a double quote is written as "".
    string htmlDisplayText = @"
<html>
<body bgcolor=""red"">
<h4>Dear bishnu2</h4>

 your address pdp    is

An early version of the patterns was workshopped at PLoP After several internal workshops and updates, a later version was

workshopped at PLoP  The patterns are now mature enough that I teach a class based on the patterns at AG Communication

Systems.
                    Copyright © 1999 AG Communication Systems Corporation

</body></html>

";
    // string htmlDisplayText = sw.ToString();
    Document Doc = new Document();

    //PdfWriter.GetInstance
    //(Doc, new FileStream(Request.PhysicalApplicationPath 
    //+ "\\AmitJain.pdf", FileMode.Create));

    PdfWriter.GetInstance(Doc, new FileStream(Environment.GetFolderPath
    (Environment.SpecialFolder.Desktop)+ "\\AmitJain.pdf", FileMode.Create));
    Doc.Open();



    System.Xml.XmlTextReader xmlReader = 
    new System.Xml.XmlTextReader(new StringReader(htmlDisplayText));
    HtmlParser.Parse(Doc, xmlReader);

    Doc.Close();
    string Path = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)+ "\\AmitJain.pdf";



    ShowPdf(Path);


}

private void ShowPdf(string strS)
{
    Response.ClearContent();
    Response.ClearHeaders();
    Response.ContentType = "application/pdf";
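    // Send only the file name to the client, not the full server path.
    Response.AddHeader("Content-Disposition", "attachment; filename=" + System.IO.Path.GetFileName(strS));
    Response.TransmitFile(strS);
    // Response.End() stops execution of the page, so calls placed after it would never run.
    Response.End();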

}