C# StringBuilder: persisting a StringBuilder object into a varchar column - SQL Server
I have a block of text read from a PDF document using the iTextSharp library (method: `GetResultantText()`).
Assume the text is laid out in paragraphs:
*"Paragraph One.
Paragraph Two. ...
Paragraph n "*
Is there a way to use the C# StringBuilder object, or an alternative approach, to hold the text while retaining its formatting (carriage returns, paragraph breaks, etc.) and then store the value in a varchar field in SQL Server 2008?
Ultimately I intend to store the text in a varchar field and would like to retain the line feeds and carriage returns (basic formatting metadata); otherwise the extracted text is a single block that isn't readable when rendered.
I reckon invoking the `ToString()` method on a StringBuilder object removes all intermediate formatting characters in the text except the terminating newline character.
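```csharp
// reader (PdfReader), parser (presumably a PdfReaderContentParser), buffer (StringBuilder)
// and pdfText (string) are declared earlier in the method (not shown here).
SimpleTextExtractionStrategy strategy;
//StreamWriter writer = new StreamWriter("c:\\pdfOutput.txt");
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    try
    {
        // Extract the text of page i, then append it followed by a line break
        strategy = parser.ProcessContent(i, new SimpleTextExtractionStrategy());
        buffer.AppendLine(strategy.GetResultantText());
        //writer.WriteLine(strategy.GetResultantText());
    }
    catch (IndexOutOfRangeException)
    {
        // Pages that fail to parse are silently skipped
    }
}
pdfText = buffer.ToString();
Console.WriteLine("* End: Text Extraction Process ...");
return pdfText;
```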
If I uncomment the writer lines and write the output to a text file, the formatting is retained. However, if I save the resulting text into an entity defined as follows, all I get is a single block of text:
```csharp
using System;
using System.Data.Linq.Mapping;

[System.Data.Linq.Mapping.Table(Name = "ReportsText")]
public class ReportsText
{
    [Column(IsDbGenerated = true, AutoSync = AutoSync.OnInsert)]
    public int ID { get; set; }

    [Column(IsPrimaryKey = true, AutoSync = AutoSync.OnInsert)]
    public String image { get; set; }

    // The extracted PDF text (pdfText) is stored in this column
    [Column] public String announcement { get; set; }
}
```
So `pdfText` is intended to be stored in the `announcement` field. Cheers.
I don't think it should remove the formatting. If it does, append `"\r\n"` after each paragraph and then store it.
You are correct that StringBuilder by itself won't keep formatting: it holds plain characters only, so the only layout information that survives is the newline characters. If you really want to store a string with formatting information in the database, I would suggest storing it in a predefined format such as XML, RTF or even HTML, then retrieving it the same way so it can be fed to iTextSharp.
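A minimal sketch of the HTML option: convert the plain text into `<p>` paragraphs (with `<br />` for single line breaks) before storing it, so it renders as paragraphs when displayed as HTML. It assumes paragraphs are separated by a blank line, which may not match what iTextSharp actually produces for your PDFs; `ParagraphHtml` and `ToHtml` are hypothetical names. `WebUtility.HtmlEncode` (in `System.Net`, .NET 4.0 and later) escapes characters such as `&` and `<`.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

public static class ParagraphHtml
{
    // Hypothetical helper: wraps each paragraph in <p>...</p> and joins the lines
    // inside a paragraph with <br />. Assumption: a blank line separates paragraphs.
    public static string ToHtml(string plainText)
    {
        // Work with '\n' only, whatever line-ending style the input uses
        string normalized = plainText.Replace("\r\n", "\n").Replace("\r", "\n");
        string[] paragraphs = normalized.Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

        var html = new StringBuilder();
        foreach (string paragraph in paragraphs)
        {
            string trimmed = paragraph.Trim('\n');
            if (trimmed.Trim().Length == 0)
                continue; // skip whitespace-only paragraphs

            // Encode each line before adding tags so the text is never read as markup
            string[] lines = trimmed.Split('\n');
            html.Append("<p>")
                .Append(string.Join("<br />", lines.Select(line => WebUtility.HtmlEncode(line))))
                .Append("</p>");
        }
        return html.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToHtml("Paragraph One.\n\nParagraph Two & more.\nStill two."));
        // <p>Paragraph One.</p><p>Paragraph Two &amp; more.<br />Still two.</p>
    }
}
```

The resulting string can go into the same `announcement` column; the paragraph structure then lives in the markup rather than in line-break characters.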
Another option is to generate the PDFs directly and store the binary stream in the database in a binary column (`varbinary(max)` in SQL Server, or a BLOB in other databases). This is not best practice, though.