How to get parsed output from libtidy into char *
I'm trying to shoehorn libtidy into a C++ program with minimal rework. The C++ program needs the resulting (cleansed) HTML in a char *. I'm using the libtidy example code, but trying to use tidySaveString rather than tidySaveBuffer, which writes into libtidy's own TidyBuffer type.
Problem 1: I can't find a sensible way to determine how big a buffer I need to allocate; nothing in the libtidy docs makes it obvious.
Problem 2: when I use a non-sensible method to get the size (render into a TidyBuffer and take its size), then allocate my memory and call tidySaveString, I always get a -ENOMEM error.
Here's the adapted code I'm using:
```cpp
// ...
char *buffer_;
char *cleansed_buffer_;
// ...

int ProcessHtml::Clean(){
    // uses Libtidy to convert the buffer to XML
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    int rc = -1;
    Bool ok;
    TidyDoc tdoc = tidyCreate();                      // Initialize "document"
    ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );   // Convert to XHTML
    if ( ok )
        rc = tidySetErrorBuffer( tdoc, &errbuf );     // Capture diagnostics
    if ( rc >= 0 )
        rc = tidyParseString( tdoc, this->buffer_ );  // Parse the input
    if ( rc >= 0 )
        rc = tidyCleanAndRepair( tdoc );              // Tidy it up!
    if ( rc >= 0 )
        rc = tidyRunDiagnostics( tdoc );              // Kvetch
    if ( rc > 1 )                                     // If error, force output.
        rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
    if ( rc >= 0 ){
        rc = tidySaveBuffer( tdoc, &output );         // Pretty Print
        // get some mem
        uint yy = output.size;
        cleansed_buffer_ = (char *)malloc(yy+10);
        uint xx = 0;
        rc = tidySaveString(tdoc, this->cleansed_buffer_, &xx );
        if (rc == -ENOMEM)
            cout << "yikes!!\n" << endl;
    }
    if ( rc >= 0 )
    {
        if ( rc > 0 )
            printf( "\nDiagnostics:\n\n%s", errbuf.bp );
        printf( "\nAnd here is the result:\n\n%s", cleansed_buffer_ );
    }
    else
        printf( "A severe error (%d) occurred.\n", rc );
    tidyBufFree( &output );
    tidyBufFree( &errbuf );
    tidyRelease( tdoc );
    return rc;
}
```
It reads the bytes to clean from an input buffer (buffer_), and I really need the output in cleansed_buffer_. Ideally I don't want to dump the document to an output buffer just to get the size, but above all I need to find a way to get this to work.
All help gratefully received.
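You have to pass in the buffer size. On input, the third argument of tidySaveString must hold the size of the buffer you allocated; on return it holds the size the output actually needs. With xx = 0, libtidy sees a zero-length buffer and returns -ENOMEM every time: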
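```cpp
uint yy = output.size;
cleansed_buffer_ = (char *)malloc(yy+10);
uint xx = yy+10; /* <---------------------------------- HERE */
rc = tidySaveString(tdoc, this->cleansed_buffer_, &xx );
if (rc == -ENOMEM)
    cout << "yikes!!\n" << endl;
else if (rc >= 0)
    cleansed_buffer_[xx] = '\0'; // xx now holds the bytes written; terminate explicitly rather than relying on tidySaveString to add a NUL
```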
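Alternatively, you can ask tidySaveString for the required size first: call it with a size of 0, let it fail with -ENOMEM and report the size it needs, then allocate and call it again. Note that each call serializes the document again.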
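```cpp
cleansed_buffer_ = (char *)malloc(1);
uint size = 0;
rc = tidySaveString(tdoc, cleansed_buffer_, &size );
// rc is normally -ENOMEM here, and size now holds the required size
free(cleansed_buffer_);
cleansed_buffer_ = (char *)malloc(size+1);
rc = tidySaveString(tdoc, cleansed_buffer_, &size );
if (rc >= 0)
    cleansed_buffer_[size] = '\0'; // terminate explicitly; don't rely on tidySaveString to add a NUL
```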