
How to get parsed output from libtidy into a char *

I'm trying to shoe-horn libtidy into a C++ program with minimal re-work. The C++ program needs the resulting (cleansed) HTML in a char *. I'm using the libtidy example code but trying to use tidySaveString rather than tidySaveBuffer which wants to use libtidy's own buffer.

Problem 1: I can't find a sensible way to determine how much memory to allocate for my buffer; nothing in the libtidy docs seems to cover it.

Problem 2: when I use a non-sensible method to get the size (save the document to a TidyBuffer and read that buffer's size), then allocate my memory and call tidySaveString, I always get a -ENOMEM error.

Here's the adapted code I'm using:

.
.
.
char *buffer_;
char *cleansed_buffer_; 
.
.
.
int ProcessHtml::Clean(){
// uses Libtidy to convert the buffer to XML


TidyBuffer output = {0};
TidyBuffer errbuf = {0};
int rc = -1;
Bool ok;

TidyDoc tdoc = tidyCreate();                     // Initialize "document"


ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );  // Convert to XHTML
if ( ok )
    rc = tidySetErrorBuffer( tdoc, &errbuf );      // Capture diagnostics
if ( rc >= 0 )
    rc = tidyParseString( tdoc, this->buffer_ );           // Parse the input
if ( rc >= 0 )
    rc = tidyCleanAndRepair( tdoc );               // Tidy it up!
if ( rc >= 0 )
    rc = tidyRunDiagnostics( tdoc );               // Kvetch
if ( rc > 1 )                                    // If error, force output.
    rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
if ( rc >= 0 ){
    rc = tidySaveBuffer( tdoc, &output );          // Pretty Print

    // get some mem
    uint yy = output.size;
    cleansed_buffer_ = (char *)malloc(yy+10);
    uint xx = 0;
    rc = tidySaveString(tdoc, this->cleansed_buffer_,&xx );
    if (rc == -ENOMEM)
        cout << "yikes!!\n" << endl;

}
if ( rc >= 0 )
{
    if ( rc > 0 )
        printf( "\nDiagnostics:\n\n%s", errbuf.bp );
    printf( "\nAnd here is the result:\n\n%s", cleansed_buffer_ );
}
else
    printf( "A severe error (%d) occurred.\n", rc );

tidyBufFree( &output );
tidyBufFree( &errbuf );
tidyRelease( tdoc );
return rc;

}

It's reading the bytes to clean from an input buffer (buffer_) and I really need the output in cleansed_buffer_. Ideally (obviously) I don't want to dump the doc out to an output buffer just to get the size, but I also need to find a way to get this to work.

All help gratefully received..


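You have to pass in the buffer size. The third argument of tidySaveString is both an input and an output: on entry it must hold the capacity of your buffer, and on return it holds the size of the output. With xx = 0 the call reports -ENOMEM for any non-empty document.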

uint yy = output.size;
cleansed_buffer_ = (char *)malloc(yy+10);
uint xx = yy+10;   /* <---------------------------------- HERE */
rc = tidySaveString(tdoc, this->cleansed_buffer_,&xx );
if (rc == -ENOMEM)
    cout << "yikes!!\n" << endl;
else if (rc >= 0)
    cleansed_buffer_[xx] = '\0';   /* the copied bytes are not NUL-terminated */
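
Since the code above already fills output via tidySaveBuffer, one option may be to copy those bytes straight into a char * instead of saving the document a second time. This is a minimal sketch, assuming the bytes and length come from output.bp and output.size; the helper name dup_as_cstring is made up for illustration and is not part of libtidy:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not a libtidy function): copies `size` raw bytes into a
// freshly malloc'd, NUL-terminated C string. Returns NULL if malloc fails.
// The caller releases the result with free().
char *dup_as_cstring(const unsigned char *bytes, unsigned size)
{
    char *out = static_cast<char *>(std::malloc(static_cast<std::size_t>(size) + 1));
    if (out == NULL)
        return NULL;
    if (size > 0)
        std::memcpy(out, bytes, size);
    out[size] = '\0';   // terminate so the result is safe to print with %s
    return out;
}
```

In Clean() this would presumably be called as cleansed_buffer_ = dup_as_cstring(output.bp, output.size); right after tidySaveBuffer succeeds, which avoids the second save entirely.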

Alternatively, you can get the size this way:

cleansed_buffer_ = (char *)malloc(1);
uint size = 0;
rc = tidySaveString(tdoc, cleansed_buffer_, &size );   // expected to return -ENOMEM here

// now size is the required size
free(cleansed_buffer_);
cleansed_buffer_ = (char *)malloc(size+1);
rc = tidySaveString(tdoc, cleansed_buffer_, &size );
cleansed_buffer_[size] = '\0';   // the copied bytes are not NUL-terminated
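
The two-call pattern above can be wrapped in a small helper. The following is only a sketch under stated assumptions: save_to_cstring and FakeSaver are hypothetical names, not libtidy functions, and the callable is assumed to follow the same contract as tidySaveString (capacity in, output size out, -ENOMEM when the buffer is too small). FakeSaver exists only so the sketch can run without libtidy.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of libtidy). `save(buf, &len)` is assumed to
// follow the tidySaveString contract: `len` holds the capacity on entry and the
// output size on return, and the call returns -ENOMEM if the capacity is too small.
template <typename SaveFn>
char *save_to_cstring(SaveFn save, int *rc)
{
    char probe = 0;             // 1-byte scratch buffer for the sizing pass
    unsigned size = 0;
    *rc = save(&probe, &size);  // pass 1: capacity 0, learn the required size
    if (*rc < 0 && *rc != -ENOMEM)
        return NULL;            // a real failure, not just "buffer too small"

    char *out = static_cast<char *>(std::malloc(static_cast<std::size_t>(size) + 1));
    if (out == NULL) {
        *rc = -ENOMEM;
        return NULL;
    }
    unsigned capacity = size;
    *rc = save(out, &capacity); // pass 2: the capacity now fits, bytes are copied
    if (*rc < 0) {
        std::free(out);
        return NULL;
    }
    out[capacity] = '\0';       // the copied bytes carry no terminator
    return out;
}

// Stand-in with the same contract, only so the sketch runs without libtidy.
struct FakeSaver {
    const char *text;
    int operator()(char *buf, unsigned *len) const
    {
        unsigned need = static_cast<unsigned>(std::strlen(text));
        int status = 0;
        if (need > *len)
            status = -ENOMEM;   // too small: report it, copy nothing
        else
            std::memcpy(buf, text, need);
        *len = need;            // always report the size of the output
        return status;
    }
};
```

With libtidy and a C++11 compiler, the call would presumably look like cleansed_buffer_ = save_to_cstring([&](char *b, unsigned *n) { return tidySaveString(tdoc, b, n); }, &rc); and the result is released with free().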