Problem using libcurl: it does not appear to get the entire page

I am having difficulty getting started with libcurl. The code below does not appear to retrieve the entire page from the specified URL. Where am I going wrong?

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <curl/curl.h>
#include <curl/types.h>
#include <curl/easy.h>

using namespace std;

char buffer[1024];

size_t tobuffer(char *ptr, size_t size, size_t nmemb, void *stream)
{
    strncpy(buffer,ptr,size*nmemb);
    return size*nmemb;
}

int main() {
    CURL *curl;
    CURLcode res;


    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://google.co.in");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION,1);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &tobuffer);

        res = curl_easy_perform(curl);

        printf("%s",buffer);

        curl_easy_cleanup(curl);
    }
    return 0;
}


As described in the libcurl documentation for curl_easy_setopt(), the callback function is called as many times as required to deliver all the bytes of the fetched page.

Your function overwrites the same buffer on every call, with the result that after curl_easy_perform() has finished fetching the file, you only have whatever fit in the final call to tobuffer() left.

In short, your function tobuffer() must do something other than overwrite the same buffer on each call.

update

For example, you could do something like the following completely untested code:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    char *buffer;
    size_t bufferlen;
    size_t writepos;
} buffer = {0};

size_t tobuffer(char *ptr, size_t size, size_t nmemb, void *stream)
{
    size_t nbytes = size*nmemb;
    if (!buffer.buffer) {
        buffer.buffer = malloc(1024);
        buffer.bufferlen = 1024;
        buffer.writepos = 0;
    }
    /* keep doubling until the new chunk fits */
    while (buffer.buffer && buffer.writepos + nbytes > buffer.bufferlen) {
        buffer.bufferlen = 2 * buffer.bufferlen;
        buffer.buffer = realloc(buffer.buffer, buffer.bufferlen);
    }
    assert(buffer.buffer != NULL);
    memcpy(buffer.buffer+buffer.writepos,ptr,nbytes);
    buffer.writepos += nbytes;  /* the next chunk is appended after this one */
    return nbytes;
}

At some later point in your program you will need to free the allocated memory with something like this:

void freebuffer(struct buf *b) {
    free(b->buffer);
    b->buffer = NULL;
    b->bufferlen = 0;
    b->writepos = 0;
}

Also, note that I've used memcpy() instead of strncpy() to move data to the buffer. This is important because libcurl makes no claim that the data passed to the callback function is actually a NUL-terminated ASCII string. In particular, if you retrieve a .gif image file, it certainly can (and will) contain zero bytes which you would want to preserve in your buffer. strncpy() stops copying at the first NUL it sees in the source data and fills the rest of the destination with zeros.
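To see the difference concretely, here is a small sketch that copies the same six bytes, containing an embedded zero byte, with both functions. The chunk contents and the function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 6-byte chunk with a zero byte in the middle, as binary data may have. */
static const char chunk[6] = { 'G', 'I', 'F', '\0', '8', '9' };

/* Count how many of the first n bytes of dst equal the source chunk. */
static size_t matching_bytes(const char *dst, size_t n)
{
    size_t i, same = 0;
    for (i = 0; i < n; i++)
        if (dst[i] == chunk[i])
            same++;
    return same;
}

size_t preserved_by_strncpy(void)
{
    char dst[sizeof chunk];
    strncpy(dst, chunk, sizeof chunk);  /* stops at the zero byte, zero-fills the rest */
    return matching_bytes(dst, sizeof chunk);
}

size_t preserved_by_memcpy(void)
{
    char dst[sizeof chunk];
    memcpy(dst, chunk, sizeof chunk);   /* copies all six bytes verbatim */
    return matching_bytes(dst, sizeof chunk);
}
```

preserved_by_strncpy() reports 4 of the 6 bytes intact, because the '8' and '9' after the zero byte are lost, while preserved_by_memcpy() reports all 6.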

As an exercise for the reader, I've left all the error handling out of this code. You must put some in. Furthermore, I've also left in a juicy memory leak on the off chance that the call to realloc() fails.
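One possible way to add that error handling is sketched below. The names membuf, membuf_append and membuf_free are made up for this example and are not part of libcurl. The idea is to keep the old pointer until realloc() has succeeded, so a failure neither leaks nor loses the data already collected. A write callback could then return 0 when membuf_append() fails; libcurl treats a return value that differs from the number of bytes passed in as an error and aborts the transfer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not part of libcurl. */
struct membuf {
    char  *data;
    size_t cap;   /* bytes allocated */
    size_t len;   /* bytes used */
};

/* Append n bytes; returns 0 on success, -1 on failure (buffer left unchanged). */
int membuf_append(struct membuf *b, const void *src, size_t n)
{
    if (n > SIZE_MAX - b->len)
        return -1;                          /* length would overflow size_t */
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 1024;
        char *tmp;
        while (newcap < b->len + n) {
            if (newcap > SIZE_MAX / 2) {    /* doubling would overflow */
                newcap = b->len + n;
                break;
            }
            newcap *= 2;
        }
        tmp = realloc(b->data, newcap);     /* old pointer stays valid on failure */
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = newcap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Release the storage and reset the struct so it can be reused. */
void membuf_free(struct membuf *b)
{
    free(b->data);
    b->data = NULL;
    b->cap = b->len = 0;
}
```

Doubling keeps the number of realloc() calls small for large pages; growing by exactly the needed amount would also work, at the cost of more reallocations.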

Another improvement would be to use the CURLOPT_WRITEDATA option, which lets the libcurl caller supply the value of the callback's stream parameter. That could be used to allocate and manage your buffer without using global variables. I'd strongly recommend doing that as well.
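A rough sketch of that approach, using CURLOPT_WRITEDATA to hand the callback a caller-owned struct instead of a global. The names page and collect are made up here, and the code has not been tested against a live transfer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transfer state, owned by the caller instead of a global. */
struct page {
    char  *data;   /* NUL-terminated after every successful append */
    size_t len;    /* bytes stored, not counting the terminator */
};

/* Write callback: appends each chunk to the struct page passed in via `stream`. */
size_t collect(char *ptr, size_t size, size_t nmemb, void *stream)
{
    struct page *p = stream;
    size_t nbytes = size * nmemb;
    char *tmp = realloc(p->data, p->len + nbytes + 1);
    if (tmp == NULL)
        return 0;                 /* a short count makes libcurl abort the transfer */
    if (nbytes > 0)
        memcpy(tmp + p->len, ptr, nbytes);
    p->data = tmp;
    p->len += nbytes;
    p->data[p->len] = '\0';       /* convenience for printing text pages */
    return nbytes;
}

/*
 * Registration, assuming a valid CURL *curl handle:
 *
 *     struct page p = { NULL, 0 };
 *     curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
 *     curl_easy_setopt(curl, CURLOPT_WRITEDATA, &p);
 *     ... curl_easy_perform(curl), use p.data and p.len, then free(p.data) ...
 */
```

Because the state travels with the pointer, two transfers can each have their own struct page without interfering with one another.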


char buffer[1024];

How could you get the entire webpage when your buffer size is limited to 1024 bytes?


You are performing a simple GET operation using libcurl. You can use this sample program as a reference. Why don't you print the buffer in the callback, or write to a file as shown in this example?

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* curl.h already includes easy.h; curl/types.h was removed from newer libcurl releases */
#include <curl/curl.h>

static size_t write_data(void *ptr, size_t size, size_t nmemb, void *stream)
{
  /* fwrite() returns the number of items written; libcurl expects bytes */
  size_t written = fwrite(ptr, size, nmemb, (FILE *)stream);
  return written * size;
}

int main(int argc, char **argv)
{
  CURL *curl_handle;
  static const char *headerfilename = "head.out";
  FILE *headerfile;
  static const char *bodyfilename = "body.out";
  FILE *bodyfile;

  curl_global_init(CURL_GLOBAL_ALL);

  /* init the curl session */ 
  curl_handle = curl_easy_init();

  /* set URL to get */ 
  curl_easy_setopt(curl_handle, CURLOPT_URL, "http://curl.haxx.se");

  /* no progress meter please */ 
  curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 1L);

  /* send all data to this function  */ 
  curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, write_data);

  /* open the files */ 
  headerfile = fopen(headerfilename,"w");
  if (headerfile == NULL) {
    curl_easy_cleanup(curl_handle);
    return -1;
  }
  bodyfile = fopen(bodyfilename,"w");
  if (bodyfile == NULL) {
    fclose(headerfile);
    curl_easy_cleanup(curl_handle);
    return -1;
  }

  /* we want the headers to this file handle */ 
  curl_easy_setopt(curl_handle,   CURLOPT_WRITEHEADER, headerfile);

  /*
   * Send the body to body.out as well. Without CURLOPT_WRITEDATA the
   * callback would receive stdout as its stream argument.  */ 
  curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, bodyfile);

  /* get it! */ 
  curl_easy_perform(curl_handle);

  /* close the header and body files */ 
  fclose(headerfile);
  fclose(bodyfile);

  /* cleanup curl stuff */ 
  curl_easy_cleanup(curl_handle);

  return 0;
}


Tip: use a stringstream! Replace your buffer with a std::stringstream, append each chunk to it in the callback, and read the content with <streamname>.str(). Works for me!


I don't know the library, but it seems to me that you're reusing the buffer... if the page you download doesn't fit then you'll write over it repeatedly, and probably only see the last snippet. For example, if we copy the alphabet into a 10 character buffer, we get:

ABCDEFGHIJ - first copy stores this
KLMNOPQRST - second copy stores this
UVWXYZ     - third copy stores this

Depending on whether the data size reported includes a terminating 0/NUL character, the buffer may be seen as UVWXYZ followed by a NUL (which printf("%s") will print as "UVWXYZ"), or as "UVWXYZQRST" (in which case printf("%s") would keep printing past the end of the buffer until it happens to find a 0/NUL).
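The overwriting can be reproduced without any network access. In this sketch, overwrite_cb is a made-up stand-in for the question's callback (using memcpy() where the question used strncpy(), which behaves the same for these chunks), and feed_alphabet is a hypothetical helper that delivers the three alphabet chunks by hand.

```c
#include <stddef.h>
#include <string.h>

static char tiny[11];   /* 10 data bytes; the static zero fill supplies a final NUL */

/* Stand-in for the question's callback: every chunk is copied to offset 0. */
size_t overwrite_cb(char *ptr, size_t size, size_t nmemb, void *stream)
{
    size_t nbytes = size * nmemb;
    (void)stream;
    if (nbytes > 10)
        nbytes = 10;               /* never write past the 10 data bytes */
    memcpy(tiny, ptr, nbytes);
    return nbytes;
}

/* Deliver the alphabet in three chunks, the way a transfer might, and return the buffer. */
const char *feed_alphabet(void)
{
    char a[] = "ABCDEFGHIJ", b[] = "KLMNOPQRST", c[] = "UVWXYZ";
    overwrite_cb(a, 1, 10, NULL);
    overwrite_cb(b, 1, 10, NULL);
    overwrite_cb(c, 1, 6, NULL);
    return tiny;
}
```

Since none of the chunks carries its own terminator here, feed_alphabet() ends up returning "UVWXYZQRST": the last chunk followed by leftovers from the second.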

res = curl_easy_perform(curl) strongly suggests it's giving you a result/error-code, have you bothered to check what the value is and what the documentation says that means?

You really should learn to diagnose these kinds of things yourself too... you would have found the suspected issue if, instead of copying into buffer, you had put a std::cout statement into your callback to show you the data and how many times it's called. Break things down until you find the issue.
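That kind of diagnostic can also be written in plain C. A minimal sketch, assuming a made-up struct trace that is handed to the callback through its last parameter; it stores nothing and only reports how the data arrives.

```c
#include <stdio.h>

/* Hypothetical counters for inspecting how the data is delivered. */
struct trace {
    unsigned long calls;   /* how many times the callback ran */
    size_t        total;   /* bytes seen across all calls */
};

/* Write callback that logs each chunk's size to stderr. */
size_t trace_cb(char *ptr, size_t size, size_t nmemb, void *stream)
{
    struct trace *t = stream;
    size_t nbytes = size * nmemb;
    (void)ptr;
    t->calls++;
    t->total += nbytes;
    fprintf(stderr, "call %lu: %lu bytes (total %lu)\n",
            t->calls, (unsigned long)nbytes, (unsigned long)t->total);
    return nbytes;   /* claim everything was consumed so the transfer continues */
}
```

If this logs more than one line for a single curl_easy_perform(), the page arrived in several chunks, and a callback that always writes to the start of the buffer keeps only the last one.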


You seem to be missing the CURLOPT_WRITEDATA option, which sets the pointer that libcurl passes as the last argument (void *stream) of your CURLOPT_WRITEFUNCTION callback, tobuffer(char *ptr, ...):

curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);