Problem using libcurl: it does not appear to get the entire page

2023-01-13 07:09

I am having difficulty getting started with libcurl. The code below does not appear to retrieve the entire page from the specified URL. Where am I going wrong?

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <curl/curl.h>
#include <curl/types.h>
#include <curl/easy.h>

using namespace std;

char buffer[1024];

size_t tobuffer(char *ptr, size_t size, size_t nmemb, void *stream)
{
    strncpy(buffer,ptr,size*nmemb);
    return size*nmemb;
}

int main() {
    CURL *curl;
    CURLcode res;


    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://google.co.in");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION,1);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &tobuffer);

        res = curl_easy_perform(curl);

        printf("%s",buffer);

        curl_easy_cleanup(curl);
    }
    return 0;
}

As the libcurl documentation for curl_easy_setopt() explains, the callback function is called as many times as required to deliver all the bytes of the fetched page.

Your function overwrites the same buffer on every call, with the result that after curl_easy_perform() has finished fetching the file, you only have whatever fit in the final call to tobuffer() left.

In short, your function tobuffer() must do something other than overwrite the same buffer on each call.

update

For example, you could do something like the following completely untested code:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
    char *buffer;
    size_t bufferlen;
    size_t writepos;
} buffer = {0};

size_t tobuffer(char *ptr, size_t size, size_t nmemb, void *stream)
{
    size_t nbytes = size*nmemb;
    if (!buffer.buffer) {
        buffer.buffer = malloc(1024);
        buffer.bufferlen = 1024;
        buffer.writepos = 0;
    }
    /* grow (doubling each time) until the new chunk fits */
    while (buffer.buffer && buffer.writepos + nbytes > buffer.bufferlen) {
        buffer.bufferlen = 2 * buffer.bufferlen;
        buffer.buffer = realloc(buffer.buffer, buffer.bufferlen);
    }
    assert(buffer.buffer != NULL);
    memcpy(buffer.buffer+buffer.writepos,ptr,nbytes);
    buffer.writepos += nbytes;  /* advance so the next chunk is appended */
    return nbytes;
}

At some later point in your program you will need to free the allocated memory, something like this:

void freebuffer(struct buf *b) {
    free(b->buffer);
    b->buffer = NULL;
    b->bufferlen = 0;
    b->writepos = 0;
}

Also, note that I've used memcpy() instead of strncpy() to move data to the buffer. This is important because libcurl makes no claim that the data passed to the callback function is actually a NUL terminated ASCII string. In particular, if you retrieve a .gif image file, it certainly can (and will) contain zero bytes in the file which you would want to preserve in your buffer. strncpy() will stop copying after the first NUL it sees in the source data.
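To make the difference concrete, here is a minimal sketch that copies the same 7-byte chunk, which contains an embedded zero byte, with both functions. The helper names matching_prefix() and compare_copies() and the sample bytes are made up for this illustration; nothing here calls libcurl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how many leading bytes of a and b are identical (at most n). */
static size_t matching_prefix(const char *a, const char *b, size_t n)
{
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
}

/* Hypothetical helper: copies a chunk with an embedded zero byte using
   strncpy() and memcpy(), then reports how many bytes each preserved. */
static void compare_copies(size_t *kept_by_strncpy, size_t *kept_by_memcpy)
{
    /* binary-like data with a zero byte in the middle */
    static const char chunk[7] = { 'G', 'I', 'F', '\0', '8', '9', 'a' };
    char a[7], b[7];

    assert(kept_by_strncpy != NULL && kept_by_memcpy != NULL);

    strncpy(a, chunk, sizeof a);  /* stops reading at the NUL, zero-fills the rest */
    memcpy(b, chunk, sizeof b);   /* copies all 7 bytes verbatim */

    *kept_by_strncpy = matching_prefix(a, chunk, sizeof chunk);
    *kept_by_memcpy  = matching_prefix(b, chunk, sizeof chunk);
}
```

Calling compare_copies() should show that the strncpy() copy matches the source only up to and including the zero byte, while the memcpy() copy matches in full.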

As an exercise for the reader, I've left all the error handling out of this code. You must put some in. Furthermore, I've also left in a juicy memory leak on the off chance that the call to realloc() fails.

Another improvement would be to use CURLOPT_WRITEDATA, the option that lets the libcurl caller supply the value of the stream parameter passed to the callback. That could be used to allocate and manage your buffer without using global variables. I'd strongly recommend doing that as well.
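A rough sketch of that idea follows. The struct membuf type and the append_chunk() name are invented for this example, and the block only shows the callback side, so it compiles without libcurl. It also keeps the old pointer when realloc() fails, which avoids the leak mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer owned by the caller instead of a global. */
struct membuf {
    char  *data;  /* heap storage, kept NUL-terminated for convenience */
    size_t len;   /* bytes stored, not counting the terminator */
};

/* Same signature as a libcurl write callback. Returning a value that
   differs from nbytes tells libcurl to abort the transfer. */
static size_t append_chunk(char *ptr, size_t size, size_t nmemb, void *stream)
{
    struct membuf *mb = stream;
    size_t nbytes = size * nmemb;
    char *grown;

    assert(mb != NULL);

    grown = realloc(mb->data, mb->len + nbytes + 1);
    if (grown == NULL)
        return 0;  /* mb->data is still valid, so the caller can free it */

    mb->data = grown;
    memcpy(mb->data + mb->len, ptr, nbytes);  /* append after earlier chunks */
    mb->len += nbytes;
    mb->data[mb->len] = '\0';
    return nbytes;
}
```

With libcurl, the caller would declare struct membuf mb = {0}, pass append_chunk via CURLOPT_WRITEFUNCTION and &mb via CURLOPT_WRITEDATA, and call free(mb.data) once the result is no longer needed.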

char buffer[1024];

How could you get the entire webpage when your buffer size is limited to 1024 bytes?

You are performing a simple GET operation using libcurl. You can use this sample program as a reference. Why don't you print the buffer in the callback, or write it to a file as shown in this example?

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* curl/curl.h pulls in curl/easy.h; curl/types.h was removed from libcurl */
#include <curl/curl.h>

static size_t write_data(void *ptr, size_t size, size_t nmemb, void *stream)
{
  /* fwrite() returns a size_t item count; libcurl passes size == 1 */
  size_t written = fwrite(ptr, size, nmemb, (FILE *)stream);
  return written;
}

int main(int argc, char **argv)
{
  CURL *curl_handle;
  static const char *headerfilename = "head.out";
  FILE *headerfile;
  static const char *bodyfilename = "body.out";
  FILE *bodyfile;

  curl_global_init(CURL_GLOBAL_ALL);

  /* init the curl session */ 
  curl_handle = curl_easy_init();

  /* set URL to get */ 
  curl_easy_setopt(curl_handle, CURLOPT_URL, "http://curl.haxx.se");

  /* no progress meter please */ 
  curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 1L);

  /* send all data to this function  */ 
  curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, write_data);

  /* open the files */ 
  headerfile = fopen(headerfilename,"w");
  if (headerfile == NULL) {
    curl_easy_cleanup(curl_handle);
    return -1;
  }
  bodyfile = fopen(bodyfilename,"w");
  if (bodyfile == NULL) {
    fclose(headerfile);
    curl_easy_cleanup(curl_handle);
    return -1;
  }

  /* we want the headers to this file handle */ 
  curl_easy_setopt(curl_handle,   CURLOPT_WRITEHEADER, headerfile);

  /*
   * Notice here that if you want the actual data sent anywhere else but
   * stdout, you should consider using the CURLOPT_WRITEDATA option.  */ 

  /* get it! */ 
  curl_easy_perform(curl_handle);

  /* close the files */
  fclose(headerfile);
  fclose(bodyfile);

  /* cleanup curl stuff */ 
  curl_easy_cleanup(curl_handle);

  return 0;
}

Tip: use a stringstream! Just replace your buffer with a stringstream and output the content with <streamname>.str(). Works for me!

I don't know the library, but it seems to me that you're reusing the buffer... if the page you download doesn't fit then you'll write over it repeatedly, and probably only see the last snippet. For example, if we copy the alphabet into a 10 character buffer, we get:

ABCDEFGHIJ - first copy stores this
KLMNOPQRST - second copy stores this
UVWXYZ     - third copy stores this

Depending on whether the reported data size includes a terminating 0/NUL character, the buffer may end up as "UVWXYZ" followed by a NUL (which printf("%s") will print as "UVWXYZ"), or as "UVWXYZQRST" with no terminator (in which case printf("%s") keeps reading past the end of the buffer until it happens to find a 0/NUL).
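The alphabet example can be reproduced with a small simulation. This is only a sketch: overwrite_chunk() and feed_alphabet() are names made up here, the chunks are fed by hand rather than by libcurl, and the chunks are assumed to arrive without a terminating NUL.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 10

/* Mimics the question's callback: every chunk is copied to the start of
   the same buffer, so whatever an earlier chunk stored there is lost. */
static void overwrite_chunk(char *buf, const char *chunk, size_t n)
{
    assert(n <= BUF_SIZE);
    memcpy(buf, chunk, n);
}

/* Feeds the alphabet in pieces of at most BUF_SIZE bytes. The extra byte
   in out is a terminator added only so the result can be printed safely. */
static void feed_alphabet(char out[BUF_SIZE + 1])
{
    const char *alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t total = strlen(alphabet);
    size_t off;

    memset(out, 0, BUF_SIZE + 1);
    for (off = 0; off < total; off += BUF_SIZE) {
        size_t n = total - off < BUF_SIZE ? total - off : BUF_SIZE;
        overwrite_chunk(out, alphabet + off, n);
    }
}
```

After feed_alphabet() returns, the buffer holds the last six letters followed by leftovers from the second chunk, and nothing from the first chunk survives.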

res = curl_easy_perform(curl) strongly suggests it's giving you a result/error code. Have you bothered to check what the value is and what the documentation says it means?

You really should learn to diagnose these kinds of things yourself too. You would have found the suspected issue if, instead of copying into buffer, you had put a std::cout statement into your callback to show you the data and how many times it's called. Break things down until you find the issue.
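A diagnostic callback along those lines might look like the sketch below. It uses fprintf() rather than std::cout so that it stays plain C, and struct call_stats and trace_chunk() are hypothetical names. The counters are assumed to reach the callback through the stream pointer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters handed to the callback via the stream pointer. */
struct call_stats {
    unsigned long calls;  /* number of times the callback ran */
    size_t        bytes;  /* total bytes delivered so far */
};

/* Same signature as a libcurl write callback; it stores nothing and
   only reports how the data arrives. */
static size_t trace_chunk(char *ptr, size_t size, size_t nmemb, void *stream)
{
    struct call_stats *st = stream;
    size_t nbytes = size * nmemb;

    assert(st != NULL);
    (void)ptr;  /* the chunk contents are ignored in this sketch */

    st->calls++;
    st->bytes += nbytes;
    fprintf(stderr, "call %lu: %zu bytes (%zu so far)\n",
            st->calls, nbytes, st->bytes);
    return nbytes;  /* report the whole chunk as handled */
}
```

If more than one line is printed for a single transfer, the page arrived in several chunks and a callback that overwrites one fixed buffer will lose data.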

You seem to be missing the CURLOPT_WRITEDATA option, which sets the pointer that libcurl passes as the last argument (void *stream) of your CURLOPT_WRITEFUNCTION callback, tobuffer():

curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
