fopen Segfault error on large files

Hello everyone, I'm new to C, and I've recently been getting a weird segfault on my fopen call.

   FILE* thefile = fopen(argv[1],"r");

The problem I've been having is that this code works on other, smaller text files, but when I try a file around 400MB it gives a segfault. I've even tried hardcoding the filename, but that doesn't help either. Could there be a problem in the rest of the code causing the segfault on this line? (I doubt it, but I'd like to know if it's possible.) It's just really odd that no errors come up for a small text file, but a large text file does cause errors.

Thanks!

EDIT: I didn't want to bog this down with too much, but here's my code:

int main(int argc, char *argv[])
{
    if(argc != 3)
    {
        printf("[ERROR] Invalid number of arguments. Please pass 2 arguments, input_bound_file (column 1: probe, columns 2,...: samples) and desired_output_file_name");
        exit(2);
    }

    int i,j;
    rankAvg = g_hash_table_new(g_direct_hash, g_direct_equal);
    rankCnt = g_hash_table_new(g_direct_hash, g_direct_equal);
    table = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL, g_free);
    getCounts(argv[1]);
    printf("nC =: %i       nR =: %i", nC, nR);
    double srcMat[nR][nC];
    int rankMat[nR][nC];
    double normMat[nR][nC];
    int sorts[nR][nC];
    char line[100];

    FILE* thefile = fopen(argv[1], "r");
    printf("%s\n", strerror(errno));
    FILE* output = fopen(argv[2], "w");
    char* rownames[100];
    i = 0; j = 1;
    int processedProbeNumber = 0;
    int previousStamp = 0;
    fgets(line, sizeof(line), thefile); // skip the header line

    while(fgets(line, sizeof(line), thefile) != NULL)
    {
        cleanSpace(line); // creates only one space between entries
        char dest[100];
        int len = strlen(line);
        for(i = 0; i < len; i++)
        {
            if(line[i] == ' ') // read in rownames
            {
                rownames[j] = strncpy(dest, line, i);
                dest[i] = '\0';
                break;
            }
        }

        char* token = strtok(line, " ");
        token = strtok(NULL, " "); // note: this discards the first token
        i = 1;

        while(token != NULL) // put words into array
        {
            rankMat[j][i] = abs(atof(token));
            srcMat[j][i] = abs(atof(token));
            token = strtok(NULL, " ");
            i++;
        }

        // set the first column as a row id
        j++;
        processedProbeNumber++;

        if( (processedProbeNumber - previousStamp) >= 10000)
        {
            previousStamp = processedProbeNumber;
            printf("\tnumber of loaded lines = %i", processedProbeNumber);
        }
    }
    printf("\ttotal number of loaded lines = %i \n", processedProbeNumber);
    fclose(thefile);


How do you know that fopen is segfaulting? If you're simply sprinkling printf calls through the code, there's a chance the standard output isn't flushed to the console before the error occurs. Obviously, if you're using a debugger you will know exactly where the segfault occurred.
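As a standalone illustration of that point (this is my sketch, not the poster's code): trace to stderr, which is unbuffered, or flush stdout explicitly, so you can trust which statements actually ran before the crash.

    #include <stdio.h>

    /* Sketch: making printf-style tracing trustworthy around a suspected crash.
       stderr is unbuffered by default, so its messages appear immediately;
       stdout is usually buffered, and its contents can be lost when the
       process dies on a segfault before the buffer is flushed. */
    int main(int argc, char *argv[])
    {
        if (argc < 2) return 1;

        fprintf(stderr, "about to call fopen(\"%s\")\n", argv[1]); /* survives a crash */
        FILE* f = fopen(argv[1], "r");
        fprintf(stderr, "fopen returned %p\n", (void*)f);

        printf("checkpoint via stdout\n");
        fflush(stdout); /* force the buffer out before any risky code runs */

        if (f) fclose(f);
        return 0;
    }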

Looking at your code, nR and nC aren't defined so I don't know how big rankMat and srcMat are, but two thoughts crossed my mind while looking at your code:

  • You don't check i and j to ensure that they don't exceed nR and nC.
  • If nR and nC are sufficiently large, you may be using a very large amount of memory on the stack (srcMat, rankMat, normMat, and sorts are all huge). I don't know what environment you're running in, but some systems may not be able to handle huge stacks (Linux, Windows, etc. should be fine, but I do a lot of embedded work). I normally allocate very large structures on the heap using malloc; see the sketch after this list.
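A minimal sketch of that heap approach, assuming (as the question's code does) that nR and nC are globals filled in by getCounts(); the sizes below are made up for the example:

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: replace the stack VLAs (double srcMat[nR][nC]; etc.) with heap
       allocations. Large nR*nC can easily exceed a default stack limit
       (often 8MB on Linux), while the heap can hold far more. */
    int nR, nC; /* assumed to be set by getCounts(), as in the question */

    int main(void)
    {
        nR = 50000; nC = 200; /* example sizes only */

        double (*srcMat)[nC] = malloc(sizeof(double[nR][nC]));
        int (*rankMat)[nC] = malloc(sizeof(int[nR][nC]));
        if (srcMat == NULL || rankMat == NULL) {
            fprintf(stderr, "out of memory\n");
            return 1;
        }

        /* Indexing works exactly like the VLA versions: */
        srcMat[1][1] = 0.0;
        rankMat[1][1] = 0;

        free(srcMat);
        free(rankMat);
        return 0;
    }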


Generally, files of 2GB (2**31 bytes) or larger are the ones you can expect to get this on. This is because you start to run out of room in a 32-bit integer for things like file offsets, and one bit is typically taken up by the sign for relative offsets.

Supposedly on Linux you can get around this issue by using the following macro definition:

#define _FILE_OFFSET_BITS 64

Some systems also provide a separate API call for large file opens (e.g., fopen64() in MKS).
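For illustration, a minimal sketch of how the macro is used: it must be defined before any system header is included (or passed as -D_FILE_OFFSET_BITS=64 on the compiler command line) so the headers select a 64-bit off_t:

    /* Must come before ANY #include so the headers pick the 64-bit file API. */
    #define _FILE_OFFSET_BITS 64

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) return 1;
        FILE* f = fopen(argv[1], "r"); /* can now handle files past 2GB on 32-bit Linux */
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        fclose(f);
        return 0;
    }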


400MB should not be considered a "large file" nowadays. I would reserve that term for files larger than, say, 2GB.

Also, just opening a file is very unlikely to give a segfault. Would you show us the code that accesses the file? I suspect some other factor is at play here.

UPDATE

I still can't tell exactly what's happening here. There are strange things that could be legitimate: you discard the first line, and also the first token of each line.

You also assign to every rownames[j] (except the first one) the address of dest, a variable with block scope whose associated memory is most likely to be reused outside that block. I hope you don't rely on rownames[j] being meaningful (but then why do you have them?) and that you never try to access them.
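A minimal sketch of one way to keep those row names valid, assuming you do need them later (save_rowname is a hypothetical helper, not from the posted code; it copies the name into heap memory that outlives the loop iteration):

    #include <stdlib.h>
    #include <string.h>

    /* Sketch: store a durable copy of the row name instead of a pointer into
       the short-lived dest buffer. Each copy lives until you free() it. */
    char* save_rowname(const char* line, int space_pos)
    {
        char* name = malloc(space_pos + 1);
        if (name == NULL) return NULL;
        memcpy(name, line, space_pos);
        name[space_pos] = '\0';
        return name;
    }

In the question's loop, that would replace the strncpy into dest with rownames[j] = save_rowname(line, i);.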

C99 allows you to mix variable declarations with actual statements, but I would suggest a little cleanup to make the code clearer (better indentation would also help).

From the symptoms, I would look for memory corruption somewhere. On small files (and hence fewer tokens) it may go unnoticed, but with larger files (and many more tokens) it triggers a segfault.
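To make that concrete, here is a hedged sketch of the kind of guard that would expose such corruption (checked_store is my name, not the poster's; nR, nC, and the matrix shape are as in the posted code):

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: a checked store into an nR x nC matrix. In the posted loop, j
       grows once per line and i once per token; if the file has more lines
       than nR, or a line has more tokens than nC, the unguarded writes go
       past the arrays -- often unnoticed on small files, a segfault on
       large ones. */
    void checked_store(int nR, int nC, int mat[nR][nC], int j, int i, int value)
    {
        if (j < 0 || j >= nR || i < 0 || i >= nC) {
            fprintf(stderr, "out-of-bounds write: [%d][%d] vs %dx%d\n", j, i, nR, nC);
            exit(1);
        }
        mat[j][i] = value;
    }

Dropping that in place of the raw rankMat[j][i] = ... assignment would turn silent corruption into an immediate, diagnosable failure.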

