fopen Segfault error on large files
Hello everyone I'm new to C but I've recently been getting a weird segfault error with my fopen.
FILE* thefile = fopen(argv[1],"r");
The problem I've been having is that this code works on other smaller text files, but when I try with a file around 400MB it will give a sefault error. I've even tried hardcoding the filename but that doesn't work either. Could there be a problem in the rest of the code causing the segfault on this line?(doubt it but would like to know if its possible. It's just really odd that no errors come up for a small text file, but a large text file does get errors.
Thanks!
EDIT* didn't want to bog this down with too much but heres my code
int main(int argc, char *argv[])
{
if(argc != 3)
{
printf("[ERROR] Invalid number of arguments. Please pass 2 arguments, input_bound_file (column 1:probe, columne 2,...: samples) and desired_output_file_name");
exit(2);
}
int i,j;
rankAvg= g_hash_table_new(g_direct_hash, g_direct_equal);
rankCnt= g_hash_table_new(g_direct_hash, g_direct_equal);
table = g_hash_table_new_full (g_direct_hash, g_direct_equal, NULL, g_free);
getCounts(argv[1]);
printf("NC=: %i nR =: %i",nC,nR);
double srcMat[nR][nC];
int rankMat[nR][nC];
double normMat[nR][nC];
int sorts[nR][nC];
char line[100];
FILE* thefile = fopen(argv[1],"r");
printf("%s\n", strerror(errno));
FILE* output = fopen(argv[2],"w");
char* rownames[100];
i=0;j = 1;
int processedProbeNumber = 0;
int previousStamp = 0;
fgets(line,sizeof(line),thefile); //read file
while(fgets(line,sizeof(line),thefile) != NULL)
{
cleanSpace(line); //creates only one space between entries
char dest[100];
int len = strlen(line);
for(i = 0; i < len; i++)
{
if(line[i] == ' ') //read in rownames
{
r开发者_开发百科ownames[j] = strncpy(dest, line, i);
dest[i] = '\0';
break;
}
}
char* token = strtok(line, " ");
token = strtok(NULL, " ");
i=1;
while(token!=NULL) //put words into array
{
rankMat[j][i]= abs(atof(token));
srcMat[j][i] = abs(atof(token));
token = strtok(NULL, " ");
i++;
}
// set the first column as a row id
j++;
processedProbeNumber++;
if( (processedProbeNumber-previousStamp) >= 10000)
{
previousStamp = processedProbeNumber;
printf("\tnumber of loaded lines = %i",processedProbeNumber);
}
}
printf("\ttotal number of loaded lines = %i \n",processedProbeNumber);
fclose(thefile);
How do you know that fopen
is seg faulting? If you're simply sprinkling printf
in the code, there's a chance the standard output isn't sent to the console before the error occurs. Obviously, if you're using a debugger you will know exactly where the segfault occured.
Looking at your code, nR
and nC
aren't defined so I don't know how big rankMat
and srcMat
are, but two thoughts crossed my mind while looking at your code:
- You don't check
i
andj
to ensure that they don't exceednR
andnC
- If
nR
andnC
are sufficiently large, that may mean you're using a very large amount of memory on the stack (srcMat
,rankMat
,normMat
, andsorts
are all huge). I don't know what environemnt you're running in, but some systems my not be able to handle huge stacks (Linux, Windows, etc. should be fine, but I do a lot of embedded work). I normally allocate very large structures in the heap (usingmalloc
).
Generally files 2GB (2**31) or larger are the ones you can expect to get this on. This is because you are starting to run out of space in a 32-bit integer for things like file indices, and one bit is typically taken up for directions in relative offsets.
Supposedly on Linux you can get around this issue by using the following macro defintion:
#define _FILE_OFFSET_BITS 64
Some systems also provide a separate API call for large file opens (eg: fopen64()
in MKS).
400Mb should not be considered a "large file" nowadays. I would reserve this for files larger than, say, 2Gb.
Also, just opening a file is very unlikely to give a segfault. WOuld you show us the code that access the file? I suspect some other factor is at play here.
UPDATE
I still can't tell exactly what's happening here. There are strange things that could be legitimate: you discard the first line and also the first token of each line.
You also assign to all the rownames[j]
(except the first one) the address of dest
which is a variable that has a block scope and whose associated memory is most likely to be reused outside that block. I hope you don't rely on rownames[j]
to be any meaningful (but then why you have them?) and you never try to access them.
C99 allows you to mix variable declarations with actual instructions but I would suggest a little bit of cleaning to make the code clearer (also a better indentation would help).
From the symptoms I would look for some memory corruption somewhere. On small files (and hence less tokens) it may go unnoticed, but with larger files (and many more token) it fires a segfault.
精彩评论