Data structures in C [closed]
I have to implement a program in C, and I need a data structure to efficiently manage some data. I was wondering what would be the best way to do this. Any suggestions or pointers are appreciated. Thanks!
A simplified example of the kind of data to be stored is as follows. Suppose that for every subject taught at the school, we need to keep track of the number of students whose grades fall within a range. Lets assume that the range size is user defined and is 10, so the ranges will be 0-9, 10-19, 20-29, and so on. The start of ranges will be 1, 10, 20 and so on. Thus, the data looks like:
Subject Id, start of range of student grades, #students who got grades within this range.
1 30 1
80 5
90 6
2 50 3
60 6
3 40 1
70 5
Notice that all the ranges are distinct i.e. no two subjects can have the same start range.
No开发者_如何学Cte: This isnt homework as someone pointed out, I really need this to keep track of filenames , their access counts by different users (# accesses), and within certain timeslots (ranges).
How about a matrix (2D array) sized as Number of Subjects by Number of Ranges. Each element will hold the number of students for a particular Subject in a particular Range.
You need to do the mapping of Range start to Column in the matrix separately.
Update:
Since you say you have 100M entries that adds up to a lot of memory. Consider adding an extra layer of indirection and dividing the large matrix into many smaller ones. (You'll need a mapping from an Subject to an integer row index and from a Range to a column index. Subjects 1 through N and Ranges 1 through M go to Matrix1, N+1 to 2N and M+1 to 2M go to Matrix2 and so on.)
You'll need to do a lot of tests and determine the N and M that provide the best performance. (Make a program that runs a large number of "common" operations and times them. Graph the execution time, divided per operation type, as a function of N and M.) Since they'll depend on processor caches and other parameters it's unlikely the values will be the same on different systems so they should be configurable on the end program.
Edit:
My solution would create a matrix like this one below(based on the data in your post). You might need to copy-paste it to view at full width.
Subject\Range
0 1 2 3 4 5 6 7 8 9
1 0 0 0 1 0 0 0 0 0 6
2 0 0 0 0 0 3 6 0 0 0
3 0 0 0 0 1 0 0 5 0 0
There are a lot of 0's in it so it's not efficient memory-wise. It should have decent performance (you might want to check the option of dividing the matrix into smaller ones and testing the performance of those implementations.)
To find the used ranges for a given Subject you need to examine the row corresponding to the Subject, any cell > 0 means it's column matches an used range. To find the subjects for a given range, examine the column for the appropriate range.
If the number of subjects isn't too large, then this should work:
#define NSUBJECTS ...
#define NRANGES 10
struct data {
int subj_id[NRANGES];
int nstudents[NSUBJECTS][NRANGES];
}
Initialize the values with, say, -1
to indicate non-availability.
精彩评论