How should I process progressive status codes in a database?
I'm working on a project for an academic institution and I need advice on the best way to approach this problem. It's been a long time since I did any traditional application development (close to five years).
The college administration recently revised the college's academic standards policy. Previously, the administration only had three status codes, so this wasn't as big of an issue. However, the new policy has six status codes:
- Good Standing
- Academic Concern
- Academic Intervention (1)
- One-Term Dismissal
- Academic Intervention (2)
- Four-Term Dismissal
From here on, I'll refer to the GPA for a single term as termGPA and the cumulative GPA as cumGPA. If a student's termGPA falls below 2.0, and that causes his/her cumGPA to also fall below 2.0, he/she gets placed on Academic Concern. Once on Academic Concern, one of three things can happen to students in the following terms (a rough code sketch of these rules follows the list). They:
- Return to good standing if their termGPA and cumGPA rise above 2.0.
- Stay in the current status if their termGPA is above 2.0, but their cumGPA stays below 2.0.
- Move to the next status if both their termGPA and cumGPA are below 2.0.
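To make those rules concrete, here is a minimal T-SQL sketch of the per-term transition, assuming the previous term's status, termGPA, and cumGPA are already known. The variable names and one-word status codes are placeholders, not the actual schema:

```sql
-- Minimal sketch of the per-term transition rule; all names are placeholders.
DECLARE @PrevStatus varchar(30), @TermGPA decimal(4,2), @CumGPA decimal(4,2);
SET @PrevStatus = 'AcademicConcern';   -- status carried in from the prior term
SET @TermGPA    = 1.50;
SET @CumGPA     = 1.90;

SELECT NewStatus =
    CASE
        WHEN @TermGPA >= 2.0 AND @CumGPA >= 2.0 THEN 'GoodStanding'   -- both recovered
        WHEN @TermGPA >= 2.0 AND @CumGPA <  2.0 THEN @PrevStatus      -- stay in current status
        ELSE CASE @PrevStatus                                         -- both below 2.0: escalate
                 WHEN 'GoodStanding'          THEN 'AcademicConcern'
                 WHEN 'AcademicConcern'       THEN 'AcademicIntervention1'
                 WHEN 'AcademicIntervention1' THEN 'OneTermDismissal'
                 WHEN 'OneTermDismissal'      THEN 'AcademicIntervention2'
                 ELSE                              'FourTermDismissal'
             END
    END;
```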
Normally, I would approach this by writing a console application that processes each student iteratively, building the status codes as it goes. However, we're handling at least 8,000 students per term, and in most cases around 12,500.
Additionally, this policy has to be applied retroactively over an as-yet-unspecified period of time (since former students could return to the college and would then be subject to the new policy's restrictions), and once I include a student in the data set, I have to go back through that student's entire history with the college. I'm conservatively guessing that I'll go through at least a million student records, calculating each student's termGPA and rolling cumGPA along the way.
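The rolling cumGPA itself (as distinct from the status codes) can be computed set-based without a cursor, even on older versions of SQL Server. Here is a sketch using a correlated subquery; dbo.TermGrades and its columns are assumed names, not the real schema:

```sql
-- Sketch: per-student running cumGPA, assuming an assumed table
-- dbo.TermGrades(StudentID, TermNumber, QualityPoints, CreditHours).
SELECT  t.StudentID,
        t.TermNumber,
        TermGPA = t.QualityPoints / NULLIF(t.CreditHours, 0),
        CumGPA  = ( SELECT SUM(p.QualityPoints) / NULLIF(SUM(p.CreditHours), 0)
                    FROM dbo.TermGrades AS p
                    WHERE p.StudentID  = t.StudentID
                      AND p.TermNumber <= t.TermNumber )
FROM dbo.TermGrades AS t
ORDER BY t.StudentID, t.TermNumber;
```

With an index on (StudentID, TermNumber), the correlated subquery only scans each student's own terms, so a million rows should still be manageable in one set-based pass.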
Questions:
- Is there any way to handle this problem in SQL and avoid using a cursor?
- (Assuming the answer to 1. is "No") How should I structure a console application? Should I build a large collection and process a few thousand students at a time before writing to the database, or update the database after I process each student?
- Am I making way too big of a deal about this?
Thanks in advance for any insight and advice.
Edit: Based on comments to answers here, I should've provided more information about the data structures and the way I'm calculating the GPAs.
I can't use the pre-calculated cumGPA values in our database -- I need the student's cumGPA at the end of each progressive term, like so (note: I made up the GPA values below):
| ID     | TermID  | CumGPA | TermGPA | TermNumber | PolicyCode           |
|--------|---------|--------|---------|------------|----------------------|
| 123545 | 09-10-2 | 2.08   | 2.08    | 1          | GoodStanding         |
| 123545 | 09-10-3 | 1.94   | 0.00    | 2          | AcademicConcern      |
| 123545 | 09-10-4 | 1.75   | 1.00    | 3          | AcademicIntervention |
| 123545 | 10-11-2 | 1.88   | 2.07    | 4          | AcademicIntervention |
| 123545 | 10-11-4 | 2.15   | 2.40    | 5          | GoodStanding         |
| 123545 | 11-12-1 | 2.30   | 2.86    | 6          | GoodStanding         |
The problem is that each subsequent term's status code could depend on the previous term's status code -- Good Standing is actually the only one that doesn't.
As far as I know, that means I would have to use a cursor in SQL to get each student's most current status code, which is not something I'm keen to do, as I work for a cash-strapped college that has precisely three database servers: one for testing, and two servers carrying the same data (we're in the process of moving to SQL Server 2008 R2).
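For what it's worth, that term-over-term dependency can also be expressed without a cursor by carrying the previous status forward with a recursive CTE (supported on both 2005 and 2008 R2). The sketch below assumes a pre-computed table dbo.StudentTerms(ID, TermID, TermNumber, TermGPA, CumGPA) with a gap-free per-student TermNumber, as in the example above; the names are illustrative, not the real schema:

```sql
-- Sketch: derive each term's PolicyCode from the previous term's with a
-- recursive CTE instead of a cursor. Status names abbreviated to one word.
WITH Statuses AS
(
    -- Anchor: each student's first term can only be GoodStanding or AcademicConcern.
    SELECT  s.ID, s.TermNumber, s.TermGPA, s.CumGPA,
            PolicyCode = CAST(CASE WHEN s.TermGPA < 2.0 AND s.CumGPA < 2.0
                                   THEN 'AcademicConcern'
                                   ELSE 'GoodStanding' END AS varchar(30))
    FROM dbo.StudentTerms AS s
    WHERE s.TermNumber = 1

    UNION ALL

    -- Recursive step: term N's status depends on term N-1's status.
    SELECT  s.ID, s.TermNumber, s.TermGPA, s.CumGPA,
            PolicyCode = CAST(CASE
                WHEN s.TermGPA >= 2.0 AND s.CumGPA >= 2.0 THEN 'GoodStanding'
                WHEN s.TermGPA >= 2.0 AND s.CumGPA <  2.0 THEN prev.PolicyCode
                ELSE CASE prev.PolicyCode
                         WHEN 'GoodStanding'          THEN 'AcademicConcern'
                         WHEN 'AcademicConcern'       THEN 'AcademicIntervention1'
                         WHEN 'AcademicIntervention1' THEN 'OneTermDismissal'
                         WHEN 'OneTermDismissal'      THEN 'AcademicIntervention2'
                         ELSE                              'FourTermDismissal'
                     END
            END AS varchar(30))
    FROM dbo.StudentTerms AS s
    JOIN Statuses AS prev
      ON prev.ID = s.ID
     AND s.TermNumber = prev.TermNumber + 1
)
SELECT ID, TermNumber, TermGPA, CumGPA, PolicyCode
FROM Statuses
ORDER BY ID, TermNumber
OPTION (MAXRECURSION 0);
```

Whether this beats a console application on a million rows is an open question, but it at least keeps the logic set-based and cursor-free.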
That is interesting. I don't think you'll have to worry too much about the SQL performance; it will run fairly quickly for your application. I just ran a stupid little console app to fix a mess-up and inserted 15,000 records one at a time. It took about 5 seconds.
First of all, 12,000 records are nothing for today's databases, so that's not your concern. You should rather focus on keeping it simple. It seems like your database will often be driven by events, so I would recommend using triggers, i.e.: a first trigger that updates cumGPA when a termGPA row is inserted, and a second one after the cumGPA update that checks your criteria and updates the status if they are met.
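For illustration, the second trigger (the status check) might look roughly like this; the table, column, and status names are guesses at the schema, and only the first escalation step is shown:

```sql
-- Rough sketch of the suggested status-check trigger; schema names are assumed
-- and the escalation CASE is abbreviated.
CREATE TRIGGER trg_StudentTerms_SetStatus
ON dbo.StudentTerms
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE st
    SET st.PolicyCode =
        CASE
            WHEN i.TermGPA >= 2.0 AND i.CumGPA >= 2.0 THEN 'GoodStanding'
            WHEN i.TermGPA >= 2.0 AND i.CumGPA <  2.0 THEN ISNULL(prev.PolicyCode, 'GoodStanding')
            WHEN ISNULL(prev.PolicyCode, 'GoodStanding') = 'GoodStanding' THEN 'AcademicConcern'
            ELSE 'AcademicIntervention1'   -- remaining escalation steps omitted
        END
    FROM dbo.StudentTerms AS st
    JOIN inserted AS i
      ON i.ID = st.ID AND i.TermNumber = st.TermNumber
    LEFT JOIN dbo.StudentTerms AS prev
      ON prev.ID = i.ID AND prev.TermNumber = i.TermNumber - 1;
END;
```

Note that an AFTER INSERT trigger only covers new rows; the retroactive recalculation described in the question would still need a one-time batch pass.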
Even the free version of SQL Server now handles databases up to 10 GB, so 12,500 records is small. Going through 1 million records, you should iterate through each student (or groups of students) to allow the transaction log to clear. That could be done using a cursor or a console application.

If you can perform the calculation in T-SQL, then batching will probably be faster than processing one row at a time. The downside is that the bigger the batch, the bigger the transaction log, so there is a sweet spot. If the calculation is too complex for T-SQL and takes almost as long as (or longer than) the insert statement, you could insert on a separate thread (or calculate on a separate thread) so the insert and calculation run in parallel. I do this in an application where I parse the words out of text -- the parse takes about the same amount of time as inserting the words. But I don't let it spin up multiple threads: on the SQL side it still has to maintain the indexes, and hitting it with inserts from two threads slowed it down. With just two threads, the faster thread waits on the slower one.

The order you do your updates in also matters. If you process in the order of the clustered index, then you have a better chance that the record is already in memory.
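To make the batching idea concrete, here is a minimal sketch that recalculates statuses in slices of student IDs so each transaction (and therefore the log) stays small; dbo.StudentTerms, dbo.RecalcStatuses, and the batch size are assumptions:

```sql
-- Sketch: process students in ID ranges so no single transaction grows too large.
-- dbo.RecalcStatuses is an assumed procedure standing in for the real calculation.
DECLARE @BatchSize int, @LastID int, @MaxID int;
SET @BatchSize = 2000;
SET @LastID = 0;
SELECT @MaxID = MAX(ID) FROM dbo.StudentTerms;

WHILE @LastID < @MaxID
BEGIN
    BEGIN TRANSACTION;

    -- One contiguous slice of student IDs per batch; if the table is clustered
    -- on (ID, TermNumber), this also follows the clustered index order.
    EXEC dbo.RecalcStatuses @FromID = @LastID + 1, @ToID = @LastID + @BatchSize;

    COMMIT TRANSACTION;

    SET @LastID = @LastID + @BatchSize;
END;
```

With sparse IDs some slices will be empty, but those iterations cost almost nothing; tune the batch size to find the sweet spot mentioned above.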
I ended up writing a console application in C# to process these status codes. My users changed the initial status update requirements to only include the previous two terms, but the process had enough edge cases that I opted to take my time and write cleaner, object-oriented code that will be easier to pick back up (he says, hopefully) once this policy matures and changes.
Also, I ended up having to deploy this database onto a SQL Server 2005 instance, so table-valued parameters were not available to me. If they had been, I would've opted to commit to the database only after processing each student, rather than after processing each term for each student.
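For reference, on SQL Server 2008+ that commit-per-student approach could look something like this with a table-valued parameter; the type, procedure, and table names below are illustrative only:

```sql
-- Sketch only (requires SQL Server 2008+): send all of one student's term
-- rows in a single call instead of one statement per term.
CREATE TYPE dbo.TermStatusList AS TABLE
(
    ID         int          NOT NULL,
    TermID     varchar(10)  NOT NULL,
    TermNumber int          NOT NULL,
    TermGPA    decimal(4,2) NOT NULL,
    CumGPA     decimal(4,2) NOT NULL,
    PolicyCode varchar(30)  NOT NULL
);
GO

CREATE PROCEDURE dbo.SaveStudentStatuses
    @Rows dbo.TermStatusList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based insert per student, committed as a single statement.
    INSERT INTO dbo.StudentTermStatuses (ID, TermID, TermNumber, TermGPA, CumGPA, PolicyCode)
    SELECT ID, TermID, TermNumber, TermGPA, CumGPA, PolicyCode
    FROM @Rows;
END;
```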