开发者

Data validation optimisation

I am currently working on a student management system which track students and semesters at which they take classes. I have a feature that allow the users to import data through an Excel file.

Through the process I validate the data for its validity (invalid chars, email and such) and I check for duplicate against the database to see if the student exists and/or as already taken classes. To achieve the validation I have to run two queries for each students which is not bad if I only have to manage 250 students at once (each query runs at a 0.004 second in average). The problems start building up when the liste get closer to 500 students. The whole validation process takes about 2.5 seconds.

In my book there would be no issues if this was only occuring once, but it happens everytime the data are loaded through sequential navigation because I allow users to review data (30 per page are displayed) before importing and they can edit them.

What would be the best way to get around this bottle neck, or should I let it be and warn the users?

What would be the best way to reach maximum scalability?

EDIT :

In my application I have separated the students and the semesters at which they attend. The first query I run is to check if the students exist. From that if its a new student, it will create a new student otherwise, the student's data is updated.

The second query is used to check if the student is already attending the semester for which I try to add him. If he is already attending, he will be excluded from the importation.

the validation process goes as follow :

Trim white space
Strip html tags

check email validity
check if the semester for which I try to register him is a valid semester (ie : w11 for winter 2011)
check if the province is a valid canadian province

// first query
check for birthday, first and last name in the database 
// second query
check if the student is already registred for the semester i开发者_JS百科n the database

The first three check must be true to be a valid input

If the first query returns true, the student exist, the data must be updated, if it returns false, its a new student and a new student must be created.

If the second query returns true, the student doesn't have to be registered again since he is already attending the semester, if it returns false the student must be registered for the desired semester


The time is probably spent in making queries rather that shuffling data. Perhaps you could start the script by going through all students in the batch and build one query, resulting in the students already defined, for example:

SELECT student_id FROM student WHERE student_id IN (student, ids, from, your, batch, file)

With that data you can do a simple check if the student needs an update or an create. Basically the same question can be used to get those who already have registered.


I'm not entirely clear about the entire process, but you say users can edit the data (after the initial import and validation?) why rerun the entire validation, and not just validate the editted records?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜