Data validation optimisation

2023-02-22 20:31 问答作者：

I am currently working on a student management system which track students and semesters at which they take classes. I have a feature that allow the users to import data through an Excel file.

Through the process I validate the data for its validity (invalid chars, email and such) and I check for duplicate against the database to see if the student exists and/or as already taken classes. To achieve the validation I have to run two queries for each students which is not bad if I only have to manage 250 students at once (each query runs at a 0.004 second in average). The problems start building up when the liste get closer to 500 students. The whole validation process takes about 2.5 seconds.

In my book there would be no issues if this was only occuring once, but it happens everytime the data are loaded through sequential navigation because I allow users to review data (30 per page are displayed) before importing and they can edit them.

What would be the best way to get around this bottle neck, or should I let it be and warn the users?

What would be the best way to reach maximum scalability?

EDIT :

In my application I have separated the students and the semesters at which they attend. The first query I run is to check if the students exist. From that if its a new student, it will create a new student otherwise, the student's data is updated.

The second query is used to check if the student is already attending the semester for which I try to add him. If he is already attending, he will be excluded from the importation.

the validation process goes as follow :

Trim white space
Strip html tags

check email validity
check if the semester for which I try to register him is a valid semester (ie : w11 for winter 2011)
check if the province is a valid canadian province

// first query
check for birthday, first and last name in the database 
// second query
check if the student is already registred for the semester i开发者_JS百科n the database

The first three check must be true to be a valid input

If the first query returns true, the student exist, the data must be updated, if it returns false, its a new student and a new student must be created.

If the second query returns true, the student doesn't have to be registered again since he is already attending the semester, if it returns false the student must be registered for the desired semester

The time is probably spent in making queries rather that shuffling data. Perhaps you could start the script by going through all students in the batch and build one query, resulting in the students already defined, for example:

SELECT student_id FROM student WHERE student_id IN (student, ids, from, your, batch, file)

With that data you can do a simple check if the student needs an update or an create. Basically the same question can be used to get those who already have registered.

I'm not entirely clear about the entire process, but you say users can edit the data (after the initial import and validation?) why rerun the entire validation, and not just validate the editted records?

继续阅读：php validation

Data validation optimisation

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？