Choosing between algorithms

I am sure there are a lot of Software Testing Engineers and Algorithm Validation Engineers on Stack Overflow. Could someone please tell me how one would proceed in the following scenario?

Say we have a mammogram and 5 different algorithms which take this mammogram as input and identify whether the patient has cancer. If 3 out of 5 algorithms say there is cancer and 2 say there is no cancer, which algorithm should I believe? How should I proceed with testing these algorithms? Is there any statistical concept used in such scenarios?

I was asked this question in an interview for an Algorithm Validation Engineer position. I believe they were trying to see how I would think given such a scenario. How should I have answered this?

Thanks for your time

-Sashi


You can't say anything with only this information. What if some of those 5 algorithms reuse parts of the others? Then they could be prone to the same defects.

Say A, B and C actually use the same sub-algorithm for preprocessing data, and that sub-algorithm gives suboptimal results on some specific image, so the preprocessed image causes the later phases to produce wrong results. In that case it won't matter that you have three algorithms saying the same thing.

You need more specific data on how the algorithms correlate and what statistical characteristics are known about their error rates to be able to perform any analysis.


This is actually pretty tough to answer. I'm sure each algorithm is good at picking up different types of input triggers. More than likely, you'll need some statistical analysis to determine what each algorithm usually detects as cancer. You may even go so far as to build a Bayesian model to describe/determine whether a patient has cancer based on the algorithmic results.

You may find that 3 algorithms consistently miss the specific type of cancer that the other two are moderately good at picking up. You may also discover relationships like this: when algorithms 2, 3, and 5 say there is no cancer, algorithm 1 says there is, and algorithm 4 is inconclusive, there are usually typically-benign spots of a certain shape and color intensity that should be analyzed but probably aren't cancer.
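
A minimal sketch of how that kind of pattern analysis could start, assuming you have a labelled history of past mammograms together with the recorded outputs of all five algorithms (the data layout and values here are invented for illustration):

    from collections import defaultdict

    # Hypothetical labelled history: for each past mammogram, the outputs of
    # the five algorithms (True = "cancer") and the ground truth.
    history = [
        ((True, True, True, False, False), True),
        ((False, False, True, False, False), False),
        # ... many more cases ...
    ]

    # For every pattern of algorithm outputs, count how often it actually
    # corresponded to cancer. Patterns with enough support tell you which
    # combinations of "votes" tend to be trustworthy.
    cancer_count = defaultdict(int)
    total_count = defaultdict(int)
    for outputs, truth in history:
        total_count[outputs] += 1
        if truth:
            cancer_count[outputs] += 1

    for pattern, total in total_count.items():
        rate = cancer_count[pattern] / total
        print(f"pattern {pattern}: seen {total} times, cancer rate {rate:.2f}")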


Choosing the best classifier for a job, or combining various classifiers is a whole separate field. This general article about classification is as good a start as any to learn about choosing the best classifier for a job. And this article about classifier ensembles is a good place to start learning about classifier combining.

To give the basis for an answer to your (quite broad) question: The best classifier for a job depends on several factors:

  • The required quality of classification (in your case this would be high)
  • The allowed complexity of the classification (i.e. can you compute for days to reach an answer, or do you have only several milliseconds?) (Time is not a limitation in your case, I'd guess.)
  • The cost associated with misclassification. This is a very important factor in your case. If you tell people they have cancer when they do not, you cause great stress, but (one hopes) further testing (which costs money) will eventually find out they are healthy. On the other hand, if you miss cancer in a patient she could die. This means that the "best" classifier (the one that makes the fewest errors) may not be the best one for your problem.

On that last point: say 1 in 1000 women has cancer, and I have some classifiers:

  1. Misses 20% of cancer cases, and says a healthy woman has cancer in 2% of cases. This classifier would make about 200 errors in a population of 10000 people.
  2. Just say: "This person does not have cancer" in all cases. Only 10 errors in 10000 cases!
  3. Just say "This person has cancer"; it will make 9990 errors in 10000 cases.

The second classifier makes the fewest errors, but after a few months of using it, people who could have been saved start dying. The third classifier sends everybody on for the next test (which would have the same problem as this one), or maybe it inflicts a useless, life-changing operation on 9990 healthy people. The first classifier makes a trade-off: two people might become very sick or even die, and 198 people experience painful and stressful procedures and operations for naught.

Obviously in your case all classifiers were like classifier 1, with slight variations in the percentages. In those cases you have to make a trade-off between missing cancer cases and inflicting the rest of the procedure (including its cost!) on healthy people. The starting point for research about this trade-off is the receiver operating characteristic (ROC) curve.
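
To make the arithmetic above explicit, here is a small sketch; the prevalence and error rates are the illustrative numbers from this answer, while the misclassification costs are arbitrary placeholders:

    # Population of 10,000 women, prevalence 1 in 1,000 -> 10 cancer cases.
    population = 10_000
    cancer_cases = population // 1_000       # 10
    healthy = population - cancer_cases      # 9,990

    def errors(miss_rate, false_alarm_rate):
        """Missed cancers and false alarms for the given error rates."""
        return cancer_cases * miss_rate, healthy * false_alarm_rate

    classifiers = {
        "classifier 1 (misses 20%, 2% false alarms)": errors(0.20, 0.02),
        "classifier 2 (always says 'no cancer')":     errors(1.00, 0.00),
        "classifier 3 (always says 'cancer')":        errors(0.00, 1.00),
    }

    # Raw error counts hide the asymmetry, so weight a missed cancer far more
    # heavily than a false alarm (these weights are arbitrary placeholders).
    COST_MISS, COST_FALSE_ALARM = 100, 1
    for name, (fn, fp) in classifiers.items():
        cost = fn * COST_MISS + fp * COST_FALSE_ALARM
        print(f"{name}: {fn:.0f} missed + {fp:.0f} false alarms, weighted cost {cost:.0f}")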


Put your interviewee hat on: it's a psychological assessment. Questions such as this algorithm assessment have more than one correct answer. I learned about these questions from my wife, who worked as a recruiter for 5+ years. The interviewer wants to see how you react. It's best to just make assumptions and drive to a logical conclusion. Do not say "I don't know", become argumentative, or ask a ton of questions; you will come across as difficult and argumentative (as many programmers are).

Now that you know this is not a programming question, consider asking about it on careeroverflow.com. I like these questions because they show an ability to adapt and stay flexible.

Why is a manhole round? <--Microsoft's version


Well, clearly false negatives are far more serious here than false positives, so all things being equal, we may want to show a preference for algorithms that find more cancers.

If we feed lots more mammograms to the software and find that a group of the algorithms agrees on a large sample of the mammograms, then we may want to prefer those algorithms, since their results are supported by more algorithms.

Something like that.
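
For instance, a rough way to quantify that agreement over a larger batch of mammograms, assuming you can run all five algorithms on each one (the data layout is invented):

    import itertools

    # Each row: outputs of the five algorithms on one mammogram (True = "cancer").
    results = [
        (True, True, True, False, False),
        (False, False, False, False, True),
        # ... many more mammograms ...
    ]

    for a, b in itertools.combinations(range(5), 2):
        agree = sum(row[a] == row[b] for row in results)
        print(f"algorithms {a + 1} and {b + 1} agree on {agree}/{len(results)} mammograms")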


Everything else being equal, you could say that the patient has a 60% chance of cancer. To give a better answer you need more information about how the algorithms work. Some points to consider:

  • Perhaps some algorithms are newer than others, or have been proven to be less reliable. It would be good to know the accuracy of each algorithm, using historical mammogram data tagged as "Cancerous" and "Non-Cancerous" (see the sketch after this list).
  • Each person's cancer is slightly different - maybe there are characteristics that a certain algorithm is better at identifying? Is a domain expert required in order to determine which diagnosis is correct, based on the algorithm conclusions and the mammogram (image?) data?
  • As sharptooth mentioned, perhaps certain algorithms use the same techniques as other algorithms, so both are likely to have the same bias.
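
As a sketch of that first point, per-algorithm sensitivity and specificity could be estimated from such tagged historical data (the data below are placeholders):

    # Hypothetical history: outputs of the 5 algorithms and whether the case
    # was actually cancerous (True) or not (False).
    cases = [
        ((True, True, True, False, False), True),
        ((False, False, False, False, False), False),
        # ... many more tagged mammograms ...
    ]

    for i in range(5):
        tp = sum(out[i] and truth for out, truth in cases)
        fn = sum(not out[i] and truth for out, truth in cases)
        tn = sum(not out[i] and not truth for out, truth in cases)
        fp = sum(out[i] and not truth for out, truth in cases)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        print(f"algorithm {i + 1}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")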


It is not a trivial problem, and highly dependent on what risks you are willing to take.

Formalisms like decision theory and Bayesian inference are really worth considering here. They allow you to take into account the various probabilities of false positives/negatives, and to weight them differently if you want to.


I don't think you should have answered in any specific way. The interviewer would probably want to analyze how you would evaluate this problem, not your final answer. In other words, they were probably interested in your own algorithm to take a decision.

In a real-life environment, I can't think of any serious choice between 5 algorithms to find cancer, especially when they give such different results.


This is a good opportunity for implementing what is sometimes called an "expert system". You take a large sample of your data (in your case, mammogram images and the output of the various algorithms) and run it past a series of real-life, flesh and blood experts in the field (here, oncologists or lab techs). Record the responses for each image along with the outputs of the algorithms. At the end, you should have enough data to map algorithm output to expert output. To verify that your mapping works, run a bunch of test images through your system (samples that were not part of the original data set) and ask your panel of experts to double-check the results. Ideally, the experts should agree with your system's output a very high percentage of the time.

Without knowing anything about the algorithms themselves, it's difficult to make a decision based on 3 "yes" and 2 "no" results (especially for something as important as a cancer screening). Getting as close as possible to the same results as a trained expert is your goal (at least at first), and systems like this can sometimes be made more accurate by basing decisions on the knowledge and experience of experts in the field rather than on mathematical algorithms alone.
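
One very simple way to realise such a mapping, assuming you have recorded both the algorithm outputs and the experts' verdicts (all names and data below are illustrative):

    from collections import Counter, defaultdict

    # Training phase: for each mammogram, the tuple of 5 algorithm outputs and
    # the expert panel's verdict.
    training = [
        ((True, True, True, False, False), "cancer"),
        ((False, False, True, False, False), "no cancer"),
        # ... a large labelled sample ...
    ]

    # Map each observed output pattern to the verdict the experts gave most often.
    verdicts = defaultdict(Counter)
    for outputs, verdict in training:
        verdicts[outputs][verdict] += 1
    mapping = {pattern: counts.most_common(1)[0][0]
               for pattern, counts in verdicts.items()}

    # Verification phase: held-out cases the experts also reviewed.
    held_out = [
        ((True, True, True, False, False), "cancer"),
        # ...
    ]
    hits = sum(mapping.get(outputs) == verdict for outputs, verdict in held_out)
    print(f"system agrees with the experts on {hits}/{len(held_out)} held-out cases")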


I would have asked if using a computer to determine whether someone has cancer is the correct course of action, given that the use of algorithms is error prone.

But, if for some reason a set of algorithms must be used, then have a human operator (ie, a doctor) examine the mammogram personally in the case where there is some uncertainty. The doctor can then decide if further tests are warranted, based on the disagreement of the algorithms used.

The one thing we overlook as programmers is that humans can solve some problems in ways we cannot predict; imagine the doctor notices something in the mammogram that the algorithms were not designed to detect.


I think that if you had some statistical information about each algorithm's previous performance (how many times it was right/wrong over a number of experiments), then you could calculate the probability of each algorithm being right. Then you could somehow combine those probabilities to get the odds of that person having cancer. Just a speculation...


To accomplish much in such a situation, you generally want to have a "gold" standard -- e.g., an opinion from a physician about whether a set of mammograms shows cancer, or historical information where you know that one set of mammograms shows cancer and another set does not. Along with that, if possible, you'd like information about what indicators each algorithm is using in a particular case.

With a standard, you can start to get an estimate of which algorithm(s) is/are more "accurate" (i.e., agree with the expert opinion the most often). Information about indicators allows you to generate more detail about the times and circumstances under which each appears to be more or less accurate, so you can start to form a judgement about times/circumstances under which to trust one over another. With this, you can (at least hope to) combine the results from your five existing algorithms into a single overall result that's (with care and perhaps a bit of luck) more accurate than any of them individually.


Basically, if you know that the results of the algorithms are conditionally independent (i.e. independent given the true but unknown class label) then using Naive Bayes is an optimal meta-classifier.

Otherwise, this question is unanswerable without knowing the conditional dependency structure among the classifiers. For example, if classifier A, B, C and D are weak, identical classifiers (i.e. they always give the same results) and have accuracies of 0.51, while classifier E is conditionally independent of classifiers A, B, C and D and has an accuracy of 0.99 then I think it's pretty obvious that voting is a bad idea.
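
A minimal sketch of that Naive Bayes meta-classifier, assuming each algorithm's sensitivity and specificity have been estimated from labelled data (all the numbers below are placeholders):

    # Prior probability of cancer, plus per-algorithm sensitivity
    # P(says "cancer" | cancer) and specificity P(says "no cancer" | healthy).
    prior_cancer = 0.001
    sensitivity = [0.90, 0.85, 0.80, 0.75, 0.99]
    specificity = [0.95, 0.93, 0.97, 0.90, 0.99]

    def posterior_cancer(outputs):
        """P(cancer | the five outputs), assuming conditional independence."""
        p_cancer, p_healthy = prior_cancer, 1.0 - prior_cancer
        for says_cancer, sens, spec in zip(outputs, sensitivity, specificity):
            if says_cancer:
                p_cancer *= sens
                p_healthy *= 1.0 - spec
            else:
                p_cancer *= 1.0 - sens
                p_healthy *= spec
        return p_cancer / (p_cancer + p_healthy)

    # Three algorithms say "cancer", two say "no cancer":
    print(posterior_cancer([True, True, True, False, False]))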


Since the algorithms produce a "yes" or "no" answer, this is pretty easy. You need actual test data to compare your algorithms against. You should probably gather long-term data on the success rates of the various heuristics, and do some statistical analysis of which ones are more likely to be right.

Validating things like Google's search algorithm - which has no "right" answer - would be harder.


Based on the information given, you would not be able to answer. You would have to take all 5 algorithms, and test them on patients diagnosed with cancer, and also ones that are known to be cancer free. This would allow you to determine which algorithm was the most accurate.

You could also make an algorithm from the 5 (presuming they were all good and valid algorithms) by taking the side that gets more votes. This may or may not be a valid sixth algorithm, depending on how good the first 5 are.
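
That "sixth algorithm" is just a majority vote over the five results; a trivial sketch:

    def majority_vote(outputs):
        """Combine the five yes/no results by simple majority (True = 'cancer')."""
        return sum(outputs) > len(outputs) / 2

    print(majority_vote([True, True, True, False, False]))  # True: 3 of 5 say cancer

Note that this treats all five algorithms as equally reliable and independent, which, as the other answers point out, may be exactly what you cannot assume.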
