Tuesday, April 12th @ 11:00 AM-12:30 PM
Making sense of diagnostic performance when information is limited
Roger Stein, NYU
Abstract: In machine learning, drug trials, and other domains that involve binary outcomes, it is common to measure the power of a predictive model by constructing an ROC curve and calculating the area under this curve (the AUC). In some settings, however, only partial information about a model or a diagnostic is available, which makes its AUC difficult to interpret or even to compute. We present results that provide bounds on the AUC in a number of such settings.
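The talk's bounds are more general than anything shown here, but the flavor of the problem can be illustrated with an elementary geometric bound: if all that is reported for a diagnostic is a single operating point (its true positive and true negative rates), any monotone ROC curve through that point is squeezed between a lowest and a highest staircase, which bound the AUC. The function name below is invented for illustration.

```python
def auc_bounds_from_point(tpr: float, tnr: float) -> tuple[float, float]:
    """Elementary bounds on the AUC of any monotone ROC curve that
    passes through the single reported operating point (tpr, tnr)."""
    fpr = 1.0 - tnr
    # Lowest monotone curve: stay at 0 until x = FPR, jump up to TPR,
    # stay there until x = 1 (then jump to 1).
    lower = tpr * (1.0 - fpr)
    # Highest monotone curve: jump to TPR at x = 0, run to (FPR, TPR),
    # then jump to 1 and stay there.
    upper = 1.0 - fpr * (1.0 - tpr)
    return lower, upper

# Example: a test reported with 90% sensitivity and 95% specificity.
lo, hi = auc_bounds_from_point(0.90, 0.95)
print(lo, hi)  # 0.855 0.995
```

Even this crude bound is informative: two tests reported at different operating points can sometimes be ranked outright when one test's lower bound exceeds the other's upper bound.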
As one example, in evaluating which COVID-19 test is most effective, only sparse information about each test is generally available: efficacy is commonly reported only in terms of the diagnostic's true positive and true negative rates, and the reported results for two or more tests may not be conformable and thus cannot be compared directly. It is therefore useful to have upper and lower bounds on the full power of each test, so that the tests can be compared.

As another example, a commercial firm may have the opportunity to acquire additional data on some, but not all, of its customers, but may be unsure whether using the data will be economically beneficial, given the (high) cost of acquisition and of additional modeling resources. In such a case, a single model used to score all customers may be inefficient, so it is useful to understand how a strategy of switching between models, depending on the data available for each customer, may improve or degrade the overall performance of a predictive model, in order to assess the value of the extra data.
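The model-switching setting can be explored with a toy Monte Carlo sketch. This is not the analysis presented in the talk: the score distributions, the assumption that the two models' scores lie on a comparable scale, and all parameters below are invented for illustration. The pooled AUC is computed directly as the Mann-Whitney statistic.

```python
import random

def auc(scores_pos, scores_neg):
    """Empirical AUC as the Mann-Whitney statistic:
    P(score of a positive > score of a negative), ties counted 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

random.seed(0)
pop = []
for _ in range(2000):
    y = random.random() < 0.3           # binary outcome
    has_extra = random.random() < 0.5   # extra data obtainable for ~half
    base = random.gauss(1.0 if y else 0.0, 1.0)  # baseline model score
    rich = random.gauss(2.0 if y else 0.0, 1.0)  # richer model score
    pop.append((y, has_extra, base, rich))

# Strategy 1: score every customer with the single baseline model.
base_auc = auc([b for y, e, b, r in pop if y],
               [b for y, e, b, r in pop if not y])

# Strategy 2: switch to the richer model wherever the extra data exists.
switched = [(y, r if e else b) for y, e, b, r in pop]
switch_auc = auc([s for y, s in switched if y],
                 [s for y, s in switched if not y])
print(base_auc, switch_auc)
```

Note that pooling scores from two different models is only meaningful if the scores are calibrated to a common scale; without that, switching can degrade the pooled AUC even when the richer model is individually stronger, which is precisely why bounds on the combined performance are useful before paying for the extra data.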