The performance of Probability of Default (PD) models and other credit classification methods, such as ratings, is often measured by how well the method separates entities that default within a given time frame from those that do not. A standard approach applies the theory of Receiver Operating Characteristic (ROC) curves, taken from signal detection theory. This talk considers how the effectiveness of ROC-like performance measures depends on characteristics of the data sample, including its size, distribution and correlation. We also discuss related questions concerning model calibration and in-sample versus out-of-sample performance.
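
To make the ROC-based measure concrete, the following is a minimal sketch of the area under the ROC curve (AUC) for a set of PD scores, computed as the fraction of defaulter/non-defaulter pairs that the score ranks correctly. It is an illustration only, not the method used in the talk: the function names `auc` and `accuracy_ratio` and the sample data are hypothetical, and the sketch assumes that a higher score means higher predicted default risk.

```python
def auc(scores, defaulted):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen defaulter has a
    higher score than a randomly chosen non-defaulter, with ties
    counted as one half. Assumes higher score = higher risk.
    """
    pos = [s for s, d in zip(scores, defaulted) if d]
    neg = [s for s, d in zip(scores, defaulted) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


def accuracy_ratio(auc_value):
    """Accuracy ratio (Gini coefficient) derived from the AUC."""
    return 2.0 * auc_value - 1.0


# Hypothetical sample: four obligors, two of which defaulted.
sample_scores = [0.9, 0.8, 0.3, 0.2]
sample_defaults = [1, 0, 1, 0]
sample_auc = auc(sample_scores, sample_defaults)
sample_ar = accuracy_ratio(sample_auc)
```

With only two defaulters in the sample, a single misranked pair moves the AUC by a quarter, which hints at why sample size matters for measures of this kind. The pairwise loop is quadratic in the sample size; a rank-based computation would be preferable for large portfolios.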