Abstract

Mining in the Presence of Class Imbalance: Precision-Recall Curves and the F-Measure

Jacqueline Hughes-Oliver
North Carolina State University
Statistics Department

Algorithms for anomaly detection and information retrieval are designed to identify and characterize “unusual” subjects. As a result, they are typically applied in situations where class membership is not balanced and may even be highly imbalanced. Assessments of such algorithms have increasingly abandoned overall accuracy and error rates, because these measures cannot distinguish between different types of errors. Even the popular receiver operating characteristic (ROC) curve is being pushed aside because it is independent of class imbalance. The precision-recall (PR) curve is gaining popularity as a way to assess an algorithm with respect to both its accuracy (as measured by the sensitivity, also known as the true positive rate or recall) and its utility (as measured by the positive predictive value, also known as precision). In this work, we investigate properties of the PR curve and some related summary measures. Discussion is aided by application to real and simulated datasets.
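The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names (`precision_recall`, `f_measure`, `pr_curve`) and the counts in the example are hypothetical, and the sketch assumes binary labels (1 for the rare class, 0 otherwise) with no tied scores.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta = 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pr_curve(scores, labels):
    """Return (recall, precision) points, one per ranked item.

    Items are ranked from highest to lowest score, and each point treats
    everything ranked so far as a predicted positive. Assumes no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        p, r = precision_recall(tp, fp, total_pos - tp)
        points.append((r, p))
    return points


# Hypothetical counts: the same classifier behaviour (true positive rate 0.8,
# false positive rate 0.1) applied to a balanced and an imbalanced population.
balanced = precision_recall(tp=80, fp=10, fn=20)      # 100 positives, 100 negatives
imbalanced = precision_recall(tp=80, fp=1000, fn=20)  # 100 positives, 10000 negatives

print("balanced:   precision=%.3f recall=%.3f F1=%.3f"
      % (balanced[0], balanced[1], f_measure(*balanced)))
print("imbalanced: precision=%.3f recall=%.3f F1=%.3f"
      % (imbalanced[0], imbalanced[1], f_measure(*imbalanced)))
```

In this made-up example, recall is 0.8 in both populations, so a point on the ROC curve would not move, while precision falls from about 0.889 to about 0.074 once negatives outnumber positives 100 to 1. That sensitivity to class imbalance is the property that separates the PR curve from the ROC curve.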

Back to Blackwell-Tapia Conference and Awards Ceremony
