I'll describe our recent empirical investigation and modeling of the joint statistical properties of a wavelet representation based on derivative operators. In particular, I'll describe the use of Gaussian Scale Mixtures (the product of a scalar random variable and a Gaussian vector) to model the statistics of clusters of wavelet coefficients at adjacent positions, scales, and orientations. These models produce excellent results when applied to the problem of denoising, greatly surpassing the performance of both standard linear (Wiener) and thresholding estimators. I'll also describe recent work examining the statistical relationships between adjacent measures of phase and orientation, and introduce an efficient geometric representation of images based on multi-scale gradient directions.
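As a rough illustration of the Gaussian Scale Mixture idea mentioned above, the following sketch draws samples formed as a positive scalar random variable times a Gaussian vector and compares their kurtosis with that of a plain Gaussian. Everything specific here is an assumption made for illustration and is not taken from the work described: the lognormal choice for the scalar multiplier, the 2x2 covariance, and the helper names `sample_gsm` and `kurtosis`.

```python
import numpy as np

def sample_gsm(n_samples, cov, rng, sigma_log=1.0):
    """Draw x = sqrt(z) * u with u ~ N(0, cov) and z a positive scalar.

    The lognormal prior on z is an illustrative assumption only.
    """
    dim = cov.shape[0]
    u = rng.multivariate_normal(np.zeros(dim), cov, size=n_samples)
    z = rng.lognormal(mean=0.0, sigma=sigma_log, size=n_samples)
    # One scalar multiplier per sample scales the whole Gaussian vector.
    return np.sqrt(z)[:, None] * u

def kurtosis(v):
    """Non-excess kurtosis: about 3 for a Gaussian, larger for heavy tails."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])  # hypothetical covariance of a 2-coefficient cluster

x = sample_gsm(200_000, cov, rng)
gauss = rng.multivariate_normal(np.zeros(2), cov, size=200_000)

print("GSM kurtosis:     ", kurtosis(x[:, 0]))
print("Gaussian kurtosis:", kurtosis(gauss[:, 0]))
```

Because every component of a sample shares the same multiplier, the marginals come out heavier-tailed than a Gaussian and the magnitudes of neighboring components are coupled, which is the kind of behavior such a model is meant to capture in clusters of wavelet coefficients.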