Automated approaches for identifying and clustering semantic features or topics are highly desirable in text mining applications. By using a low-rank non-negative matrix factorization (NMF) algorithm that preserves the natural non-negativity of the data, we eliminate the subtractive basis-vector and encoding calculations that techniques such as principal component analysis require for semantic feature abstraction. We briefly review existing NMF techniques and present a new NMF approach. We then demonstrate its use for topic (or discussion) detection and tracking in the context of a prototypical email surveillance system.
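The abstract does not describe its new NMF algorithm, so the following is only a minimal sketch of the general idea. It uses the classic Lee and Seung multiplicative updates (a swapped-in, well-known technique, not the authors' method) to factor a non-negative term-by-document matrix V into W (terms by topics) and H (topics by documents). Because every update multiplies non-negative quantities, W and H never go negative. Each column of W can then be read as a semantic feature, and each column of H encodes a document as an additive mix of those features. The function names (`nmf`, `frob_error`), the small `eps` guard, and the toy "email" matrix are illustrative assumptions.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Frobenius norm of the residual V - W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (m x n) as W (m x k) @ H (k x n).

    Hypothetical helper using Lee-Seung multiplicative updates for the
    squared Frobenius-norm objective; eps guards against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random initialization keeps all factors non-negative.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Hypothetical toy data: rows are terms, columns are four emails.
terms = ["budget", "meeting", "forecast", "game", "score", "team"]
V = [
    [3, 6, 0, 0],
    [2, 4, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 6, 3],
    [0, 0, 4, 2],
    [0, 0, 2, 1],
]

W, H = nmf(V, k=2, iters=300)
for t in range(2):
    weights = [row[t] for row in W]
    top = sorted(range(len(terms)), key=lambda i: weights[i], reverse=True)[:3]
    print(f"topic {t}:", [terms[i] for i in top])
print("residual:", frob_error(V, W, H))
```

Sorting each column of W by weight surfaces the most representative terms for a topic. Comparing the columns of H over time is one simple way to follow how a discussion rises and fades across a stream of emails.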