Data mining has enormous potential as a processing tool for scientific data. It offers a way to extract information from the massive volumes of data typical of scientific research. However, designing a data mining system for scientific applications is complex and challenging. Two key issues must be addressed in the design: (1) the variability of data sets and (2) the operations used to extract information. Data sets come in different formats, types and structures, and they are also typically in different states of processing, such as raw, calibrated, validated, derived or interpreted data. The mining system must be flexible enough to handle these variations. The operations required of the mining system also vary across scientific research areas, ranging from general-purpose operations such as image processing techniques or statistical analysis to highly specialized, data set-specific science algorithms. The mining system should therefore be extensible, so that new data sets can be processed and new operations added with little effort. The ADaM (Algorithm Development and Mining) system, developed at the Information Technology and Systems Center at the University of Alabama in Huntsville, is one mining system designed with these capabilities. It provides knowledge discovery, content-based searching and data mining capabilities for data values as well as for metadata, and it contains over 120 different operations that can be performed on the input data stream. This presentation will provide an overview of the ADaM system architecture and its functionality.
Bio:
Rahul Ramachandran is a Research Scientist at the Information Technology and Systems Center. His academic background is in Atmospheric Science and Computer Science. His research interests include the application of information technology to scientific research, data mining, automatic pattern recognition and image classification.