An Integrated Approach for Protein Function Prediction

Fengzhu Sun
University of Southern California
Computational Biology

Assigning functions to novel proteins is one of the most important problems in the post-genomic era. Several approaches have been applied to this problem, including the analysis of gene expression patterns, phylogenetic profiles, protein fusions, and protein-protein interactions. We developed a novel approach that employs the theory of Markov random fields to infer a protein's functions using an integrated approach combining various sources of biological data including protein-protein physical interactions, genetic interactions, gene expressions, and domain structure. For each function of interest and protein, we predict the probability that the protein has such a function using Bayesian approaches. Unlike other available approaches for protein annotation in which a protein has or does not have a function of interest, we give a probability for having the function. This probability indicates how confident we are about the prediction. The model is flexible in that other protei n pairwise relationship information and features of individual proteins can be easily incorporated. Two features distinguish the integrated approach from other available methods for protein function prediction. One is that the integrated approach uses all available sources of information with different weights for different sources of data. It is a global approach that takes the whole network into consideration. The second feature is that the posterior probability that a protein has the function of interest is assigned. The posterior probability indicates how confident we are about assigning the function to the protein. We apply our integrated approach to predict functions of yeast proteins based upon MIPS protein function classifications and upon the interaction networks based on MIPS physical and genetic interactions, gene expression profiles, Tandem Affinity Purification (TAP) protein complex data, and protein domain information. We study the sensitivity and specificity o f the integrated approach using different sources of information by the leave-one-out approach. In contrast to using MIPS physical interactions only, the integrated approach combining all of the information increases the sensitivity from 57% to 87% when the specificity is set at 57%-an increase of 30%. It should also be noted that enlarging the interaction network greatly increases the number of proteins whose functions can be predicted.


Back to Workshop I: High Throughput Technologies and Methods of Analysis