Technical report: Detecting Disease Genes Based on Protein Interaction Networks


Discovering the human genes that cause disease ("disease genes") is an emerging task in bioinformatics and biomedicine. Many ongoing research projects exploit protein-protein interaction (PPI) networks in the discovery process, because there is a complex interplay between disease genes and PPI networks. Most current PPI-based methods use supervised learning and employ only data on known disease genes. However, a large amount of valuable data carries information about unknown genes, and this data could potentially enhance disease gene prediction. Combining multiple data sources for both known disease genes and unknown genes is expected to improve the prediction of which genes are likely to be disease genes. We have developed a novel method that predicts disease genes by taking advantage of this wealth of existing data. To this end, our method uses semi-supervised learning to integrate human protein-protein interaction data with various biological data extracted from multiple proteomic/genomic data sources. An experimental evaluation demonstrated that the proposed method outperformed other methods on several measures, including sensitivity, specificity, precision, accuracy, and a balanced F-score. A considerable number of potential disease genes were discovered and initially validated.
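For readers unfamiliar with the evaluation measures named above, the following minimal sketch shows their standard definitions in terms of confusion-matrix counts, taking "balanced F-score" to mean the usual F1 score. The function name `evaluation_measures` and the example counts are illustrative assumptions and are not taken from the report or its results.

```python
def evaluation_measures(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true/false positive/negative counts, where "positive"
    means "predicted to be a disease gene". Illustrative helper only.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall: disease genes correctly found
    specificity = ratio(tn, tn + fp)   # non-disease genes correctly rejected
    precision = ratio(tp, tp + fp)     # predicted disease genes that are correct
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # Balanced F-score (F1): harmonic mean of precision and sensitivity.
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Hypothetical counts, chosen only to exercise the formulas.
measures = evaluation_measures(tp=40, fp=10, tn=45, fn=5)
```

Reporting all five measures together matters because disease genes are typically a small minority of all genes, so accuracy alone can look high even when few disease genes are recovered.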

Paper Details


P. Nguyen,  T. Ho