Bioinformatics

 

Protein function prediction using uncertainty

This BBSRC-funded bioinformatics project ran from Nov 2004 to Oct 2007, with Dr Chris Needham working alongside Dr Andy Bulpitt in the School of Computing and Dr James Bradford working alongside Prof David Westhead in the Faculty of Biological Sciences. The project applied machine learning techniques to a range of protein function prediction tasks. One particularly important aspect of dealing with biological data is modelling uncertainty, since the processes for measuring biological systems introduce noise into the data. For this reason, Bayesian approaches to learning have been explored, in which distributions over model parameters are considered and marginalised over, rather than relying on point estimates of those parameters alone.
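The difference between a point estimate and marginalising over a parameter distribution can be illustrated with a deliberately small example. The sketch below is not taken from the project; it assumes a single binary outcome with unknown probability and a Beta prior, and the function names and the choice of a uniform Beta(1, 1) prior are illustrative assumptions.

```python
from fractions import Fraction


def point_estimate(successes, trials):
    """Maximum-likelihood point estimate: the observed frequency."""
    return Fraction(successes, trials)


def marginalised_prediction(successes, trials, alpha=1, beta=1):
    """Probability that the next observation is a success, obtained by
    averaging over the Beta posterior of the unknown success probability.

    alpha and beta are the parameters of an assumed Beta prior;
    alpha = beta = 1 is a uniform prior.
    """
    return Fraction(successes + alpha, trials + alpha + beta)


# With only three noisy observations, all positive, the point estimate
# claims certainty, while the marginalised prediction stays hedged.
mle = point_estimate(3, 3)
bayes = marginalised_prediction(3, 3)
print(mle, bayes)
```

With so little data the point estimate commits to a probability of 1, whereas averaging over the parameter distribution keeps some probability on the unobserved outcome. That is the behaviour one would want when measurements are noisy.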

Protein-protein interfaces

Identifying a protein's binding sites, and the proteins it binds to, gives clues to the protein's function, since proteins need to bind to each other in order to interact. This work developed a novel system for predicting binding sites on protein surfaces. Incorporating a naive Bayes classifier into the prediction scheme, to integrate information from diverse physicochemical properties of interaction interfaces, increases performance.

Bradford, James R; Needham, Chris J; Bulpitt, Andrew J; Westhead, David R. Insights into protein-protein interfaces using a Bayesian network prediction method. Journal of Molecular Biology, vol. 362, pp. 365-386. 2006. doi:10.1016/j.jmb.2006.07.028

For predictions, the PPI-pred server is online.
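As a rough illustration of how a naive Bayes classifier can combine several surface properties into one prediction, here is a minimal sketch. It is not the published method or the PPI-pred implementation. The two features (hydrophobicity and conservation), their discretisation into "high"/"low", the toy training patches and the Laplace smoothing constant are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels, smoothing=1.0):
    """Count class and per-feature value frequencies from discrete data."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, smoothing


def predict_proba(model, x):
    """Posterior class probabilities, assuming features are independent
    given the class (the naive Bayes assumption)."""
    class_counts, feature_counts, values, smoothing = model
    total = sum(class_counts.values())
    log_scores = {}
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for i, v in enumerate(x):
            num = feature_counts[y][i][v] + smoothing
            den = n_y + smoothing * len(values[i])
            score += math.log(num / den)  # smoothed log likelihood
        log_scores[y] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(unnorm.values())
    return {y: u / z for y, u in unnorm.items()}


# Hypothetical surface patches: (hydrophobicity, conservation).
samples = [("high", "high"), ("high", "low"), ("low", "high"),
           ("low", "low"), ("low", "low"), ("high", "high")]
labels = ["interface", "interface", "non-interface",
          "non-interface", "non-interface", "interface"]

model = train_naive_bayes(samples, labels)
probs = predict_proba(model, ("high", "high"))
print(probs)
```

Each property contributes an independent likelihood term, so further properties can be added as extra positions in the feature tuple without changing the code.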

Functional effects of mutations

Mutations in DNA, such as SNPs (single nucleotide polymorphisms) or missense mutations (those SNPs which cause an amino acid change), may or may not affect the function of a protein. Genetic mutations happen naturally, and there are many differences between the genomes of individuals. However, identifying those mutations which cause disease or alter the function of the protein is of particular interest. This project has analysed the important factors in predicting functional effects, including a comparison of predictions made when only structural or only homology-based attributes are used. The suitability of different datasets for making deleterious SNP predictions has also been investigated. This area is set to grow in importance as more human genomes are sequenced and analysed for genetic variations.

Needham, Chris J; Bradford, James R; Bulpitt, Andrew J; Care, Matthew A; Westhead, David R. Predicting the effect of missense mutations on protein function: analysis with Bayesian networks. BMC Bioinformatics, vol. 7. 2006. doi:10.1186/1471-2105-7-405

Care, M A; Needham, C J; Bulpitt, A J; Westhead, D R. Deleterious SNP predictions: be mindful of your training data! Bioinformatics, vol. 23, pp. 664-672. 2007. doi:10.1093/bioinformatics/btl649

Protein function prediction

This work aims to describe the function of an unknown protein (gene product) by predicting the Gene Ontology annotation for the protein. This is a huge multi-class classification problem, with potentially more possible classes (descriptions) than data items. A Bayesian network is used whose structure partly encodes a subset of the Gene Ontology description; the remaining network structure is learned, in order to form a compact model which integrates information from features derived from gene expression profiles, protein-protein interactions and sequence motifs. Data from the plant Arabidopsis thaliana is used.

Bayesian networks

Alongside this research, we have written two primers in top international journals introducing Bayesian networks in a biological sciences context. The first concentrates on introducing the basics of inference, with a cell signalling pathway example. The second contains a substantial review of learning in Bayesian networks, and covers conditional independence of variables, joint probability distributions, parameter learning, structure learning, and Bayesian methods for calculating marginal likelihood by averaging over distributions of parameters, rather than making point estimates. At ISMB 2006 (Intelligent Systems in Molecular Biology Conference), we gave a four-hour tutorial on Bayesian networks for bioinformatics.

Needham, C J; Bradford, J R; Bulpitt, A J; Westhead, D R. A Primer on Learning in Bayesian Networks for Computational Biology. PLoS Computational Biology, vol. 3, pp. e129. 2007. doi:10.1371/journal.pcbi.0030129

Needham, C J; Bradford, J R; Bulpitt, A J; Westhead, D R. Inference in Bayesian networks. Nature Biotechnology, vol. 24, no. 1, pp. 51-53. 2006. doi:10.1038/nbt0106-51
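To give a flavour of the kind of inference the first primer introduces, the sketch below performs exact inference by enumeration on a three-node chain. It is not the example from the primer. The network (stimulus, kinase, response), the variable names and all probability values are invented for illustration.

```python
from itertools import product

# Hypothetical chain: stimulus -> kinase -> response, all binary.
# Every number here is an assumed value chosen for illustration.
p_stimulus = {True: 0.3, False: 0.7}                # P(stimulus)
p_kinase_given_stimulus = {True: 0.9, False: 0.1}   # P(kinase active | stimulus)
p_response_given_kinase = {True: 0.8, False: 0.05}  # P(response | kinase)


def joint(s, k, r):
    """Joint probability, factorised according to the network structure."""
    p = p_stimulus[s]
    pk = p_kinase_given_stimulus[s]
    p *= pk if k else 1 - pk
    pr = p_response_given_kinase[k]
    p *= pr if r else 1 - pr
    return p


def posterior_stimulus(r):
    """P(stimulus = True | response = r), summing out the hidden kinase node."""
    numer = sum(joint(True, k, r) for k in (True, False))
    denom = sum(joint(s, k, r) for s, k in product((True, False), repeat=2))
    return numer / denom


posterior = posterior_stimulus(True)
print(posterior)
```

Observing the response raises the belief in the stimulus above its prior of 0.3, even though the intermediate kinase state is never observed. Enumeration is only practical for very small networks.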

Learning gene regulatory networks

Information on learning transcription networks from microarray data can be found here.