Bioinformatics
The explosion of data from high-throughput experimental methods in molecular biology, which began with the first complete genome sequences in the mid-1990s, has driven the need for automated data analysis, annotation and prediction. This requirement has led to a number of multi-disciplinary projects with the Institute of Molecular and Cellular Biology, looking at how machine learning methods and machine vision techniques may be applied to solve biological problems.
Surface Matching: Basic active and ligand binding surface matching through image comparison

Protein Structure Analysis: Automatic identification of key patterns with multiple protein structures

Protein Function Prediction: Protein function prediction and classification using uncertainty

Basic active and ligand binding surface matching through image comparison
Steven Pickering, Andy Bulpitt & David Westhead
This project investigated the use of surface properties of proteins to predict function. The algorithm uses local measures of surface geometry and electrochemical characteristics at the active sites of proteins to identify similar structures in PDB databases, and hence proteins with potentially similar function. The images below show an example of the surface features selected for matching, using surface shape index and curvedness, and the features matched between two proteins.
[Figure: Surface features selected using shape index and curvedness]

[Figure: Features matched on 1hdx_1 and 1hdx_3 (Alcohol Dehydrogenase)]

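Shape index and curvedness are usually defined from the two principal curvatures at a surface point. The sketch below uses one common convention for those definitions; the project's exact formulation, sign convention and curvature estimation method are not given here, so the function names and the handling of flat points are assumptions made for illustration only.

```python
import math


def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures.

    Assumed convention: +1 and -1 are the two spherical extremes
    (which one is a "cap" depends on the surface normal direction),
    0 is a symmetric saddle. Returns None for a flat point, where
    the shape index is undefined.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    if k_max == 0.0 and k_min == 0.0:
        return None  # planar point: no meaningful shape
    # atan2 copes with the umbilic case (k_max == k_min) without
    # dividing by zero.
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)


def curvedness(k1, k2):
    """Curvedness: how strongly the surface bends, regardless of shape."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)
```

Under these definitions the shape index describes the kind of local shape independently of scale, while curvedness describes how pronounced it is, which is why the two are typically used together when selecting surface features.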
Funded by BBSRC
Automatic identification of key patterns with multiple protein structures
Craig Lucas & Andy Bulpitt
A common approach to predicting protein function from structure is to perform a rigid pair-wise structural comparison against a database of structures of known function. Many of these algorithms do not take into account the inherent flexibility within proteins, and so find it difficult to match those with large regions in common but with differences in overall structure.
This project aims to develop a novel algorithm designed to find common sub-structure features between multiple proteins with similar function but different overall fold. The algorithm is intended to run with multiple input proteins and will find multiple matching sub-structures if there are several present. Such sub-structures would be useful in the annotation of new protein structures with unknown function, especially those without known homologues. The algorithm is independent of sequence and may take whole structures as input with no limitation on the overall size of matching patterns.
The algorithm is a graph-based, breadth-first search. A match occurs if every inter-node distance of a pattern is within a given tolerance of a corresponding distance in the matched pattern. Any matches across the entire set of input structures are repeatedly expanded and searched until no larger common patterns remain. The more structures that are input, the lower the probability that small structures will match across the entire set by chance. This means it is likely that a final result will be obtained faster than by comparing proteins of a set two at a time. The search complexity is reduced by a coherence value which forces each point within a pattern match to lie within the coherence distance from at least one other point within the same pattern. This allows a match to be of any size, but not overly segmented.
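The two tests described above, the inter-node distance tolerance and the coherence constraint, can be sketched as simple predicates. This is a minimal illustration rather than the project's implementation: the names `distances_match` and `is_coherent` are hypothetical, patterns are assumed to be lists of 3D coordinates in which point `i` of one pattern corresponds to point `i` of the other, and the breadth-first expansion itself is not shown.

```python
from itertools import combinations
from math import dist


def distances_match(pattern_a, pattern_b, tolerance):
    """True if every inter-node distance in pattern_a is within
    `tolerance` of the corresponding distance in pattern_b.

    Assumes point i in pattern_a corresponds to point i in pattern_b.
    """
    if len(pattern_a) != len(pattern_b):
        return False
    return all(
        abs(dist(pattern_a[i], pattern_a[j]) - dist(pattern_b[i], pattern_b[j]))
        <= tolerance
        for i, j in combinations(range(len(pattern_a)), 2)
    )


def is_coherent(pattern, coherence):
    """True if each point lies within `coherence` of at least one
    other point in the same pattern (a single point is trivially coherent)."""
    if len(pattern) < 2:
        return True
    return all(
        any(i != j and dist(p, q) <= coherence for j, q in enumerate(pattern))
        for i, p in enumerate(pattern)
    )
```

Because only internal distances are compared, a pattern matches a translated or rotated copy of itself. The coherence test is also what allows a search to discard candidate points early: a point further than the coherence distance from every point already in the pattern cannot extend it.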
The example here is a comparison between three proteins: an NAD-binding alcohol dehydrogenase (PDB code 1HDX), an FAD-binding trypanothione reductase (PDB code 1AOG) and an ATP-binding MoeB-MoaD protein complex (PDB code 1JWA). These proteins bind different ligands, but each ligand contains adenine, providing an example of multiple structures which are likely to contain similarities but which differ in overall backbone structure. The matching regions (blue spheres) are all at ligand-binding sites (red spheres).

Funded by EPSRC/BBSRC Bioinformatics Studentship

![[*]](images/pro_features.jpg)
![[*]](images/1aog_overlay400x300.jpg)
![[*]](images/interactioninterface1.jpg)