Topics for Theses on Bioinformatics
Do you want to write your thesis on a bioinformatics topic?
Then have a look at the proposed topics below. If you like a topic, then first read any suggested literature and use the links below to find at least one other relevant paper. Once you have done this, please send me a short email stating which topic you want to do, the papers you have read, and your background. Then we can meet to discuss supervision in detail.
Topics
Broadly, the bioinformatics group works in two main areas: structural protein interactions and ontologies for text mining.
Bio: Characterising interaction interfaces at the sequence level
www.SCOPPI.org is a database of structural protein interactions. It is based on structures deposited in PDB. There is still a huge gap between the number of structures and the number of sequences available. One way to close this gap is to characterise interaction interfaces at the sequence level. To this end, the student will analyse the sequence profiles and residue composition of interaction interfaces for selected superfamilies in SCOPPI. The resulting characterisation will be evaluated against existing structures and interactions and used to predict interaction interfaces of sequences with unknown structure.
Requirements: SQL and Python
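As a starting point, the residue composition of an interface can be compared with the composition of the whole protein. The following is a minimal sketch, not a SCOPPI tool: the function names, the pseudocount, and the idea of passing residues as plain one-letter strings are assumptions made for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(residues):
    """Return the relative frequency of each of the 20 standard amino acids."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in input")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def interface_propensity(interface_residues, all_residues, pseudocount=1e-3):
    """Log2 ratio of interface frequency to overall frequency per amino acid.

    Positive values mean the residue is enriched in the interface.
    The pseudocount (an assumed value) avoids taking the log of zero.
    """
    f_int = composition(interface_residues)
    f_all = composition(all_residues)
    return {
        aa: math.log2((f_int[aa] + pseudocount) / (f_all[aa] + pseudocount))
        for aa in AMINO_ACIDS
    }
```

In practice the two input strings would be filled from SQL queries, one for the interface residues of a superfamily and one for all residues of the same domains.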
Bio: Defining an interaction matrix and computing interface alignments from sequences
This topic is related to the one above. SCOPPI contains a list of all interacting residue pairs involved in domain-domain interactions. In a similar way to PAM and BLOSUM, these interacting residue pairs can be used to define an interaction matrix. This interaction matrix can then be used to try to align two interfaces of interacting proteins. The method will be evaluated on known interactions in PDB and used to predict interactions in sequences with unknown structure.
Requirements: SQL and Python (or other programming language)
Literature: Henschel et al. Bioinformatics 2006, Kim et al. PLoS Computational Biology 2006, Winter et al. Nucleic Acids Research 2006
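One possible way to turn a list of interacting residue pairs into a scoring matrix is a log-odds score, in the spirit of BLOSUM. The sketch below is only an illustration under stated assumptions: the input is a list of (residue, residue) tuples, background frequencies are estimated from the same pairs, and the pseudocount is an arbitrary choice.

```python
from collections import Counter
from itertools import combinations_with_replacement
import math

def interaction_matrix(pairs, pseudocount=1.0):
    """Build a symmetric log-odds matrix from interacting residue pairs.

    pairs: iterable of (res_a, res_b) one-letter codes, one tuple per contact.
    Returns a dict mapping (a, b) to log2(observed / expected).
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in pairs:
        pair_counts[tuple(sorted((a, b)))] += 1  # unordered pair
        residue_counts[a] += 1
        residue_counts[b] += 1

    alphabet = sorted(residue_counts)
    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())
    n_types = len(alphabet) * (len(alphabet) + 1) // 2

    scores = {}
    for a, b in combinations_with_replacement(alphabet, 2):
        observed = (pair_counts[(a, b)] + pseudocount) / (n_pairs + pseudocount * n_types)
        p_a = residue_counts[a] / n_residues
        p_b = residue_counts[b] / n_residues
        # An unordered pair of two different residues can arise in two ways.
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        score = math.log2(observed / expected)
        scores[(a, b)] = score
        scores[(b, a)] = score
    return scores
```

A positive score means a residue pair is seen in interfaces more often than expected from the residue frequencies alone. Such scores could then be plugged into a standard alignment algorithm.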
Bio: SCOPPI, evolution and the kingdoms of life
The evolution of interaction interfaces is a hot topic. Are interaction interfaces more conserved than the rest of the non-functional surface? How should non-functional surface be defined? How do interaction networks compare between different species? Do interaction interfaces co-evolve? The student will address these questions in the context of the SCOPPI database.
Requirements: Python, SQL
Literature: Henschel et al. Bioinformatics 2006, Kim et al. PLoS Computational Biology 2006, Winter et al. Nucleic Acids Research 2006
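One common way to quantify conservation is the Shannon entropy of alignment columns, where lower entropy means higher conservation. The sketch below assumes a multiple sequence alignment given as equal-length strings and two hypothetical lists of column indices (interface and remaining surface). It is one possible measure, not the method prescribed for the thesis.

```python
from collections import Counter
import math

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def mean_entropy(alignment, positions):
    """Average column entropy over the given column indices."""
    if not positions:
        raise ValueError("positions must not be empty")
    columns = [[seq[i] for seq in alignment] for i in positions]
    return sum(column_entropy(col) for col in columns) / len(columns)
```

Comparing `mean_entropy(alignment, interface_positions)` with `mean_entropy(alignment, surface_positions)` gives a first impression of whether interface columns are more conserved.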
Bio: Mapping gene expression data onto protein interactions
Often gene expression data is clustered without knowing why the genes may have similar profiles. To relate the genes beyond mere clustering, additional information should be considered. In MedMiner, for example, the profiles are enriched with functional annotations from GeneOntology. In this project, the student will complement such an approach and link the expression data to protein interaction data. This approach will be tested on pancreas tumor gene expression data in collaboration with the medical faculty and Resprotect.
Requirements: Python, SQL
Literature: Dawelbait et al. Computational Life sciences, 2005 and Winter et al. Nucleic Acids Research, 2006
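A simple way to link the two data types is to check, for each known protein interaction, how strongly the expression profiles of the two genes correlate. The following sketch assumes expression profiles stored in a dict keyed by gene name and interactions as pairs of gene names. The data layout, the function names, and the threshold of 0.8 are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)

def coexpressed_interactions(expression, interactions, threshold=0.8):
    """Return interacting gene pairs whose profiles correlate above threshold.

    Pairs with a gene missing from the expression data are skipped.
    """
    hits = []
    for g1, g2 in interactions:
        if g1 in expression and g2 in expression:
            r = pearson(expression[g1], expression[g2])
            if r >= threshold:
                hits.append((g1, g2, r))
    return hits
```

Interactions supported by co-expression can then be used to explain why genes fall into the same cluster.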
Bio: Bioinformatics for Atomic Force Microscopy
With atomic force microscopy, proteins can be pulled out of a membrane and the force required can be measured (for details see Mueller lab). The force curves give insights into the structure of the membrane protein. Bioinformatics can help to understand unfolding. For multiple experiments, force curves have to be aligned and peaks have to be identified. Which interactions within the protein, between helices, between unfolding barriers, or between protein and membrane are responsible for and can explain the observed unfolding pathways?
Requirements: Python, SQL
Literature: Marsico et al. Bioinformatics 2006, Marsico et al. Computational Life Science 2006
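The two basic steps, peak identification and curve alignment, can be sketched as follows. This is a deliberately naive illustration, not the method of the cited papers: curves are assumed to be lists of force values sampled at equal distances, peaks are strict local maxima, and alignment is a single integer shift that maximises an unnormalised cross-correlation.

```python
def find_peaks(forces, min_force=0.0):
    """Return indices of strict local maxima with force >= min_force.

    Flat plateaus are not reported, which a real implementation would
    need to handle (for example by smoothing the curve first).
    """
    return [
        i
        for i in range(1, len(forces) - 1)
        if forces[i] >= min_force
        and forces[i] > forces[i - 1]
        and forces[i] > forces[i + 1]
    ]

def best_shift(reference, curve, max_shift):
    """Return the shift (in samples) of curve that best matches reference.

    A positive shift moves the curve to the right. The score is the sum
    of products over the overlapping region.
    """
    def score(shift):
        return sum(
            reference[i] * curve[i - shift]
            for i in range(len(reference))
            if 0 <= i - shift < len(curve)
        )
    return max(range(-max_shift, max_shift + 1), key=score)
```

Real force curves are noisy and stretch non-linearly, so a thesis would likely replace the rigid shift with a more flexible alignment.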
CS: Detecting GeneOntology terms in literature abstracts
GeneOntology is a taxonomy of vocabulary in molecular biology. Many tools now annotate their output with GeneOntology terms, which makes it possible to cross-reference different tools. An important problem is the identification of GeneOntology terms in free text. The task will be to write an algorithm that is fast and performs well. It will be applied to abstracts in the literature database PubMed. Initially, a linear-time algorithm needs to be developed to find exact matches. However, this will miss many terms that occur in slight variations in the free text. Therefore, the next step will comprise an analysis of GeneOntology (e.g. do terms often end with very general words like "Activity", which can safely be omitted?), which will then lead to a more refined algorithm to pick up GeneOntology terms.
Literature: Doms et al. Nucleic Acids Research 2005
Requirements: Programming experience in Java and MySQL
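The exact-matching step can be illustrated with a trie over word tokens. The sketch is written in Python for brevity, although the thesis itself asks for Java. The function names and the term identifiers in the usage note are hypothetical. The running time is proportional to the number of tokens times the length of the longest term; a strictly linear bound would need an automaton such as Aho-Corasick.

```python
import re

END = "$id"  # marker key; cannot collide with tokens, which are alphanumeric

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_trie(terms):
    """Build a token trie from a dict mapping term id to term name."""
    trie = {}
    for term_id, name in terms.items():
        node = trie
        for token in tokenize(name):
            node = node.setdefault(token, {})
        node[END] = term_id
    return trie

def find_terms(text, trie):
    """Return (term_id, start_token, end_token) for every exact match."""
    tokens = tokenize(text)
    hits = []
    for start in range(len(tokens)):
        node = trie
        for end in range(start, len(tokens)):
            node = node.get(tokens[end])
            if node is None:
                break  # no term continues with this token
            if END in node:
                hits.append((node[END], start, end + 1))
    return hits
```

For example, with `terms = {"TERM:1": "apoptotic process", "TERM:2": "kinase activity"}`, calling `find_terms("The protein shows kinase activity.", build_trie(terms))` reports `TERM:2` at tokens 3 to 5. Dropping a trailing general word such as "activity" before building the trie would be one way to explore the refined matching described above.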
CS: Visualising interaction networks
Protein interaction networks are scale-free, which means that there are a few highly connected nodes and many nodes with low connectivity. Currently, most authors use force-directed graph drawing to lay out these graphs, but the results are poor. The thesis will develop a novel approach that exploits some features of these graphs to obtain a good layout.
Requirements: Programming experience in Java
Literature: Bolser et al. BMC Bioinformatics 2003
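The scale-free property can be inspected by counting node degrees before any layout is attempted. The sketch below is in Python for brevity (the thesis asks for Java) and assumes the network is given as a list of undirected edges; the function names are illustrative.

```python
from collections import Counter

def degrees(edges):
    """Return the degree of every node, ignoring self-loops and duplicate edges."""
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    return degree

def degree_distribution(edges):
    """Return a Counter mapping each degree to the number of nodes having it."""
    return Counter(degrees(edges).values())
```

In a scale-free network, the distribution is dominated by low degrees with a few hubs of very high degree. A layout algorithm could, for instance, place those hubs first.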
CS: Visualise and analyse author networks in SWISSPROT and PubMed
SWISSPROT contains references to literature with some 160,000 authors. Analysing the co-authorship network can be very useful to define topics and areas and to discover trends. Such an author network can then, for example, be analysed in terms of the function of the underlying proteins investigated.
Requirements: Programming experience in Java
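A co-authorship network can be built by connecting every pair of authors who share a paper and weighting each edge by the number of joint papers. The sketch below is in Python for brevity (the thesis asks for Java) and assumes each paper is given as a list of author names; the names in the usage note are made up.

```python
from collections import Counter
from itertools import combinations

def coauthor_network(papers):
    """Return a Counter mapping (author_a, author_b) to the number of joint papers.

    Author pairs are stored in alphabetical order so that each undirected
    edge has exactly one key.
    """
    edges = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges
```

For example, `coauthor_network([["Smith", "Lee", "Kim"], ["Lee", "Smith"]])` gives the edge `("Lee", "Smith")` a weight of 2. Real data would first need author name disambiguation, which this sketch ignores.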




