Dr. Thomas Rattei
from the SIMAP project
Dept. of Genome Oriented Bioinformatics (TU Munich)
April 13, 2006 - Interview number 1
BOINC Based Project

Project Portal: Briefly tell us about your project.
Thomas Rattei: SIMAP is a database of protein similarities. It contains nearly all currently published protein sequences and is continuously updated.
Protein similarities are computed using the FASTA algorithm, which provides optimal speed and sensitivity. SIMAP is, to our knowledge, the only project that combines comprehensive coverage of all known proteins with incremental update capabilities.

PP: What interests you the most about your research? Why do you get up and go to work in the morning?
TR: Because of the huge number of known protein sequences in public databases, it became clear that most of them will not be experimentally characterized in the near future. Nevertheless, proteins that have evolved from a common ancestor (so-called orthologs) often share the same functions. So it is possible to infer the function of an uncharacterized protein from an ortholog with known function. A well-known example is the research on mouse genes and proteins: in many cases its results also hold true for the orthologous human genes and proteins. Protein similarities provide information about the relations between proteins and are necessary for the prediction of orthologs. Many more bioinformatics methods rely on protein similarity. Our protein similarity database provides pre-computed similarity data and represents the known protein space. This opens completely new perspectives compared to the common practice of repeatedly recalculating this kind of data. SIMAP is regularly updated: the similarity matrix is simply extended incrementally when new sequences appear. The use of SIMAP is completely free for education and public research.
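The idea of a pre-computed matrix that is extended incrementally can be illustrated with a small sketch. It is not SIMAP's implementation: the class `SimilarityMatrix`, the function `toy_similarity`, and the sequence IDs are hypothetical, and the score is a simple shared-word measure standing in for FASTA.

```python
def toy_similarity(a: str, b: str) -> float:
    """Placeholder score (NOT FASTA): Jaccard overlap of 3-letter words."""
    def words(s: str) -> set:
        return {s[i:i + 3] for i in range(len(s) - 2)}

    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


class SimilarityMatrix:
    """Hypothetical store of pre-computed pairwise scores."""

    def __init__(self) -> None:
        self.sequences = {}   # sequence id -> sequence string
        self.scores = {}      # frozenset({id1, id2}) -> score
        self.comparisons = 0  # number of pairwise comparisons performed

    def add(self, seq_id: str, sequence: str) -> None:
        # Incremental extension: compare the new sequence only against
        # the sequences already stored; existing scores are left untouched.
        for other_id, other_seq in self.sequences.items():
            key = frozenset((seq_id, other_id))
            self.scores[key] = toy_similarity(sequence, other_seq)
            self.comparisons += 1
        self.sequences[seq_id] = sequence

    def score(self, a: str, b: str) -> float:
        # Lookup of a pre-computed value; nothing is recalculated here.
        return self.scores[frozenset((a, b))]
```

Under these assumptions, adding a sequence costs one comparison per sequence already stored, and later queries are plain lookups.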

PP: In the next 5 years, what direction do you see your research taking?
TR: Bioinformatics will become an integral and essential part of biology, so the areas in the life sciences that do _NOT_ need bioinformatics will shrink more and more.

PP: We have a number of young people and educators who will read this, how did you get started in your field? What did it take as a young person to get to where you are?
TR: I did my diploma and PhD in chemistry (biochemistry and computational chemistry) but moved into bioinformatics when the first complete genome sequences became available. Bioinformatics is much closer to the real problems and discoveries than cheminformatics. As there were almost no bioinformatics courses available several years ago, I learned much from colleagues and scientific collaborators.

PP: What role do you play in the project?
TR: I initiated the project and am responsible for it. The work on this project is shared between several students and me.

PP: Some people like to choose a project that's purely academic, and some want a project where the knowledge will be applied in a more tangible way. How do you see your research fitting into those choices?
TR: SIMAP fits best into applied bioinformatics: its data is used in many other projects that analyze genes, functional modules in cells, or whole genomes. The outcome of those applications is sometimes academic, and sometimes important for understanding diseases or for the development of new drugs.

PP: Why did distributed (or grid) computing get brought up in the course of your research? How did you become aware of the technology?
TR: The computational cost of calculating the similarity data grows with the square of the number of sequences in the database. So the computational effort for keeping the matrix up-to-date is constantly increasing. Our internal resources, which have performed calculations for SIMAP for years, are no longer sufficient to keep up with all the new sequences. That's why we implemented a SIMAP client for the BOINC platform (Berkeley Open Infrastructure for Network Computing), which uses the FASTA algorithm to detect sequence similarities. We became aware of the BOINC technology through the very famous SETI@home project.
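The quadratic growth mentioned here can be made concrete by counting pairwise comparisons. The following is a minimal sketch, assuming every unordered pair of distinct sequences is compared exactly once; the function names are illustrative and not part of SIMAP or BOINC.

```python
def full_recompute(n: int) -> int:
    """Comparisons needed to build the whole matrix for n sequences."""
    return n * (n - 1) // 2


def incremental_update(n_old: int, n_new: int) -> int:
    """Comparisons needed to extend a matrix of n_old sequences by n_new.

    Each new sequence is compared against every old sequence, and the
    new sequences are also compared among themselves.
    """
    return n_old * n_new + n_new * (n_new - 1) // 2


if __name__ == "__main__":
    # Doubling the database size roughly quadruples the full cost.
    for n in (1_000, 2_000, 4_000):
        print(n, full_recompute(n))
    # Extending an existing matrix is far cheaper than rebuilding it,
    # but each batch still costs more as the database grows.
    for n_old in (1_000, 2_000, 4_000):
        print(n_old, incremental_update(n_old, 100))
```

Even with incremental updates, the cost of adding a fixed-size batch of new sequences rises in proportion to the number of sequences already stored, which is consistent with the effort "constantly increasing" as the public databases grow.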

PP: What problems does using today's distributed computing model bring to your research?
TR: There is no real problem arising from today's distributed computing. We had to reserve some of our free time for communicating with the community that participates in our project.

PP: How do you feel about the BOINC distributed computing platform? How does it impact your project?
TR: The BOINC project is now absolutely necessary for SIMAP to keep our data up-to-date with respect to the public sequence databases.

PP: There are a lot of worthy projects that distributed computing has brought to us. Why should we choose yours?
TR: SIMAP supports many different projects in life sciences. Choosing SIMAP accelerates not only our project but many research projects all over the world.

PP: A number of the projects need donations to stay running, are you one of them? If so, how can people help you out?
TR: No, our project is supported by our university, so we do not need any additional donations (except for CPU time on your computers).

PP: So many people think of the projects as giant teams of people waiting around to fix bugs and ensure the websites are running. How many people work on the project?
TR: Here is the list of the 8 people on our team:

  • Coordinator: Dr. Thomas Rattei, Department of Genome Oriented Bioinformatics (TU Munich), contact: t.rattei (at) wzw.tum.de
  • Client software development: Mathias Walter (student in bioinformatics)
  • Client software development: Michael Boehme (student in bioinformatics)
  • Forum administration and moderation: Alexander Kräutler "alex@simap" (student in biology)
  • Forum administration and moderation: Roland Arnold "supervisor" (Ph.D. student in bioinformatics)
  • Forum administration and moderation: Jonathan Hoser "Jonathan" (student in bioinformatics)
  • Forum moderation: Corinna Klausing "Cori" (Team BOINC@Heidelberg)
  • Forum moderation: Jörg Glasing "Joogie" (Team BOINC@Heidelberg)

PP: Now it's time for some human interest "fluff." What's your favorite:
1. Food:
TR: Italian pasta.

PP: 2. Movie:
TR: Arthur C. Clarke: 2001: A Space Odyssey

PP: 3. Book:
TR: The Lord of the Rings.

PP: 4. Television Show:
TR: (I don't watch them)

PP: 5. Hobby:
TR: Astronomy, photography

PP: Is there anything that you would like the audience to know?
TR: I'd like to say thank you to all who keep our project running and make it successful.

PP: Team Picard would like to extend our most sincere thanks to Dr. Thomas Rattei for being our first interview, hopefully the first of many.
