Research

Introduction

Investigations in molecular biology have transitioned from single experiments to high-throughput endeavours spearheaded by genomic science. Although the genomic revolution is rooted in medicine and biotechnology, environmental studies, most notably those of marine ecosystems, currently deliver the highest quantity of data. New sequencing technologies are providing an increasingly powerful resource to investigate microbial diversity and function at the gene level.

The Microbial Genomics and Bioinformatics Research Group aims to transform the wealth of ‘Omics and contextual (meta)data, currently routinely produced in marine sciences, into biological knowledge. To accomplish this, we focus on the development of enabling technologies at the intersection of computer science, bioinformatics, and field research. In particular, we are developing components and standards for the dynamic integration of knowledge, by bringing together organism diversity and abundance data, functional data (‘Omics data), and the environmental data contextualising them. This allows the persistent and dynamic study of relationships between organisms, their genomic repertoire, and their environment.

Our integrative approach to Ecosystems Biology

This integrated view serves as the foundation for ecosystem-level statistical analysis and modelling of microbial communities and metabolisms. It reveals key genes involved in central ecosystem processes and deepens our understanding of their functions in the environment. Investigating the genetic repertoire in the context of the ecosystem provides new insights into the role of genes with no known function. Our results promote a better understanding of environmental microbiomes and their impact on human welfare in times of global climate change, and reveal new targets for medical and biotechnological applications.

Taxonomic Research: The backbone of Biodiversity

From the beginning, our taxonomic research has focused on environmentally relevant organisms, with a special interest in uncultivated clades. We were the first to consistently name clades of uncultivated sequences in the ribosomal RNA (rDNA) gene tree according to commonly used ‘clone’ names (e.g. SAR 11) to facilitate communication and data exchange among researchers. Since the SILVA datasets have become a quasi-standard in many labs, this approach has been adopted worldwide. The rationale and implementation of the taxonomic framework are documented in Yilmaz et al. (2014).

A step towards a more rational, objective, and unified delineation of taxonomic clades was recently taken with the Candidate Taxonomic Units (CTU) concept for cultured and uncultured bacteria (Yarza et al. 2014). The first practical implementation of this concept was subsequently carried out for marine environmental clades (Yilmaz et al. 2016).

Since comprehensive insights into marine environmental biodiversity must include the diversity of unicellular Eukaryotes, we significantly increased our efforts to improve the eukaryotic part of the SILVA tree. The UniEuk project, funded by the Gordon and Betty Moore Foundation, brought together a worldwide team of experts to consolidate the eukaryotic taxonomy.

UniEuk is led by Colomban de Vargas in Roscoff, with Dr. Pelin Yilmaz as the co-director. It maintains a close collaboration with the Oceanomics/Tara Oceans project, as well as the European Nucleotide Archive (EMBL-EBI/ENA, UK). The resulting improved taxonomy will have a broad field of application in environmental research and is intended to replace the respective taxonomy provided by the International Nucleotide Sequence Database Collaboration (INSDC).

Finally, we are part of the Bergey’s Trust, which is the authoritative body for microbial taxonomy. Furthermore, our team is well connected to the List of Prokaryotic Names with Standing in Nomenclature (LPSN), as well as the INSDC taxonomy groups at EMBL/EBI-ENA and GenBank. These close collaborations ensure that changes on all sides are communicated and propagated via the databases and services they offer, which are used worldwide. We also expect an enhanced level of integration of our developments at the European level, now that Germany is a full member of ELIXIR.

The SILVA databases and the SILVAngs classification tool for high throughput rDNA data

The SILVA databases are dedicated to providing the best possible rDNA reference database for environmental research, with a unique focus on uncultivated clades. SILVA is the only database that integrates Bacteria, Archaea and Eukaryota. All taxonomic developments we pursue are first implemented in the SILVA databases to expose them to the wider scientific community for review and instant evaluation.

SILVA

SILVA, founded in 2005, is a comprehensive web resource for up-to-date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains, complemented by supplementary online services. With the discontinuation of the European Ribosomal RNA Database Project, the SILVA databases have become the authoritative rDNA database project for Europe. The application spectrum of the SILVA databases ranges from environmental sciences, microbiology, agriculture, biochemistry and biotechnology to medicine, in both academia and industry.

As a spin-off from the SILVA project, 'The All-Species Living Tree' project provides a manually curated rDNA subset specifically designed to serve the microbial taxonomist community. The aim of the project is to reconstruct a single 16S rRNA tree harbouring all sequenced type strains of the hitherto classified species of Archaea and Bacteria.

SILVAngs

To facilitate classification tasks for high-throughput rDNA data, the SILVAngs service was implemented and released in 2013, funded mainly by the German Research Foundation (DFG). SILVAngs is a data analysis service for rDNA reads from high-throughput sequencing (NGS) approaches, based on an automatic software pipeline. It uses the SILVA rDNA databases, taxonomies, and alignments as a reference. It facilitates the classification of rDNA reads and provides a wealth of results (tables, graphs and sequence files) for download.

Furthermore, we offer international workshops and training courses on the theory and practical applications of phylogenetic tree reconstruction. To date, more than 600 researchers have participated in these workshops, held in Europe and the US. If you want to know more about them, please refer to our spin-off company Ribocon.

Marine Ecosystems Research

Environmental genomics at MPI-Bremen was introduced in 2000 by selecting three environmentally relevant marine bacteria for whole genome sequencing, annotation and functional analysis. At that time, the organisms of interest were two sulfate-reducing bacteria (Desulfotalea psychrophila and Desulfobacterium autotrophicum) and one Planctomycete (Rhodopirellula baltica SH 1T, formerly Pirellula sp. strain 1). The REGX (Real Environmental GenomiX) project received six years of funding from the Federal Ministry of Education and Research (BMBF).
Within the project we focussed on technology development and competence building by setting up a bioinformatic analysis pipeline for whole genomes.
To investigate genes vital for the environmental adaptation of environmentally relevant marine microorganisms, Rhodopirellula baltica SH 1T was under continuous investigation for many years. Based on its high-quality genome sequence, whole genome microarrays as well as mRNA (cDNA) tag sequencing were set up to perform adaptation experiments. Since 2012, all laboratory work related to Planctomycetes has been continued by Christian Jogler.

Metagenomics
In 2008 we started the MIMAS (Microbial Interactions in MArine Systems) project, funded by the BMBF, to investigate the diversity and function of the microbial community in the North Sea at Helgoland Roads. In addition to metagenomics, metatranscriptomics based on next-generation sequencing technology was carried out.
From 2009 to 2013 we were partners in the EU-funded MAMBA (Marine Metagenomics for new Biotechnological Applications) project. We successfully provided the bioinformatic backbone for sequence data analysis and for screening for new enzymes for biotechnological applications. As a follow-up, we are now partners in the H2020 INMARE (INdustrial applications of MARine Enzymes: innovative screening and expression platforms to discover and use the functional protein diversity from the sea) project, which started in April 2015. Again, we will develop bioinformatic technologies for the screening of enzymes for biotechnological applications.

Ocean Sampling Day (OSD) and MyOSD: a global snapshot of marine microbial diversity and function.

OSD 2014 was the world’s first simultaneous global sequencing campaign to analyse marine microbial community compositions and embedded functional traits on a single day, the solstice on June 21st (Kopf et al. 2015). The scientific rationale for starting a time series, combined with the enthusiasm of the field researchers, encouraged us to continue with OSD in 2015 and 2016. So far, nearly 200 research teams around the world have conducted sampling in a collaborative effort to generate the largest coastal marine microbial dataset to date, related in time, space and environmental parameters. Standards (MIxS (Yilmaz et al. 2011), M2B3 (ten Hoopen et al. 2015)) and standardized procedures, including a centralized hub for laboratory work and data processing, ensure a high level of consistency and data interoperability. The result is a unique collection of open-access, standardized marine sequence data (marker genes and metagenomes) and environmental data, following the Fort Lauderdale rules for data sharing. In compliance with the Convention on Biological Diversity and the Nagoya Protocol on Access and Benefit Sharing (ABS), a model agreement and data policy were developed and applied. The partnership with the Smithsonian’s Global Genome Initiative allows long-term bio-archiving of all OSD and MyOSD samples. First results of the comparison of OSD metagenomes with the TARA Oceans data show that a wealth of new genomic information is captured by OSD. The Ocean Microbe Reference Gene Catalogue (OM-RGC), which comprises almost exclusively open ocean data, covers only about one third of the OSD ORFs. This is a clear indication that the coastal marine ecosystem has its own genomic signature that is distinct from the open ocean. The nature of and reasons for this finding are currently being investigated in detail. An obvious hypothesis is that the difference is due to terrestrial influence and especially to anthropogenic impact.

Delineating anthropogenic influence in high resolution: MyOSD 2016 in Germany

As early as 2014, OSD was accompanied by the citizen science campaign MyOSD, which aims to raise awareness and engage the public in collecting environmental data and samples (Schnetzer et al. 2016). Involving the public has proven to be very rewarding: the number of samples provided by citizen scientists participating in MyOSD 2015 already equalled the number of OSD samples. Funded by the BMBF as part of the 'Wissenschaftsjahr Meere und Ozeane 2016*17', we were able to mobilize 1000 citizens in Germany to sample the North Sea, the Baltic Sea, and all rivers draining into them on June 21st, 2016. The scientific rationale was to obtain a dense network of data to better understand how the anthropogenic load carried by the rivers influences coastal marine ecosystems. We expect that the deep integration with the OSD time series will reveal the changes in microbial community composition from freshwater to marine sites and identify the sources and key parameters influencing microbial life at coastal sites.

All Together Now - Micro B3

From 2012 to 2015, our large-scale integrated EU FP7 project Micro B3 (Marine Microbial Biodiversity, Bioinformatics, Biotechnology) developed innovative bioinformatic approaches and a legal framework to make large-scale data on marine viral, bacterial, archaeal and protist genomes and metagenomes accessible for marine ecosystems biology and to define new targets for biotechnological applications.

Micro B3 was based upon a highly interdisciplinary consortium of 32 academic and industrial partners comprising world-leading experts in bioinformatics, computer science, biology, ecology, oceanography, bioprospecting and biotechnology, as well as legal aspects. It built on a strong user and data base, ranging from ongoing European sampling campaigns to long-term ecological research sites.


Micro B3 has left a strong footprint on Europe’s capacity for bioinformatics and marine microbial data integration, to the benefit of a variety of disciplines in bioscience, technology, computing, standardisation and law.

Standards: The Genomic Standards Consortium

The Genomic Standards Consortium (GSC) was established in 2005, with our group as a founding member. The aim of the GSC is to make genomic data discoverable. To achieve this, the GSC enables genomic data integration, discovery and comparison through international community-driven standards.

GSC has created the 'Minimum Information about any (x) Sequence' (MIxS) standard that includes three minimum information checklists for describing genomes, metagenomes and environmental marker sequences (MIGS/MIMS/MIMARKS). MIxS requires core information on habitat, geolocation, and sequencing methodology, as well as fields specific to the data type and a range of optional environmental packages to capture core measurements defining a broad range of habitats, including water, soil and humans (host-associated). INSDC has created a GSC ‘keyword’ (MIxS) to mark the richer entries complying with this standard.
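As a rough illustration of how a minimum information checklist can be applied in practice, the following Python sketch checks a metadata record for missing core fields. The field names and the reduced set of required fields are assumptions chosen for illustration; they are not the normative MIxS definition, which should be taken from the GSC's own term repository.

```python
# Illustrative sketch only: a heavily reduced, hypothetical checklist.
# The field names below are assumptions for this example and do not
# reproduce the full, normative MIxS specification.
REQUIRED_CORE_FIELDS = (
    "env_broad_scale",   # habitat
    "geo_loc_name",      # geographic location name
    "lat_lon",           # geolocation coordinates
    "collection_date",   # sampling date
    "seq_meth",          # sequencing methodology
)


def missing_core_fields(record):
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_CORE_FIELDS:
        value = record.get(field)
        # Treat None and blank strings as missing values.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical sample record with one required field left empty.
sample = {
    "env_broad_scale": "marine biome",
    "geo_loc_name": "North Sea: Helgoland Roads",
    "lat_lon": "54.18 N 7.90 E",
    "collection_date": "2014-06-21",
    "seq_meth": "",
}

print(missing_core_fields(sample))
```

A record for which the function returns an empty list satisfies this reduced checklist; environmental packages would add further, habitat-specific fields on top of such a core.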

We are focusing on the development of standardised specification checklists (MIxS) and harmonisation of taxonomy. We also host the central repository of MIxS terms and descriptions.

Recent examples are the MIBIG (Minimum Information about a BIosynthetic Gene cluster) standard (Medema et al. 2015), mainly targeting biotechnology, the plant specimen contextual data consensus (ten Hoopen et al. 2016), the Global Genome Biodiversity Network (GGBN) data standard (Droege et al. 2016), the MIxS hydrocarbon resources extension (Tsesmetzis et al. 2016) and the marine biodiversity data reporting standards (ten Hoopen et al. 2015).

Research Data Management: The German Federation for Biological Data (GFBio)

Research data and their proper management, from acquisition to archiving and publication, are internationally recognized as a key factor for effective research and good scientific practice. Nevertheless, research data are commonly regarded as an add-on to the classical paper publication and not recognised as an asset in their own right. If published at all, in most cases only parts of the original data are deposited, often as unstructured supplementary files. This severely limits their usage for current and future generations of researchers. This lack of reusability of data generated in publicly funded research projects hampers the reproducibility of research and leads to unnecessary costs when data have to be re-generated.

Spearheaded by the OECD in 2007, a range of key documents have been produced by expert groups that demonstrate the intellectual and economic value of sharing research data. This is especially important for the life sciences, where the ‘Omics revolution in biology has paved the way into the world of 'Big Data'. According to 'A surfboard for riding the wave' (Knowledge Exchange 2011), research data management (RDM) must include quality management, publication and proper archiving of research data. The typical shortcut from collecting data to analysis and paper publication is no longer acceptable. Research data must be made FAIR (Findable, Accessible, Interoperable and Reusable) to fulfil the principles of Good Scientific Practice.

To support this, funders in Europe and the USA have taken action and now request proper Data Management Plans (DMPs) as part of research proposals. In 2013, the DFG took responsibility for setting up an infrastructure that helps researchers with data management by funding the German Federation for Biological Data (GFBio). GFBio’s mission is to support scientists in all aspects of research data management, covering the complete life cycle of data from DMPs to long-term archiving, data discovery and sharing. GFBio consists of a team of highly interdisciplinary partners ranging from natural history museums, botanical gardens and collections to libraries and archives. Our group has taken a leading role in GFBio. Technically, we are responsible for creating a data broker service for the standards-compliant submission and publication of sequence and environmental data to long-term archives such as INSDC and PANGAEA.

Analysis Capacities for “Big Data”: The German Network for Bioinformatics Infrastructure (de.NBI)

With the increasing amounts of data in the life sciences, the need for a coordinated infrastructure that offers bioinformatics services for Germany became evident. This was first outlined in 2012 in the position paper of the BioÖkonomieRat. The need to structure the bioinformatics community in Germany was further stimulated by the emerging European infrastructure ELIXIR. The de.NBI network, a project of the Federal Ministry of Education and Research (BMBF), started on March 1st, 2015 with the goal of creating a platform for leading bioinformatics groups that offer services for the research community. Currently, the de.NBI initiative consists of 30 partners organized in eight service centers. Together, the partners provide bioinformatics tools, appropriate hardware, and well-maintained (manually curated) databases for the exploration of large volumes of data ('Big Data') in all areas of the life sciences in Germany. In August 2016, de.NBI became the German node of ELIXIR, the European infrastructure for life-science information. As part of de.NBI, we are contributing the SILVA and SILVAngs datasets and services. We are chairing the Center for Biological Data (BioData), comprising BRENDA, PANGAEA, BacDive, EnzymeStructures and SILVA, as well as the special interest group ‘Services and Service Monitoring’.

Summary

Taken together, the new technological approaches and unprecedented capacities in environmental research will allow us to gain new, unbiased insights into the processes driven by environmental organisms. In the long run, this will provide us with the key to open the black box of microbial ecology for a global understanding of marine biodiversity, functions, interactions and dynamics. Transforming the deluge of data into biological knowledge is our primary goal.

The Ribocon GmbH

At the beginning of 2005, we founded Ribocon GmbH as a spin-off of the Microbial Genomics and Bioinformatics Research Group for knowledge transfer and product development. Our competences focus on the bioinformatic analysis of genes and genomes, phylogenetic inference, the software package ARB, and the SILVA databases. Currently, the management team consists of Dr. Jörg Peplies (CEO), Arno Geerds (CFO) and Prof. Frank Oliver Glöckner.


This page was last updated in March 2017

 
