With the publication of the MIGS/MIMS specification, the Genomic Standards Consortium (GSC) has finished the groundwork for enriching our genome and metagenome collections with additional contextual data (see the publication in Nature Biotech). It is now possible to extend and adapt this specification to any genetic marker sequence retrieved from the environment.
To move forward and build on existing interest in the community, a proposal for MIMARKS, the Minimum Information about a MARKer gene Sequence, was accepted as a natural extension to MIGS and MIMS at the 6th GSC meeting in October 2008.
MIMARKS is meant to be fully compliant with the attributes already included in the MIGS/MIMS specification, and it adds the contextual data fields needed to enrich our ever-growing set of marker gene sequences.
If the community supplements the marker gene sequence collections with more contextual data, it will become possible to retrieve, search and analyse these invaluable datasets in unprecedented detail, for example by selecting all 16S sequences associated with specific environmental parameters (e.g. location, habitat, temperature, salinity, oxygen concentration).
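To make the retrieval idea concrete, here is a minimal sketch of selecting marker gene sequences by their contextual data. The record layout, the field names (`habitat`, `temperature_c`, `salinity_psu`) and the accessions are assumptions made for illustration only; they are not the field names defined by the MIMARKS checklist.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Range = Tuple[float, float]


@dataclass
class MarkerRecord:
    # All field names are hypothetical stand-ins for contextual data fields.
    accession: str
    marker: str
    habitat: str
    temperature_c: Optional[float] = None
    salinity_psu: Optional[float] = None


def _in_range(value: Optional[float], bounds: Optional[Range]) -> bool:
    """True if no bounds were requested, or if the value lies within them."""
    if bounds is None:
        return True
    if value is None:
        # A record without the measurement cannot satisfy a range query.
        return False
    low, high = bounds
    return low <= value <= high


def select_records(
    records: Iterable[MarkerRecord],
    marker: str,
    habitat: Optional[str] = None,
    temperature_c: Optional[Range] = None,
    salinity_psu: Optional[Range] = None,
) -> List[MarkerRecord]:
    """Return the records of one marker gene that match all given criteria."""
    selected = []
    for rec in records:
        if rec.marker != marker:
            continue
        if habitat is not None and rec.habitat != habitat:
            continue
        if not _in_range(rec.temperature_c, temperature_c):
            continue
        if not _in_range(rec.salinity_psu, salinity_psu):
            continue
        selected.append(rec)
    return selected


# Made-up example records.
records = [
    MarkerRecord("seq_001", "16S", "marine", 4.0, 35.0),
    MarkerRecord("seq_002", "16S", "marine", 22.5, 36.1),
    MarkerRecord("seq_003", "16S", "soil", 15.0),
    MarkerRecord("seq_004", "18S", "marine", 3.5, 34.8),
    MarkerRecord("seq_005", "16S", "marine"),  # no measurements recorded
]

cold_marine = select_records(
    records, "16S", habitat="marine", temperature_c=(0.0, 10.0)
)
print([rec.accession for rec in cold_marine])  # ['seq_001']
```

Note that `seq_005` is excluded from the temperature query because it carries no temperature value, which illustrates why sequences without contextual data cannot take part in this kind of search.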
There are five key aspects that need to be tackled by the MIMARKS working group:
I provide "bio-curation" power for the SILVA, ARB and megx.net projects.
SILVA & ARB
In SILVA, I am responsible for the curation of bacterial, archaeal and, to some extent, eukaryotic taxonomy in the SILVA Ref datasets. The taxonomic information is taken from Bergey’s Taxonomic Outline of the Prokaryotes, and the List of Prokaryotic names with Standing in Nomenclature is used to supplement the Bergey’s outline with the latest information on validly described bacterial and archaeal taxa. Furthermore, I put considerable effort into representing prominent uncultured and not validly published environmental clades, groups and taxa. The majority of these clades and groups are annotated in the guide tree for the SSU Ref dataset, based on literature surveys and personal communications. Taxonomic groups consisting only of sequences from uncultured organisms are named after the clone sequence that was submitted earliest.
I am actively involved in testing new ARB versions. I also contribute to ARB support by handling internal and external requests.
megx.net
In megx.net, I integrate 16S/18S and 23S/28S rRNA sequence data by linking SILVA with megx.net, and I provide support for conceptual development.
I am working on several meta-analysis projects based on the Global Ocean Sampling (GOS) expedition metagenome. The first one centers on the systematic analysis of 23S rRNA gene sequences in unassembled reads, with a focus on the number of retrieved 23S rRNA fragments, their length distribution, and their high-level taxonomic classification. Additionally, previously reported 23S rRNA primers and probes are being evaluated on this extended dataset.
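As a small illustration of what a fragment length distribution looks like in practice, the sketch below bins fragment lengths into fixed-width classes. The lengths and the bin width of 100 bp are made-up values chosen for the example; they are not results from the GOS analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def length_histogram(lengths: Iterable[int], bin_width: int = 100) -> Dict[int, int]:
    """Count fragments per length class, keyed by the lower bound of each bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))


# Hypothetical lengths (in bp) of 23S rRNA fragments found in unassembled reads.
fragment_lengths = [85, 120, 310, 350, 399, 742, 1010]

hist = length_histogram(fragment_lengths)
for lower, count in hist.items():
    print(f"{lower:>5}-{lower + 99:<5} {count}")
```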
The second project involves the study of the ecological structuring of bacterial and archaeal taxon ranks. Community structures, based on taxonomically classified 16S ribosomal RNA (rRNA) gene fragments at the phylum, class, order, family and genus rank levels, are examined using multivariate statistical analysis methods. The results are then inspected in the context of oceanographic environmental parameters and structured habitat classifications, such as the Environment Ontology (EnvO).
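The text does not name the specific multivariate methods used. A common first step in such analyses is to compare community composition between sampling sites with a dissimilarity measure. The sketch below assumes Bray-Curtis dissimilarity for that step and uses made-up classifications for two hypothetical sites.

```python
from collections import Counter
from typing import Dict, Iterable


def rank_counts(fragments: Iterable[Dict[str, str]], rank: str) -> Counter:
    """Count fragments per taxon at one rank; fragments lacking that rank are skipped."""
    return Counter(frag[rank] for frag in fragments if rank in frag)


def bray_curtis(a: Counter, b: Counter) -> float:
    """Bray-Curtis dissimilarity: 0.0 for identical counts, 1.0 for no shared taxa."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both communities are empty")
    # Counter returns 0 for taxa that are absent from a community.
    shared = sum(min(a[taxon], b[taxon]) for taxon in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


# Hypothetical classified 16S rRNA gene fragments from two sampling sites.
site_a = [
    {"phylum": "Proteobacteria", "class": "Alphaproteobacteria"},
    {"phylum": "Proteobacteria", "class": "Gammaproteobacteria"},
    {"phylum": "Proteobacteria"},  # classified to phylum level only
    {"phylum": "Cyanobacteria"},
]
site_b = [
    {"phylum": "Proteobacteria", "class": "Alphaproteobacteria"},
    {"phylum": "Bacteroidetes"},
    {"phylum": "Bacteroidetes"},
    {"phylum": "Bacteroidetes"},
]

dissimilarity = bray_curtis(
    rank_counts(site_a, "phylum"), rank_counts(site_b, "phylum")
)
print(f"Bray-Curtis dissimilarity at phylum level: {dissimilarity:.2f}")  # 0.75
```

Repeating the calculation with `"class"`, `"order"` and the lower ranks gives one dissimilarity per rank level. A matrix of such values across many sites is the kind of input that ordination or clustering methods typically work on.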