Annotation and curation of human genomic variations (2018-Variations)

This implementation study aims to understand the existing infrastructure, resources and protocols for human genome variation annotation and curation. Work focuses on processes that can be automated to support interpretation of high-throughput genome sequencing results. The outcome will be a report that describes the current status within ELIXIR member states, identified requirements and potential solutions. The report will be part of the ELIXIR Human Genomics and Translational Data Services strategy and roadmap.

Metabolite Identification

Metabolomics aims to provide novel insights into the biochemical reactions of organisms by characterising the presence and concentrations of low molecular weight compounds from biological samples. The primary analytical tools for such high-throughput data collection are mass spectrometry (MS), often preceded by chromatographic or electrophoretic separation technologies, and nuclear magnetic resonance spectroscopy (NMR).

FAIRification of Genomic Tracks

We propose to advance the application of FAIR principles to metadata for human genomic tracks by developing metadata recommendations and algorithmic tools, applying these recommendations to tracks from selected hubs associated with the Ensembl TrackHub Registry, implementing a track search service that integrates metadata from different track hubs, and testing the implementation with selected track-oriented analytical tools. The recommendations will form the basis for a metadata standard for genomic tracks.

Bioschemas: Community Adoption and Training

Bioschemas (http://bioschemas.org) is a community initiative that aims to improve data discoverability in the life sciences and to give our data repositories, including the ELIXIR Core and Node Data Resources, better exposure to generic search engines, such as Google, and to domain-specific repositories such as Identifiers.org, FAIRsharing.org, and DataMed. It does this by encouraging life-science content providers to use Schema.org markup to expose consistent structured data on their websites.
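To make the idea of "Schema.org markup" concrete, the sketch below builds a minimal description of a dataset using the generic Schema.org `Dataset` type, serialised as JSON-LD and wrapped in the `<script type="application/ld+json">` element that search engines read from a web page. It is an illustration under stated assumptions, not a Bioschemas profile: the function names `dataset_jsonld` and `script_tag`, the chosen properties, and all example values (name, URL, licence) are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, keywords, license_url):
    """Build a minimal Schema.org Dataset description as a JSON-LD dict.

    Hypothetical helper: real Bioschemas profiles specify which properties
    are required or recommended; this uses only a few generic ones.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "license": license_url,
    }


def script_tag(data):
    """Wrap JSON-LD in the <script> element embedded in an HTML page."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


if __name__ == "__main__":
    # Example values are made up purely for illustration.
    record = dataset_jsonld(
        name="Example protein dataset",
        description="Hypothetical dataset used only to illustrate the markup.",
        url="https://example.org/datasets/42",
        keywords=["proteomics", "example"],
        license_url="https://creativecommons.org/licenses/by/4.0/",
    )
    print(script_tag(record))
```

Embedding JSON-LD in a script element keeps the structured data separate from the visible HTML, so a resource can add markup without changing its page layout.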

Integrating reference taxonomic databases for metabarcoding and metagenomics identification

Comparing environmental sequences to reference sets from curated marker loci is a mainstay of taxonomic analysis of microbial communities. Sequencing microbial eukaryotes requires many distinct reference sets to cover their diversity adequately. Producers of reference sets follow different curation workflows but share the need to pass their data on to a common set of tools and services, such as EMG, Megan, MetaPIPE and BioMaS.
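As a rough illustration of comparing an environmental sequence against a reference set, the toy sketch below assigns a query to the taxon of its most similar reference sequence, scoring similarity as the Jaccard index of shared k-mers. This is a deliberately simple stand-in, not the method used by EMG, Megan, MetaPIPE or BioMaS, which rely on their own alignment- or classifier-based approaches; the function names, the k-mer length `k` and the `min_similarity` threshold are all hypothetical.

```python
def kmers(seq, k=8):
    """Return the set of overlapping k-mers in a DNA sequence (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_taxon(query, references, k=8, min_similarity=0.2):
    """Return (taxon, score) for the most similar reference, or (None, score).

    `references` maps a taxon label to one reference marker sequence.
    Ties keep the first taxon encountered; a best score below
    `min_similarity` is treated as "unassigned".
    """
    q = kmers(query, k)
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in references.items():
        score = jaccard(q, kmers(ref_seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_similarity:
        return None, best_score
    return best_taxon, best_score
```

Because each reference set arrives in its own curation format, a shared representation such as the simple label-to-sequence mapping assumed here is what lets one comparison routine run over several reference sets.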

There are multiple inefficiencies:

Extending open proteomics data analysis pipelines in the cloud: Additional tools and focus on scalability, supporting the dramatic growth of public proteomics data

An ELIXIR implementation study started in February 2017 as a collaboration between EMBL-EBI and ELIXIR-DE. Its main objective is to develop open, robust, scalable and reproducible proteomics data analysis workflows based on OpenMS, directly connected to the PRIDE database (an ELIXIR Core Data Resource), and to deploy these pipelines in the EMBL-EBI "Embassy Cloud" as a proof of concept.

Building on this work, we propose a follow-up project with three objectives:

Integration and standardization of intrinsically disordered protein data (2018-IDPs)

Intrinsically disordered proteins (IDPs), characterized by high conformational variability, account for almost a third of the residues in eukaryotic proteomes. As major players in cellular regulation, IDPs are involved in numerous diseases.

Specialized IDP databases provide a starting point for analysis, yet their integration into core databases remains very limited. Here, we propose to start integrating IDP information into ELIXIR Core Data Resources.

FAIRness of the current ELIXIR Core resources: Application (and test) of newly available FAIR metrics, and identification of steps to increase interoperability (2018-FAIRCDR)

The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reusability of digital resources. While the principles have enjoyed rapid uptake across communities (ELIXIR, G20, EOSC, H2020, NIH), the implementation details remain unclear.