Data services

Name Description ELIXIR Node
ELIXIR Switzerland
ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Czech Republic

The microbial world provides an abundant source of biological catalysts for chemicals of medical and economic interest such as pharmaceuticals and biofuels. However, a sustainable resource for systems biology, that integrates experimental and predicted data on microbial metabolism, is still lacking. This Implementation Study defines the requirements for a European Bioinformatics resource for microbial metabolism as well as providing a practical demonstration of how existing funded resources from ELIXIR partners could be integrated to meet those needs.

This Study brings together complementary data types:

  • microbial genomes and genes
  • protein function annotation
  • enzymatic reactions
  • metabolic pathways
  • metabolic models

Connecting these data types through RDF standards has implications for medicine and the economy, for example in pharmaceuticals and biofuels.

This study has now been completed; the end report is available here.

An article in the ELIXIR F1000R gateway will be made available shortly. 

Webinar summarising the outcomes

ELIXIR Switzerland, ELIXIR France, EMBL-EBI

The goal of this project is to manage the Core Data Resource and Deposition Database portfolio.

  • Ensuring the maintenance, monitoring and ongoing selection process for the ELIXIR Core Data Resource (CDR) and ELIXIR Deposition Database (EDD) lists.
  • Implementing, and collecting annually, quantitative and qualitative indicators for ELIXIR Data Resources; aggregated reports will be generated and disseminated to stakeholders.
  • Fostering coherence in approaches for demonstrating value of life science resources via the Core Data Resource Forum.
ELIXIR Switzerland, EMBL-EBI

Supporting the establishment and continued monitoring of global partnerships, such as the Global Biodata Coalition, and of business cases for the long-term sustainability of Core Data Resources, ELIXIR Deposition Databases, and ELIXIR Community Resources.

Aim: Global partnerships for sustainable core data resources.

  1. To allow for continued monitoring of existing CDRs/EDDs and for the possibility of further calls, this task will be carried forward unchanged from the first three years of the Programme. The operational aspects of this task will continue to be resourced directly by the Hub.
  2. The Hub will develop a lightweight process for the identification and selection of community data resources; that is, resources that may not have the broad user base of Core Data Resources, but are nevertheless critical for specific communities.
EMBL-EBI, ELIXIR Switzerland

This implementation study aims to understand the existing infrastructure, resources and protocols for human genome variation annotation and curation. Work focuses on processes that can be automated to support interpretation of high-throughput genome sequencing results. The outcome will be a report that describes the current status within ELIXIR member states, identified requirements and potential solutions. The report will be part of the ELIXIR Human Genomics and Translational Data Services strategy and roadmap.

This project coordinates with the ELIXIR Data Platform on surveys regarding data archives and other resources. It also consults with the Compute and Tools Platforms on potential models for resourcing, scaling and providing portable tools based on the identified requirements for running data analysis workflows. The aim is also to work in close collaboration with the ELIXIR Interoperability Platform to understand the future requirements for managing variation annotations and their interpretation.

This implementation study will also aim to support the coordination between the ELIXIR Human Genomics and Translational Data use case and the relevant GA4GH technical work streams. The expected outcome is a better alignment of ELIXIR activities with those in the GA4GH and direct communication with relevant resources outside of ELIXIR such as ClinVar.

ELIXIR Finland, EMBL-EBI, ELIXIR Switzerland, ELIXIR UK, ELIXIR Norway, ELIXIR Italy

Biocuration plays a key role in making research data available to the scientific community in a standardized way. Despite its importance, the contribution and effort of biocurators are extremely difficult to attribute and quantify. APICURON (https://apicuron.org) is a web server that provides biological databases and organizations with a real-time automatic tracking system of biocuration activities. APICURON calculates achievements and allows objective evaluation of the volume and quality of the contributions. In this project APICURON will be connected with the Pfam, Rfam, IntAct, SABIO-RK, PomBase, Reactome, SILVA and BioModels databases. Different use cases and implementations will be enumerated, and extended documentation will be produced. The aim of the project is to integrate a core set of early-adopter curation databases, standardize the tracking of curation activity and connect with ORCID. The project goal is to promote engagement and provide a service for an objective evaluation of the contributions.

ELIXIR Italy, EMBL-EBI, ELIXIR UK, ELIXIR Germany

Apple is one of the most famous fruits globally and occupies a central position in folklore, culture, and art. Apple cultivars have retained high genetic and phenotypic diversity, evidenced by the high number of apple varieties cultivated today. The economic and cultural importance of apple has driven efforts to catalogue and exploit this genetic diversity, but few of these data are currently integrated into ELIXIR resources. We propose a data implementation study to integrate the high-quality apple reference genome and its associated catalogue of genetic diversity, representing the most widely cultivated apple varieties around the world. We will use apple as a case study for managing the growing number of ‘multi-genome’ fruit projects, testing and, where necessary, improving tools to streamline data import and exchange between ELIXIR-supported resources, specifically BioSamples, ENA, EVA, ORCAE and Ensembl Plants.

ELIXIR Italy, ELIXIR Belgium, EMBL-EBI
ELIXIR Netherlands

While the current data deluge creates a need for distributed data storage and replication, it is essential to enable data access through a single access interface.

This project aimed to integrate the raw data repositories for mass spectrometry (MS) proteomics data run by BILS (Sweden) and ProteomeXchange (via the PRIDE database, EMBL-EBI, UK), using the European infrastructure EUDAT, and served as an example to connect national data storage services and international repositories through ELIXIR. It also showed the potential of collaboration among research infrastructures and e-infrastructures to better manage the data deluge, and helped to evaluate the requirements of such federated systems.

The study is now complete; the end report is available here.

Webinar summarising the outcomes

The webinar was recorded in May 2015. See the slides.

ELIXIR Sweden, EMBL-EBI
ELIXIR Netherlands
ELIXIR Germany

To engage the Communities and increase the uptake of services and to align the Data Platform with other activities (such as FAIRplus, CONVERGE, EIP and the work of the Registries Focus Group) it is proposed to:

  1. Engage ELIXIR Communities around Data Platform services, outreach and training activities;
  2. For Research Data Management, anchor the Data Management Expert network (CONVERGE WP1) to the Data Platform for long-term development beyond CONVERGE;
  3. Develop plans around Community Data Resources, carry out a landscape analysis for ELIXIR Communities, and develop recommendations;
  4. Via the Nodes, develop the data science workflows that ensure that raw and processed data from experiments are made available via Core Data Resources (CDRs), ELIXIR Deposition Databases (EDDs) and community data resources. Outreach to ensure DMPs reflect the use of CDRs, EDDs and community data resources as a priority to maximise reuse.
ELIXIR Italy, ELIXIR France, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Czech Republic, ELIXIR Sweden, ELIXIR Greece, ELIXIR Belgium

This project will be led by the ELIXIR Proteomics Community in collaboration with members of the Metabolomics Community and three ELIXIR Platforms. High-throughput proteomics has become a popular choice in biological, biomedical and clinical studies and has led to the development of hundreds of bioinformatics tools and data analysis pipelines. Given their large diversity, there is an urgent need to compare and benchmark different software pipelines over a large data spectrum.

This study aims to create a framework for benchmarking proteomics data analysis workflows. The framework will build upon and improve resources from the ELIXIR Tools, Data and Compute Platforms by creating an interface between them that is linked with public proteomics data and open source stand-alone software and pipelines.

The involved data will be annotated with at least EOSC minimum information according to ELIXIR metadata standards. Our benchmarking will identify robust workflows and therefore nurture the proteomics community with high quality standards required for reproducible research and clinical applications.

ELIXIR Denmark, EMBL-EBI, ELIXIR Netherlands, ELIXIR Spain, ELIXIR France, ELIXIR Sweden, ELIXIR Italy, ELIXIR Czech Republic, ELIXIR Germany

The primary objective is to curate high-quality biochemical knowledge (reactions/enzymes/genes) on lipid metabolism, working with lipid experts worldwide. The data will be housed in a shared resource created in partnership between two ELIXIR data services, LIPID MAPS (ELIXIR-UK) and WikiPathways (ELIXIR-NL).
To achieve this:

  • LIPID MAPS will establish expert teams representing LIPID MAPS lipid categories. Our recent “call for lipid experts” generated 50+ responses, and we have appointed expert leads to oversee data collection for 4 lipid categories.
  • For this project, initial focus will be on sphingolipids and sterols.
  • Experts will work with a curator to generate and sign off content, which will be assembled using PathVisio/WikiPathways, an open access collaborative platform. The focus will be mammals, but we look to include organisms such as plants, microbes, microalgae and yeast.
  • Content will be back-populated into the LIPID MAPS Structure and Reaction Databases to generate an open resource for lipidomics.
ELIXIR UK, ELIXIR Netherlands

The goal of this task is to explore extending the connected ecosystem to any ELIXIR data resource, incorporating and aggregating more orphan data and human data, and providing connectivity with other elements of the ELIXIR infrastructure.

  1. Integration of the “long tail” of data into FAIR databases (e.g. supplementary materials)
  2. Build and establish aggregation databases, landscape and best practices (cf. community specific databases)
ELIXIR Switzerland, ELIXIR Italy, EMBL-EBI, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR UK

The functional annotation of enzymes is an interesting but nontrivial task requiring experimental data and scientists' manual revision for optimal results. Due to the increasing amount of structural and sequence data, case-by-case analysis is becoming more difficult, and there is a high demand for automated solutions. One of the first attempts to collect such data was the PROCOGNATE database (Bashton et al., 2008, DOI: 10.1093/nar/gkm611), followed by the development of the Transform-MinER tool (Tyzack et al., 2018, DOI: 10.1093/bioinformatics/bty394), which searches the reactants and products in the KEGG database and matches them with ligand-protein complex structures from the PDB database. The current dataset has around 150,000 cases in nearly 13,000 unique PDB entries.

The current dataset's usefulness for researchers is limited mainly through two factors: 1) the database contains only basic information about the mapping, 2) it is available only as a CSV file. The first limitation will be solved by enriching the original dataset with multiple structural features, such as pockets, tunnels, and interactions, directly related to the binding and unbinding of the ligands. The calculations are already ongoing and will be finished in the following months. The second limitation will be solved by developing the web user interface, which will present the data in a complete form using 3-D structure feature visualizations.

The main aim of this project is to kick off the database development by: 1) acquiring the pipeline used to construct the PROCOGNATE dataset, merging it with the pipeline for structural feature assessment and preparing it for regular automated updates; 2) designing the database structure and importing all the data; and 3) designing the user interface of the database. Once these stages are finished, the user interface development will begin and will continue until approximately Q2 2022.

ELIXIR Czech Republic, EMBL-EBI
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Switzerland
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI

Work plan overview

The ELIXIR Data Platform has been successful in creating a vibrant group of people interested in all aspects of life science data, from generation and curation to storage, archiving, use and reuse.

It has achieved its first mission and defined a sustainable and well-funded collection of Core Data Resources (CDRs) that represent the gold standard on the world stage. It has more recently begun developing and defining tools to boost data integration and curation capacities and helped grow a coordinated ecosystem of Node data resources within ELIXIR.

As the demand for data, its storage and interpretation continues to grow and change, it is appropriate for the Data Platform to collectively adapt and look to the next chapter to address the technical and societal challenges of users and life sciences researchers in Europe.

Looking to the future, the Data Platform will focus on three key themes:

  1. Strengthening data connectivity and accessibility through brokering;
  2. Recognition and credit attribution for contributors; and
  3. FAIRification of existing data.

It will also work to support a greater diversity of data resources by defining their context, quality and unique challenges within the ELIXIR framework.

Key to achieving this is its ability to leverage the robust network of ELIXIR members (reachable through Platforms, Communities and Focus Groups) to connect data users, contributors and resources across Nodes while fostering collaborative discussions and leveraging technological advancements.

The Platform will deliver the services to support data resources in the life sciences through five complementary Work Packages (WPs):

Lead Partners:  SIB (Switzerland), HITS (Germany), University of Padua (Italy) 

The objective of the ELIXIR Data Platform Executive Committee (ExCo) is to manage and coordinate activities throughout the 2024-2026 work plan, ensuring smooth operation and collaboration within the Platform.

  • Monthly Coordination Calls will be organized by the ExCo to promote effective communication and collaboration among members, facilitating timely information exchange and progress updates.
  • An annual face-to-face meeting will also be arranged by the ExCo, offering a platform for detailed discussions, strategic planning, and decision-making to advance the Platform's objectives.
  • Joint Platform Activities will be facilitated by the ExCo to encourage collaboration and knowledge sharing among members, potentially combining with regular face-to-face interactions.
  • Oversight and Responsibility for implementing the work plan, monitoring progress, and ensuring timely objective completion will rest with the ExCo, holding monthly exclusive meetings for this purpose.
  • Regular reporting and progress updates will be provided by the ExCo to ELIXIR governance bodies and relevant stakeholders, promoting transparency and accountability.
  • The ExCo will actively engage with external partners and organisations for collaboration and resource-sharing, to enhance the Platform's capabilities and impact. The ExCo will evaluate Platform activities' outcomes, identifying areas for improvement and proposing necessary adjustments to enhance operational efficiency and effectiveness. 

Activity 1: Face-to-Face Meetings: Alternating between hosts and jointly with other Platforms

This activity involves organising an annual face-to-face meeting for the ELIXIR Data Platform members in loose collaboration with one of the other ELIXIR Platforms. The meetings will rotate among the ExCo and their member states as hosts.

The purpose of these meetings is to provide a dedicated platform for in-depth discussions, strategic planning, and decision-making. The ExCo will collaborate with the Hub to plan and execute the meeting effectively. The agenda will be designed to address key topics related to the Platform goals, initiatives, progress, challenges, and opportunities.

The meeting will also provide a unique opportunity for Platform members to interact in person, fostering stronger working relationships and promoting a sense of unity within the Platform and across ELIXIR. The ExCo will ensure that the meeting outcomes are documented and disseminated.

Activity 2: Monthly virtual meetings

The monthly teleconferences (TC) or virtual meetings are a crucial element of the coordination and collaboration strategy of the ELIXIR Data Platform. These regular virtual meetings will provide a forum for effective communication, information exchange and activity among Platform members. The ExCo will oversee the scheduling of these TCs and develop an agenda for each.

Lead Partners: EMBL-EBI (international organisation), VIB (Belgium)

Activity 1: Landscape of data brokering

Data brokering is the act of submitting data to one or more databases on behalf of another person or institute. It involves coordination between data producers in an institution of an ELIXIR Node, which provides data management support, and deposition databases such as the ELIXIR Deposition Databases (EDDs).

Moreover, data producers might use Platforms or technical services (e.g. brokerage platforms) provided by the Node to collect and prepare (meta)data for publication into databases. Often, multi-omics studies would also require references among several datasets published in different databases to preserve the relations among data (data linkage). For any of these use cases, connections are based on data exchange formats and the mapping of metadata schemas.
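To make the idea of mapping metadata schemas more concrete, the following minimal Python sketch translates a locally collected record into the field names of a target database. All field names, the `FIELD_MAP` table and the `map_record` helper are hypothetical and chosen only for illustration; they do not reflect the schema of any actual ELIXIR resource.

```python
from typing import Any

# Hypothetical mapping from a Node's local metadata fields to the field
# names expected by a target deposition database. Both sides are invented
# for illustration only.
FIELD_MAP: dict[str, str] = {
    "sample_name": "alias",
    "species": "organism",
    "collected_on": "collection_date",
}


def map_record(
    local: dict[str, Any], field_map: dict[str, str]
) -> tuple[dict[str, Any], list[str]]:
    """Translate a local record to the target schema.

    Returns the translated record and the sorted list of local fields that
    had no mapping, so a broker can review them rather than drop them silently.
    """
    mapped: dict[str, Any] = {}
    unmapped: list[str] = []
    for key, value in local.items():
        target = field_map.get(key)
        if target is None:
            unmapped.append(key)
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


record = {"sample_name": "S1", "species": "Malus domestica", "operator": "jd"}
mapped, unmapped = map_record(record, FIELD_MAP)
print(mapped)
print(unmapped)
```

Returning the unmapped fields alongside the translated record is a design choice in this sketch: it keeps the gaps between the two schemas visible to whoever performs the brokering.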

This task will build on and continue the work from the ELIXIR-CONVERGE project T1.2 on “Models for brokering data to ELIXIR Deposition Databases”, and in particular on multi-omics studies. The focus will be on the definition of “brokering” and landscape analysis of ELIXIR data resources and their existing requirements for data linkage and submission of multi-omics studies.

Activity 2: Defining best practices for data brokering

Data brokering can be done on different levels of operational complexity using different data exchange formats and metadata standards. In this task best practices and guidelines for data brokering will be defined to support repositories in developing or improving data submission and data linkage.

Activity 3: Implementation use cases of best practices

The implementation of best practices activity will build on the work of activity 2 (WP2.2) and the three data brokering scenarios analysed in the ELIXIR-CONVERGE project (WP1 T1.2), including the results from the BioHackathon 2022 project 27.

The work will initially explore and further develop solutions based on the ISA abstract model and its implementation as ISA-JSON, which were used in ELIXIR-CONVERGE for brokering multi-omics studies. Then, the feasibility of extending the same approach to the other scenarios will be investigated.
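As a rough illustration of how an ISA-style description can keep several omics assays linked under one study, the sketch below builds a heavily simplified Investigation/Study/Assay structure in Python and serialises it as JSON. The identifiers, titles and file names are invented, only a handful of keys are shown, and the result has not been validated against the ISA-JSON schemas, so it should be read as an assumption-laden outline, not a valid submission document.

```python
import json


def make_assay(filename: str, measurement: str, technology: str) -> dict:
    """Build a simplified assay entry (only a few illustrative keys)."""
    return {
        "filename": filename,
        "measurementType": {"annotationValue": measurement},
        "technologyType": {"annotationValue": technology},
    }


# One investigation containing one study with two assays, so that the
# relation between the two omics datasets is preserved in a single document.
# All identifiers and names are hypothetical.
investigation = {
    "identifier": "INV-EXAMPLE",
    "title": "Example multi-omics investigation",
    "studies": [
        {
            "identifier": "STUDY-EXAMPLE",
            "title": "Example study",
            "assays": [
                make_assay(
                    "a_proteomics.txt",
                    "protein expression profiling",
                    "mass spectrometry",
                ),
                make_assay(
                    "a_transcriptomics.txt",
                    "transcription profiling",
                    "nucleotide sequencing",
                ),
            ],
        }
    ],
}

# Serialise for exchange, and list the assay files linked under each study.
document = json.dumps(investigation, indent=2)
assay_files = [
    assay["filename"]
    for study in investigation["studies"]
    for assay in study["assays"]
]
print(assay_files)
```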

Following the work of activity 1 (WP2.1) and activity 2 (WP2.2), additional use cases will be selected to explore how the ISA-based solution can be expanded to support the proposed best practices for a wider range of domains/techniques and corresponding ELIXIR deposition databases. The activity will also evaluate a selection of complementary technologies (including RO-Crate) to develop a high-level technical proof of concept showcasing the feasibility and functionality of at least one of the representative use cases.

Lead Partners: University of Padua (Italy)

This WP will support the recognition and credit attribution of activities which curate, annotate or otherwise contribute to the increase of data in relevant resources such as knowledge bases. Further, it will leverage the work of APICURON to provide an infrastructural component to use in different contexts beyond more traditional biocuration activities (e.g. knowledge bases), ranging from ELIXIR registries (e.g. bio.tools, RDMkit, FAIR Cookbook, FAIRsharing), to code contributions (e.g. in GitHub) and data management/stewardship (e.g. data brokering).

The resulting service will link the mapped data citations and credits for primary contributions to citation networks (e.g. OpenAIRE KG). Moreover, it will explore the possibility to generalise the approach and employ it in other contexts such as crediting trainers in Training Platform activities. In addition to the technological component, it will explore the “sociological” implications for the “person infrastructure”, including career pathways, fostering adoption and user engagement (e.g. with gamification techniques). 

Activity 1: Technical developments for recognition 

This task will continue the technical developments for implementing recognition and credit attribution mechanisms. The APICURON platform, currently restricted to curated databases, will be expanded to cover a wider range of activities, based on user input. On-boarding of additional ELIXIR resources will lead to technical changes, as well as requests for additional features supported by this task.

A major step will be the establishment of a prototype mechanism for harvesting GitHub contributions on a small subset of selected profiles, e.g., training materials developed using the GitHub template established by the Training Platform. The visualisation of additional statistics for curators both on the APICURON website and via widgets for third-party websites will be supported based on the outputs of WP3.2. In addition, work will be carried out to create appropriate Bioschemas entities for information in APICURON in order to connect to the OpenAIRE knowledge-graph.

Activity 2: Engagement and gamification for recognition

This task focuses on the sociological aspects related to recognition and credit attribution. It will define strategies for implementing gamification effectively, through the evaluation of use cases and establishment of guidelines.

The work will span two orthogonal views, from the resources implementing recognition and credit attribution mechanisms (resource view e.g. PomBase) as well as the individual contributors (people view, e.g. curators).

The resource view will help define best practices for the granularity of events to be captured in recognition and how to attribute credit for them. The people view will instead focus on how to promote engagement via gamification, i.e. how to build meaningful statistics based on the captured events.
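The relation between captured events and per-curator statistics can be sketched in a few lines of Python. The event fields, activity names and score weights below are hypothetical and are not taken from APICURON; the sketch only shows one way that event granularity (resource view) and aggregated statistics (people view) might fit together.

```python
from collections import Counter, defaultdict

# Hypothetical score weights per activity type; a resource would choose
# its own event granularity and weights.
SCORES = {"entry_created": 5, "entry_updated": 2, "entry_reviewed": 1}

# Hypothetical captured events, one per curation action.
events = [
    {"curator": "curator-1", "resource": "resource_a", "activity": "entry_created"},
    {"curator": "curator-1", "resource": "resource_a", "activity": "entry_updated"},
    {"curator": "curator-1", "resource": "resource_b", "activity": "entry_updated"},
    {"curator": "curator-2", "resource": "resource_a", "activity": "entry_reviewed"},
]


def summarise(events: list[dict], scores: dict[str, int]):
    """Count activities per curator and compute a weighted total score."""
    counts: dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for event in events:
        counts[event["curator"]][event["activity"]] += 1
        # Activities without a configured weight contribute zero.
        totals[event["curator"]] += scores.get(event["activity"], 0)
    return counts, totals


counts, totals = summarise(events, SCORES)
leaderboard = totals.most_common()
print(leaderboard)
```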

Activity 3: Outreach and engagement with other initiatives

This task focuses on promoting alignment with and uptake of the recognition and credit infrastructure by other initiatives, both within and outside ELIXIR (e.g. Bionomia, LifeWatch). It will create opportunities for Community engagement by presenting the recognition work at different ELIXIR meetings (e.g. All Hands Meeting, BioHackathon), and within Platforms e.g. by aligning with related work in the contemporary activities of the Training Platform.

It will also provide the necessary coordination with relevant stakeholders for alternative career assessment, including EU projects (such as STEERS, EVERSE and GraspOS) and the EOSC task force on research careers.

Lead Partners: SIB (Switzerland)

The backlog of supplementary data attached to published scientific reports, as well as generalist deposited contents, the so-called long tail of data, is a potential goldmine for research. Unfortunately, these data are buried in the contents of millions of semantically poor files. This long tail of data needs FAIR-ification using automatic methods.

Activity 1: Coordination of literature curation challenges and practices

The primary channel of communication in life science is and remains the literature. However, practices and standards for literature publication are evolving and a growing set of publications contains structured dataset descriptions (e.g., DOME) and links to specialised data repositories or general ones (e.g., Zenodo).

Further, and although CDR and Community Databases may exhibit very heterogeneous biological interests, we aim at defining some minimal data typology shared by all data resources: paper- vs. passage-centric curation, continuous vs. session-driven curation efforts, role of supplementary data or generalist repositories in curation guidelines, and named entities as in EuropePMC/SIBiLS annotations.

Activity 2: Turning the long tail of literature and supplementary data into FAIR digital objects

This subtask aims to complement the transformational efforts focusing on the FAIR-ification of literature using semantic web technologies. The idea is to leverage on-going efforts (e.g., PDF2JATS, RO-Crate, DOME, Zenodo) to both improve FAIR archiving standards and to explore how such formats can be discovered through an index or a Knowledge Graph.

First, we plan to establish a communication channel with related European initiatives. We will leverage EU research infrastructure projects, such as BiCIKL or FAIRClinical (ELIXIR-LU, CH, FR, UK), to coordinate efforts in and outside ELIXIR with lead stakeholders (e.g., CERN, LifeWatch, GBIF, GBC).

Second, the task will explore how these digital objects can be made available for discovery.

Activity 3: Accessing traceable author statements from curated databases

ELIXIR Core Data Resources and Community Databases tend to cross-reference articles to provide their end-users with access to the source of their knowledge. Unfortunately, the unit of evidence is generally the article, which may mean up to 20-30 pages of PDF.

Such granularity is often not sufficient to efficiently access an explicit traceable author statement. However, some databases and communities (e.g. Systems biology, Rare diseases, Biodiversity) propose to record evidence at a finer granularity. In particular, GeneRIFs (Gene Reference Into Function) and MINT/IntAct can track evidence at the level of a single sentence. The same applies to biotic interactions.

Based on a sample of GBC, CDR and CDB resources (e.g., DisProt, Cellosaurus, MINT/IntAct, OLIDA), and together with WP3.1, we propose to explore how published evidence could be better captured, cross-referenced and displayed. Such a curation model will leverage methods to uniquely identify both sentences and sections in articles (e.g., Europe PMC SciLite, Biocuration Toolkit), thus evolving article and supplementary data representation standards such as JATS and BioC.

Lead Partners: SIB (Switzerland), University of Padua (Italy)

Activity 1: Leverage the interactions with the Global Biodata Coalition to support the CDR and outreach

The aim of this task is the coordination of the interactions between ELIXIR Data Platform and Global Biodata Coalition. It also includes established processes of assessment for granting CDR/EDD status and Periodic Review of existing resources.

Activity 2: Establish guidelines for Community Database identification and monitoring

This task aims to establish a robust and simple procedure for the identification and badging of ELIXIR Community Databases. This entails landscape analysis, covering all ELIXIR Databases, based on Nodes’ service delivery plans (SDPs) to establish simple minimum quality criteria for Community Databases. The identified criteria should be a checklist that does not require additional discussion or complex assessment to be applied. At the end of the process, a first iteration of the ELIXIR Community Database badge should be awarded to all data resources meeting the criteria.

Activity 3: Apply and adapt monitoring methodology for indicators

The current criteria to define ELIXIR CDRs include various bibliometric indicators. Such statistical indicators include the counting of mentions, citations and database accession numbers (e.g., identifiers.org). This activity will produce recommendations to establish an improved list of Key Performance Indicators (KPIs) to support the evaluation of ELIXIR databases.

ELIXIR Belgium, ELIXIR Switzerland, ELIXIR Germany, EMBL-EBI, ELIXIR Spain, ELIXIR France, ELIXIR UK, ELIXIR Greece, ELIXIR Italy, ELIXIR Norway, ELIXIR Sweden, ELIXIR Slovenia
ELIXIR Denmark
ELIXIR Germany, ELIXIR UK
ELIXIR Germany
ELIXIR Germany
EMBL-EBI
ELIXIR UK
ELIXIR UK, EMBL-EBI

The objective is to develop and deploy an “ELIXIR Contextual Data Clearinghouse (clearinghouse)” for extending, correcting and improving publicly available annotations on records in sample and sequencing data resources.

Contextual data is fundamental for FAIR data in ELIXIR. So far, little attention has been paid to connecting and exchanging curated contextual data to improve the quality of primary and secondary data resources within the metagenomics domain. In this proposal, we will build a “clearinghouse” to allow seamless exchange of contextual data between ELIXIR data resources.

The project will strengthen the collaborations between these ELIXIR resources, build synergies to improve the quality and impact of the content and, not least, build more sustainable data resources. The proposed project will be an excellent showcase on how the outcomes of the EXCELERATE Marine Metagenomics Use Case, together with established and new ELIXIR data resources, can improve the quality and impact of publicly available data, especially towards the marine domain.

ELIXIR Norway, EMBL-EBI, ELIXIR Germany, ELIXIR Italy

An ELIXIR implementation study started in February 2017, as a collaboration between EMBL-EBI and ELIXIR-DE. Its main objective is to develop open, robust, scalable and reproducible proteomics data analysis workflows based on OpenMS, directly connected to the PRIDE database (an ELIXIR core data resource) and to deploy these pipelines in the EMBL-EBI "Embassy Cloud" as a proof of concept.

Building on this work, we here propose a follow-up project that has three objectives: 

  1. The inclusion of additional open tools developed by other ELIXIR nodes
  2. The improvement of the overall infrastructure supporting the implementation of proteomics data analysis pipelines
  3. The inclusion of quality control pipelines.

The overarching goal is that these tools can be deployed in other cloud infrastructures, and can be easily reused by anyone in the community, thus bringing the users closer to the tools, and the tools closer to the data.

Impact of the study

The outcome will be that an increased range of open proteomics tools will be included in an extended range of cloud infrastructures, including new quality control features based on OpenMS. The impact will be an increased facility for proteomics analysis across multiple cloud platforms, all with an increased degree of quality control.

ELIXIR Belgium, EMBL-EBI, ELIXIR Germany, ELIXIR France, ELIXIR Spain

The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reusability of digital resources. While the principles have enjoyed rapid uptake across communities (ELIXIR, G20, EOSC, H2020, NIH), the implementation details remain unclear.

Recently, we developed a prototype software infrastructure and a set of metrics to assess the FAIRness of digital resources (http://fairmetrics.org/). In this ELIXIR Implementation Study we will put these into practice for the ELIXIR community by starting to FAIRify the ELIXIR Core Data Resources ArrayExpress, ENA, PDBe, PRIDE, CATH, ChEMBL, ChEBI, UniProt, HPA, InterPro, MINT, and STRING-db.

Our study will first establish effective guidelines for implementation, then involve hands-on FAIRification workshops, in which FAIRness will be assessed before and after the work is done. Our work will raise awareness of what it takes to be FAIR, and help drive interoperability between core ELIXIR resources and with efforts outside of ELIXIR.

ELIXIR Netherlands, ELIXIR UK, EMBL-EBI, ELIXIR Italy, ELIXIR Sweden

Recent progress in sequencing technologies has produced several large scale genotyping data sets for crops. The insights afforded by this data have been published in high profile scientific articles, but the underlying raw genotype data and the associated sample and population metadata have not been routinely submitted to appropriate archives.

The aim of this implementation study, led by the ELIXIR Plant Community and in coordination with the ELIXIR Interoperability Platform and Data Platform, is to provide this wealth of data according to FAIR principles. It will ensure an interoperable link with the phenotypic data that is stored in distributed institutional repositories, which is crucial for accelerated crop breeding.

We propose to create a sustainable toolbox to submit data to the ELIXIR Deposition Database “European Variation Archive” (EVA) and enrich the data with interoperable metadata regarding plant data standards like “Multi-Crop Passport Descriptor” (MCPD) and “Minimum Information About a Plant Phenotyping Experiment” (MIAPPE).

ELIXIR France, ELIXIR Germany, ELIXIR Belgium, ELIXIR Netherlands, EMBL-EBI

The long-term sustainability of the databases within ELIXIR is a constant concern for those who manage them, with only a very small minority having secured funding over 5 years or more. The NIH grant that is currently funding a large proportion of UniProtKB/Swiss-Prot is guaranteed until 2018. At the same time, millions of life science users in Europe and beyond rely on these resources for their everyday research.

The aim of this Implementation Study is to review sustainable funding models for knowledge bases (academic, commercial or third party), with UniProtKB/Swiss-Prot as a specific use case. Consideration was given to a number of approaches, all of which would need to meet certain criteria:

  • Open access and equal opportunity
  • Generate stable revenue sufficient to cover costs
  • Transparent sources
  • A number of revenue streams (reduce risk) 

This study has now been completed. The work is presented in a webinar and summarised in a comprehensive paper: Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case [version 1; referees: awaiting peer review]. F1000Research 2017, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.1)

Webinar summarising the outcome

ELIXIR Switzerland

This project will increase interoperability between four ELIXIR resources (CATH, SWISS-MODEL, InterPro and PDBe), three of which are Core Resources, by building APIs that facilitate the import and export of data between them.

The ultimate goal is to improve provision of 3D-Models for protein domain sequences via CATH, SWISS-MODEL and InterPro. Less than 10% of known sequences have experimentally characterised 3D structural information and yet this data is often essential for understanding the protein’s molecular function and biological role and for determining whether residue mutations could damage the protein and lead to disease. So this integration is very timely as it will enhance links between sequence and structure data.

APIs will be built using well-established protocols. As well as promoting interoperability, and therefore sustainability, we will expand the data in each resource to ensure they serve a wider community of biologists.

ELIXIR UK, ELIXIR Switzerland, EMBL-EBI

The implementation study project plan of ELIXIR Italy consists of six activities that aim to boost the cooperation with existing ELIXIR activities and are expected to deepen the interaction between ELIXIR-IIB, the Joint Research Unit embodying the Italian Node, and ELIXIR. The partners involved have already established contacts with other ELIXIR Nodes and the relevant ELIXIR Platforms and Services in order to ensure an advantageous outcome for all the involved parties. The goal of the proposed activities is to create and/or reinforce collaborations based on concrete measures. With this implementation study the Italian ELIXIR Node will achieve greater integration within ELIXIR service infrastructures and data interoperability policies. The topics of the selected activities and an additional coordination task are summarized below:

  1. Integration in ELIXIR Bioschemas activities.
  2. Integration in ELIXIR Data Curation activities.
  3. Integration in ELIXIR Galaxy activities through a project on practical feasibility of creating and running large-scale Galaxy-based variant calling pipelines on microservice infrastructures.
  4. Integration in ELIXIR Human Data activities through Beacons.
  5. Integration in ELIXIR Marine Metagenomics activities through a web-service supporting ITS1-based survey of marine communities.
  6. Integration in ELIXIR Rare Diseases activities.
  7. Coordination of the Italian ELIXIR Node Implementation study project.
ELIXIR Italy

Epitranscriptome modifications are now emerging as important factors in fine-tuning gene expression and regulation. Among them, A-to-I RNA editing by ADAR enzymes plays relevant biological roles and has been linked to several human diseases. Thanks to deep transcriptome sequencing data, A-to-I events have been characterized at single nucleotide level and collected in the REDIportal database, a unique and specialized resource comprising about 16 million changes detected in more than 9000 human GTEx RNAseq datasets.

Here we plan to upgrade REDIportal, providing researchers with an accurate, sustainable and accessible epitranscriptome resource through its integration into the ELIXIR ecosystem. Such integration will be established through standardization, curation and “FAIRification” of data, in combination with interconnections to existing ELIXIR resources such as Ensembl, UniProt, RNAcentral and PRIDE. Our proposal will facilitate data interoperability and the study of the epitranscriptome, a very relevant research topic that is still under-represented in the ELIXIR community.

ELIXIR Italy, EMBL-EBI, ELIXIR Israel

Comparison of environmental sequences to reference sets from curated marker loci provides a mainstay for taxonomic analysis of microbial communities. Microbial eukaryotic sequencing requires many distinct reference sets to cover diversity adequately. Those producing reference sets follow different curation workflows, but share the need to provide their data onwards to a common set of tools and services, such as EMG, Megan, MetaPIPE and BioMaS.

There are multiple inefficiencies:

  • reference set providers must build services to sustain and feed their data to consumer tools and services
  • consumers must import reference sets from several sources with different formats.

Led by the ITSoneDB team, who provide the leading ITS1 reference set for fungi and other eukaryotes, we will develop a new data type within ENA that will systematically capture these reference sets and serve them to dependent resources, eliminating inefficiencies, leveraging this core ELIXIR resource and building sustainability into reference set generation workflows.

Currently, taxonomic analysis of microbial communities relies on multiple dispersed reference data sets.  The impact of this study will be that ENA will be enriched with a new structured data type to accommodate these taxonomic reference datasets, beginning with ITS1 from rRNA, from the ITSoneDB team.  

By enhancing the connectivity and coordination between the various reference datasets and ENA, a stable system will be made available that systematically captures their data and serves them to the consumer services from one place. This will increase both the sustainability and exposure of the data and promote their use and re-use.

ELIXIR Italy, EMBL-EBI

Intrinsically disordered proteins (IDPs), characterized by high conformational variability, cover almost a third of the residues in Eukaryotic proteomes. As major players in cellular regulation, IDPs are involved in numerous diseases.

Specialized IDP databases provide a starting point for analysis, yet their integration into core databases remains very limited. Here, we propose to start integrating IDP information into ELIXIR Core Data Resources.

This will be achieved with a three-pronged approach:

  1. Integration of an expanded version of MobiDB-lite into UniProtKB and PDBe including links back to MobiDB via the InterPro infrastructure.
  2. Creation of a minimal information about IDP experiments (MIADE) standard and data interchange format, to help ensure interoperability between databases with manually curated IDP data.
  3. Sustainability of curated IDP data will be enhanced by creating PubDisProt, a new deposition resource for linking curated literature and protein identifiers across databases.
ELIXIR Italy, EMBL-EBI, ELIXIR Switzerland, ELIXIR Hungary, ELIXIR Ireland

Reactome is a world-leading, curated resource for biomolecular pathways, with >1,200 citations in 2019 and >80,000 distinct users monthly. It is developed in international collaboration with EMBL-EBI as one of four partners. Reactome’s content is presented through a multi-scale visualisation system, complemented by advanced analysis tools. Beyond scientific analysis, Reactome is uniquely suited for teaching and training in molecular biology as well as in bioinformatics, through its open source, open data policy. For its content, Reactome critically depends on external domain experts who, in collaboration with professional Reactome curators, ensure the consistent, high quality of the curated pathways.

At the University of Ljubljana Faculty of Medicine, within the broader group of the Slovenian ELIXIR Node, there is ample domain expertise in a variety of multifactorial disorders and pathways. For example, we are experts in cholesterol homeostasis, particularly in sterols of the cholesterol synthesis pathway, which were recently found to be ligands of the nuclear receptor RORC. Expertise also covers cholesterol metabolism and its connections to liver pathologies, the circadian clock, Epo receptor and signalling, molecular aspects of cancer, epigenetics of cancer and brain disorders, cytoskeleton abnormalities, etc.

The aim of the LEAP project is to establish staff exchange and regular interaction between ELIXIR Slovenia and EMBL-EBI, supporting expert curation and user experience testing of Reactome by ELIXIR Slovenia domain experts, and to establish routine use of Reactome for data analysis, teaching, and training in Slovenia. This is expected to lead to an improved Reactome user interface and content, and advanced pathway analysis for high throughput biomolecular data in the Slovenian research community.

ELIXIR Slovenia, EMBL-EBI

The goal of this group is to explore extending the connected ecosystem to any ELIXIR data resource with the scientific literature, with a view to incorporating more orphan data and human data, and to providing connectivity with other elements of the ELIXIR infrastructure.

Aim: To increase understanding of the potential for benefits that would arise from increasing the number of ELIXIR data resources linked to each other, Europe PMC, and integrated with orphan data, where appropriate. 

 Through presentations, webinars, hackathons and staff exchange we will explore:

  • Comprehensive cross-linking at deep level between data resources, Europe PMC and researchers
  • Analytic services and API use for citation data, and crosslinks
  • Methods to deep-link curation statements
  • Integration of orphan data related to ELIXIR data resources and/or Data Management Plans via the BioStudies database
  • Supporting the Interoperability Platform in initiatives to encourage schema.org and Bioschemas adoption in CDRs.
EMBL-EBI, ELIXIR Italy, ELIXIR Czech Republic, ELIXIR France, ELIXIR UK, ELIXIR Spain, ELIXIR Sweden, ELIXIR Germany, ELIXIR Switzerland

This study will support the establishment of global partnerships and business cases for the long term financial sustainability of Core Data Resources.

Aim: To establish global partnerships for sustainable core data resources.

  • Contribute to the establishment of a global, internationally shared, sustainable funding model for Core Data Resources.
  • Share the experience gained with the European life science data infrastructure from the ELIXIR Core Data Resource selection process, as considerations of global priorities and resource allocation proceed.
  • Influence the development of Data Management Plans to ensure best practice and adoption of ELIXIR Core Data Resources and ELIXIR Deposition Databases.
ELIXIR Switzerland, EMBL-EBI, ELIXIR Norway

This implementation study is designed to

  1. Further identify communities of curators within ELIXIR, with the aim to reach out to those who may not initially identify with this role;
  2. Review and map the type of curation work being done, for which databases across ELIXIR, and in which life science/health domains;
  3. Identify the tools they are currently using for curation tasks;
  4. Assess the capacity requirements for new curators and training provision available;
  5. Produce an initial map of biocurators, manually curated knowledgebases and data repositories, curation tools / resources covering a range of topics and domains;  this will ensure that these resources are added to FAIRsharing as appropriate, providing an interoperable community resource mapping the landscape of biocuration and biological databases across Europe and the ELIXIR Nodes;
  6. Link training for biocuration resources to training available in TeSS and identify gaps in training provision.

The outcomes of this study have been published in F1000Research and were presented in a webinar.

Part of this work has been presented at the ISB Biocuration 2019 meeting (Cambridge, UK, April 2019):

  1. Holinski A, Burke M, Morgan S, Palagi P and McQuilton P. Mapping the landscape of biocuration – where are the biocurators and what do they need? F1000Research 2019, 8:735 (slides)
  2. McQuilton P. FAIRsharing.org – mapping the landscape of databases, standards and policies (describing, linking and assessing their FAIRness). F1000Research 2019, 8:712 (slides)
  3. Palagi PM, McQuilton P and Beard N. TeSS: the ELIXIR training portal. F1000Research 2019, 8:709 (slides)

and also at the ELIXIR All-Hands Meeting 2019 (Lisbon, Portugal, June 2019):

  1. Holinski A, Burke M, Morgan S, Palagi P and McQuilton P. Mapping the landscape of biocuration – where are the biocurators and what do they need? F1000Research 2019, 8:1519 (slides)

This work is carried out in collaboration with 

ELIXIR UK, ELIXIR Switzerland, EMBL-EBI, ELIXIR Luxembourg, ELIXIR Slovenia

Metabolomics aims to provide novel insights into the biochemical reactions of organisms by characterising the presence and concentrations of low molecular weight compounds from biological samples. The primary analytical tools for such high-throughput data collection are mass spectrometry (MS), often preceded by chromatographic or electrophoretic separation technologies, and nuclear magnetic resonance spectroscopy (NMR).

These technologies produce relatively large and complex data sets that require bioinformaticians, cheminformaticians, biostatisticians, data scientists and computer scientists. Together they develop and apply a wide range of algorithms, software tools, repositories and computational resources to process, analyse, report and store the data and metadata.

Increasingly, insights from genomics, epigenomics, transcriptomics, proteomics/protein interactomics and metabolomics are combined to understand the dynamics of biological processes. Metabolomics activities are well represented within Europe and ELIXIR nodes. The community believes that metabolite identification is the area where computational metabolomics and metabolomics data management will have maximal impact, which will benefit most from interactions with the existing five ELIXIR platforms, and where progress will contribute most to other ELIXIR communities.

The progress through this integrative Implementation Study will benefit industry and academia alike as metabolite identification is one of the major bottlenecks in metabolomics and resolving this challenge requires a community effort.

ELIXIR Netherlands, EMBL-EBI, ELIXIR France, ELIXIR UK, ELIXIR Germany, ELIXIR Spain, ELIXIR Sweden, ELIXIR Italy, ELIXIR Estonia, ELIXIR Switzerland, ELIXIR Belgium

The project developed robust and automated open analysis pipelines for MS/MS proteomics data (based on the OpenMS framework, including new quality control features) that can be deployed in a cloud environment and reused openly by the scientific community in the future. A feature of this project was the building of a Proteomics Community, bringing people together for a face-to-face meeting (March 2017) and a hackathon (Jan 2018). 

This is a rapidly developing field with limited or non-existent standards in the technical platforms, leading to a degree of instability (see F1000R paper). Details as to how these concerns might be addressed are set out in the end report.

The outcome will be that an increased range of open proteomics tools will be included in an extended range of cloud infrastructures. Proteomics data from the PRIDE Archive database was used in a pilot project to demonstrate the usefulness of the resource, resulting in an increased facility for proteomics analysis using pipelines deployed across multiple cloud platforms.

Other Implementation Studies:

Webinar summarising the outcome

(Recorded April 2018); see also the slides.

EMBL-EBI, ELIXIR Germany
ELIXIR Switzerland
ELIXIR France
ELIXIR Switzerland, EMBL-EBI

In the future, the research literature will be increasingly open access, with new communication mechanisms such as preprints requiring version management and new peer review mechanisms. Managing full text article corpora for text mining will be much more challenging than managing just abstracts, and it is unlikely that each and every text mining group will want to invest the necessary time and effort when there are public resources already available. Bringing the compute to the data is commonplace in most informatics workflows, and there is no reason why text mining operations will be different in the long term.

The process of curation, performed by expert biologists, is the life-blood of knowledgebases. Curators need to identify key papers, read the full text of the articles to weigh up the evidence, then extract the most pertinent information. A growing corpus of open access full text articles provides new opportunities to enhance article triage and browsing systems. At the same time, many text mining workflows are mature enough to support curation activities.

This group of tasks aims to build community and infrastructure based on the open full-text research literature. By providing a platform for doing text mining and sharing the outputs, developing standards, and then combining the semantic enrichment with rich article metadata and software tools, we expect to provide scalable support for curation across multiple knowledgebases.

Aim: Maximise support for human curation.

This group will develop the infrastructure around full text article resources to support curator workflows. This will be done by semantically enriching research articles and exploring the development of article triage systems as infrastructure. For example, daily text mining of biological concepts from full text research articles and sharing the annotations for use in search, triage, and crosslinking. The opportunities and role for community curation will also be explored.

EMBL-EBI, ELIXIR Switzerland, ELIXIR Italy, ELIXIR Norway, ELIXIR Portugal, ELIXIR Czech Republic, ELIXIR Luxembourg, ELIXIR France, ELIXIR UK, ELIXIR Spain, ELIXIR Sweden, ELIXIR Germany

Most data and literature curation processes are initiated via some entity-centric query (e.g. gene or gene products, a disease, a chemical compound). However, most databases are also interested in accessing and curating contents using other types of modalities: some biological phenomena (e.g. Intrinsically Disordered Proteins) or some domain-specific aspects of biology (e.g. lipidomics, glycomics, rare diseases). These are not easily expressed via a combination of keywords; therefore non-entity-centric literature exploration tools are needed.

Further, literature curation will expand beyond abstracts to include full-text, supplementary data and pre-prints (in several versions). This will be the focus of Task.3 in 22-23.

  1. Literature screening and alerting services to speed up curation: unlike most curation-support tools, which are entity centric (i.e. curation workflows start with a gene), this WP aims at triggering an alert as soon as a newly relevant article for a given database is published. 
  2. Triage-as-a-Service for any published data, including full-text, pre-prints and supplementary data: Expand triage services to full-text articles and supplementary data (in connection with T2.1).
  3. Ongoing community requests for additional connectivity between ELIXIR data resources and the literature are expected in the period 22-23, and this WP will be carried forward as is.
  4. Monitor and assess the ongoing Data Curation Implementation studies that arise out of the 2021 RFP process.
ELIXIR Switzerland, EMBL-EBI, ELIXIR France, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Belgium, ELIXIR Norway

Documented regulatory interactions between Transcription Factors (TFs) and target genes (TGs) form a crucial resource for biological network building. The ExTRI text mining project yielded about 54,000 sentences from the literature that putatively describe a TF-TG interaction, linked to database identifiers. This corpus is now a compelling target for a community curation effort, which will be performed by dedicated curators in this project, with modelers potentially contributing via Cytoscape. To make this large-scale effort possible, we will combine the intuitive and flexible curation tool VSM (Visual Syntax Method) with automated NLP (Natural Language Processing) assistance in a community curation app. Premarked ExTRI sentences will be shown in their abstract, along with NLP annotations provided by Europe PMC's API, to help curators formalize knowledge into a form suitable for computational analysis. The new web app will be designed for reusability, guided by professional curation partners, and evaluated for its effect on curator performance.

ELIXIR Norway, ELIXIR Luxembourg, ELIXIR Switzerland

Health research is advanced through a deeper understanding of disease aetiology provided by detecting associations between genetic variants and disease traits in population samples. The GWAS Central and DisGeNET data services provide extensive gene/variant-phenotype/disease associations. However, an absence of tools and resources to support the text mining of comprehensive data sources prevents scalable import of association data, which is currently limited to text mining abstracts or requires manual curation.

This project will extend and integrate the participants’ existing text mining tools to provide a reusable workflow to extract human genotype-phenotype associations from scientific literature full-texts, tables and supplementary materials. These data will be imported into GWAS Central and DisGeNET, accelerating FAIR access to pioneering findings such as COVID-19 GWAS. The development of an annotated GWAS corpus based on full-text articles will enable the evaluation of existing and future text mining methodologies for extracting genotype-phenotype associations and metadata.

ELIXIR UK, ELIXIR Spain

This Implementation Study shaped the development of a distributed model for Ensembl, whereby multiple ELIXIR Nodes use common infrastructure to provide an integrated service to users, each focused on their own areas of interest and expertise.

This approach will increase the quality of available data, and simplify data access for users by allowing the participation of many Nodes in a single service. The study is a natural complement to Task 10.3 "Capacity Building in Genome Assembly and Annotation" in the EXCELERATE project.

The work led to the development of a global species registry employing non-overlapping identifier spaces, which will increase the facility for genome analysis and the customisation of Ensembl to target organisms of relevance. Following this project, the software is more sustainable through improved structures and documentation. This will benefit Ensembl internally, as well as external parties (ELIXIR Nodes, Life Science researchers) who want to create their own Ensembl instance.
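
As a rough illustration of what non-overlapping identifier spaces can mean in practice, the sketch below reserves a disjoint block of integer IDs for each participating node, so that IDs assigned independently by different nodes can never collide. The class name, the block size, the node names and the integer ID scheme are all assumptions made for this example; they are not taken from the Ensembl registry itself.

```python
from dataclasses import dataclass, field


@dataclass
class SpeciesRegistry:
    """Toy registry (hypothetical): each node gets its own disjoint ID block."""

    block_size: int = 1000
    # node name -> (first ID, last ID), both inclusive
    blocks: dict = field(default_factory=dict, init=False)
    # node name -> next unused ID inside that node's block
    next_id: dict = field(default_factory=dict, init=False)
    # assigned ID -> (node name, species name)
    species: dict = field(default_factory=dict, init=False)

    def add_node(self, node: str) -> tuple:
        """Reserve the next free block for a node; blocks never overlap."""
        if node in self.blocks:
            return self.blocks[node]
        start = len(self.blocks) * self.block_size + 1
        end = start + self.block_size - 1
        self.blocks[node] = (start, end)
        self.next_id[node] = start
        return (start, end)

    def register(self, node: str, species_name: str) -> int:
        """Assign the next ID from the node's own block to a species."""
        _, end = self.blocks[node]
        new_id = self.next_id[node]
        if new_id > end:
            raise ValueError(f"ID block for {node} is exhausted")
        self.next_id[node] = new_id + 1
        self.species[new_id] = (node, species_name)
        return new_id

    def owner(self, species_id: int) -> str:
        """Return the node that registered a given species ID."""
        return self.species[species_id][0]
```

With two hypothetical nodes, `add_node("node_a")` and `add_node("node_b")` reserve separate ranges, and every ID later returned by `register` falls inside the range of the node that requested it.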

This study is now complete; see the end report.

Webinar summarising the outcomes

ELIXIR Norway, EMBL-EBI, ELIXIR Sweden

The ELIXIR Core Data Resource ChEBI is a dictionary of molecular entities and is currently able to handle all possible forms of a given chemical structure (e.g. neutral, tautomeric, protonated, isotopic, zwitterionic forms). Each form is assigned its own unique ChEBI identifier, which enables inter-relations within ChEBI via the ontology.

Within the biological community, different database resources represent chemical structures in different ways. The ELIXIR resource, Rhea, uses only the physiological pH 7.3 form of a given molecule, whereas Reactome requires both neutral and protonated forms of a given molecule. This inconsistency between resources in the mapping of chemical structures was highlighted in a recent ChEBI user workshop held in May 2019 (EMBL-EBI, UK), where it was agreed that a unification of the methodology, and consistency in the mapping of chemical structures would allow easier mapping across the different ELIXIR resources.
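
The mapping problem described above can be pictured with a small sketch. It assumes a simplified model in which every form of a compound has its own identifier and each resource states which forms it expects; the identifiers (`EX:1` and so on), the compound names and the form labels are placeholders invented for this example, not real ChEBI, Rhea or Reactome data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemicalForm:
    identifier: str  # placeholder ID, not a real ChEBI identifier
    compound: str    # parent compound that the form belongs to
    form: str        # e.g. "neutral", "protonated", "ph7.3"


# Hypothetical catalogue: one identifier per form, as described in the text.
FORMS = [
    ChemicalForm("EX:1", "example acid", "neutral"),
    ChemicalForm("EX:2", "example acid", "ph7.3"),
    ChemicalForm("EX:3", "example base", "neutral"),
    ChemicalForm("EX:4", "example base", "protonated"),
    ChemicalForm("EX:5", "example base", "ph7.3"),
]

# Forms each resource expects, following the description in the text.
RESOURCE_POLICY = {
    "Rhea": {"ph7.3"},
    "Reactome": {"neutral", "protonated"},
}


def forms_for(resource: str, compound: str) -> list:
    """Identifiers of the forms of a compound that a resource would use."""
    wanted = RESOURCE_POLICY[resource]
    return [f.identifier for f in FORMS
            if f.compound == compound and f.form in wanted]


def cross_map(identifier: str, target_resource: str) -> list:
    """Map one form's identifier to the forms the target resource uses."""
    by_id = {f.identifier: f for f in FORMS}
    return forms_for(target_resource, by_id[identifier].compound)
```

In this toy model, `cross_map("EX:5", "Reactome")` goes from the pH 7.3 form of a compound to its neutral and protonated forms via the shared parent compound, which is the kind of lookup that a unified mapping methodology would make straightforward.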

Here we propose a set of meetings and working groups between the resources ChEBI (EMBL-EBI) and Rhea (SIB), with the aim of streamlining the link between chemicals (ChEBI), reactions (Rhea) and pathways (Reactome) and to provide guidelines on the representation of chemical entities to the metabolic modelling community. We anticipate that the development of closer interactions and working relationships between core staff from the different resources during the course of this project will enable further streamlining and increased levels of interoperability between these ELIXIR resources. It will lead to a greater degree of understanding of the underlying workflows of each resource and to more efficient coordination of their development.

EMBL-EBI, ELIXIR Switzerland