Tools services

Name Description ELIXIR Node
ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Netherlands
ELIXIR Czech Republic

Through this project, we will continue building on the DOME recommendations and expanding their scope, while also creating a concrete framework to increase their adoption and overall impact, both within ELIXIR and beyond.

The project will engage in the following three complementary activities:

  1. Implement a framework around the DOME recommendations, including a registry to capture DOME-related information from existing and future literature, and a low-barrier software tool that allows the wider community to create DOME annotations easily.
  2. Review and connect the DOME recommendations in light of the particular needs, requirements and expectations of relevant ELIXIR communities, thus establishing a pathway towards adoption. 
  3. Create a clear plan for engaging stakeholders beyond ELIXIR, linking to relevant global efforts and initiatives, including the RDA ML interest groups, the NIH AI activities and industry efforts (such as the Pistoia Alliance).
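To make the notion of a DOME annotation concrete, the sketch below models one record with the four DOME sections (Data, Optimization, Model, Evaluation) and reports which sections are still empty. The field names and the completeness rule are hypothetical illustrations, not the schema of the DOME registry or its annotation tool.

```python
from dataclasses import dataclass, field


@dataclass
class DomeAnnotation:
    """Hypothetical DOME record: one dictionary of answers per DOME section."""
    publication_doi: str
    data: dict = field(default_factory=dict)          # e.g. dataset provenance, splits
    optimization: dict = field(default_factory=dict)  # e.g. algorithm, hyperparameters
    model: dict = field(default_factory=dict)         # e.g. interpretability, availability
    evaluation: dict = field(default_factory=dict)    # e.g. metrics, comparisons

    def missing_sections(self) -> list[str]:
        """Return the names of DOME sections that have no content yet."""
        sections = {
            "data": self.data,
            "optimization": self.optimization,
            "model": self.model,
            "evaluation": self.evaluation,
        }
        return [name for name, content in sections.items() if not content]


# Usage: a partially completed annotation (placeholder DOI, not a real publication)
record = DomeAnnotation(
    publication_doi="10.0000/example",
    data={"provenance": "public benchmark set", "splits": "80/10/10"},
    optimization={"algorithm": "random forest"},
)
print(record.missing_sections())  # ['model', 'evaluation']
```

A low-barrier annotation tool could use a check like this to prompt users for the sections they have not yet filled in.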
ELIXIR Greece, ELIXIR Italy, ELIXIR UK, ELIXIR France, ELIXIR Ireland, ELIXIR Estonia, ELIXIR Belgium, ELIXIR Sweden, EMBL-EBI, ELIXIR Luxembourg, ELIXIR Denmark, ELIXIR Spain

The provision of tools and services for the life sciences is highly distributed. There are many providers, ranging from large entities, including service organizations well equipped for software development, to individual scientists with limited technical expertise and resources.

There is often little coordination of the scientific scope, description, use or interoperability of the software. The picture is extremely fragmented, and in many cases scientists must trawl the Web in their search for adequate tools, or rely on the advice of colleagues to make sense of the diverse offerings.

Thus, the creation of a comprehensive, consistent and searchable registry should have a major positive effect on tool utilization by the life sciences community. The current work on the creation of such a registry is the principal contribution of the Danish Node to ELIXIR, as well as a contribution of ELIXIR itself to the BioMedBridges joint effort on the ESFRI roadmap.

The study is now complete; this work was picked up in a subsequent study.

The aim of this implementation study is to provide a stable infrastructure for unifying software container solutions within ELIXIR. This infrastructure will provide an access point for end-users to find, generate, store, monitor, and even benchmark software container solutions. Hardware infrastructure for deploying software containers will be provided by an ELIXIR Node from the ELIXIR Compute Platform, while ELIXIR-ES will provide the backup system using EUDAT protocols and infrastructures. In the long term, this registry could become a relying service of the ELIXIR AAI, allowing infrastructures to manage user accounts.

The impact of this infrastructure will be demonstrated across ELIXIR Platforms and Use Cases. Software containers are a key technology that enables the rapid deployment of software resources, including workflows, across a variety of systems (e.g. HPC, cloud environments and local computers), and their connection with existing database repositories. Additionally, this technology will be used to support training activities carried out by ELIXIR, where trainers will be able to focus on the training content rather than on the technological framework of the training, during face-to-face or remote sessions. Such a leading role in the development of this infrastructure will greatly increase ELIXIR's visibility across many domains of the life sciences and beyond. The coordinated effort to develop this infrastructure is similar to previous efforts carried out in ELIXIR, such as the Beacon Project and Bioschemas, and will also link into work taking place in the ELIXIR Compute and Interoperability Platforms in coordination with the GA4GH.

EMBL-EBI, ELIXIR Germany, ELIXIR Spain, ELIXIR Belgium, ELIXIR France, ELIXIR Denmark, ELIXIR Italy
ELIXIR Spain, ELIXIR Switzerland
ELIXIR Netherlands
ELIXIR Italy

This project will be led by the ELIXIR Proteomics Community in collaboration with members of the Metabolomics Community and three ELIXIR platforms. High-throughput proteomics has become a popular choice in biological, biomedical and clinical studies, and has led to the development of hundreds of bioinformatics tools and data analysis pipelines. Given their large diversity, there is an urgent need to compare and benchmark different software pipelines over a large data spectrum.

This study aims to create a framework for benchmarking proteomics data analysis workflows, building upon and improving resources from the ELIXIR Tools, Data and Compute Platforms by creating an interface between them, linked with public proteomics data and open-source stand-alone software and pipelines.

The data involved will be annotated with at least the EOSC minimum information, according to ELIXIR metadata standards. Our benchmarking will identify robust workflows and thereby provide the proteomics community with the high-quality standards required for reproducible research and clinical applications.
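One common way to compare workflows in a benchmark of this kind is to score each workflow's output against a reference set of known identifications, for example from a spike-in dataset. The sketch below computes precision, recall and F1 per workflow and ranks the workflows by F1. The workflow names and protein identifiers are made up for illustration, and F1 is only one possible ranking metric, not necessarily the one the study uses.

```python
def score(reported: set[str], truth: set[str]) -> dict[str, float]:
    """Precision, recall and F1 of reported identifications against ground truth."""
    tp = len(reported & truth)  # true positives: reported and actually present
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rank_workflows(results: dict[str, set[str]], truth: set[str]) -> list[tuple[str, float]]:
    """Rank workflows by F1 score, best first."""
    scored = [(name, score(ids, truth)["f1"]) for name, ids in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical ground truth and workflow outputs
truth = {"P1", "P2", "P3", "P4"}
results = {
    "workflow_a": {"P1", "P2", "P3", "X9"},  # one false positive
    "workflow_b": {"P1", "P2"},              # precise but incomplete
}
ranking = rank_workflows(results, truth)
print(ranking)
```

Here `workflow_a` ranks first (F1 = 0.75) ahead of `workflow_b` (F1 = 2/3), illustrating how a single summary metric trades off false positives against missed identifications.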

ELIXIR Denmark, EMBL-EBI, ELIXIR Netherlands, ELIXIR Spain, ELIXIR France, ELIXIR Sweden, ELIXIR Italy, ELIXIR Czech Republic, ELIXIR Germany
ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Netherlands, ELIXIR Spain, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI

The aim of this Implementation Study is to determine, together with ELIXIR partners, the requirements for validation, and to build prototype open validation services for archetypal archival databases and knowledge bases, in particular:

  • Content validation according to minimum information checklists.
  • Syntactic format validation according to a standard format in conjunction with the GA4GH file formats team as part of the Large Scale Genomics Workstream.
  • Syntactic format validation for Phenotyping data.
  • Semantic validation according to a publicly available ontology.
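Content validation against a minimum information checklist and a simple semantic check can be combined in a single pass over a record. In the sketch below, the checklist fields and the set of accepted ontology terms are hypothetical placeholders (the two NCBITaxon identifiers are used only as examples); a real service would load both from the relevant standard and ontology.

```python
# Hypothetical minimum information checklist: field -> whether it is mandatory
CHECKLIST = {"sample_id": True, "organism": True, "tissue": True, "collection_date": False}

# Hypothetical subset of an ontology: term IDs accepted for the "organism" field
ALLOWED_ORGANISM_TERMS = {"NCBITaxon:9606", "NCBITaxon:10090"}


def validate(record: dict) -> list[str]:
    """Return human-readable validation errors (an empty list means valid)."""
    errors = []
    # Content validation: every mandatory field must be present and non-empty
    for field_name, mandatory in CHECKLIST.items():
        if mandatory and not record.get(field_name):
            errors.append(f"missing mandatory field: {field_name}")
    # Semantic validation: the organism value must be an accepted ontology term
    organism = record.get("organism")
    if organism and organism not in ALLOWED_ORGANISM_TERMS:
        errors.append(f"unknown ontology term for organism: {organism}")
    return errors


print(validate({"sample_id": "S1", "organism": "NCBITaxon:9606"}))
# ['missing mandatory field: tissue']
```

Returning a list of errors rather than stopping at the first failure lets a submitter fix all problems in one round.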
ELIXIR Belgium, ELIXIR France, EMBL-EBI, ELIXIR UK

The study will convene and establish a consensus on high-level community-driven standards:

  • Workflow / Task Orchestration Service: A minimal API specification that will support heterogeneous containerised workflow (e.g. CWL, Galaxy, Nextflow, etc.) workloads for secured execution across the ELIXIR Federation of compute/cloud sites.
  • Tool / Workflow Registry Service: A minimal API specification that will provide access to curated heterogeneous container formats (e.g. Docker, Singularity) and workflow specifications (e.g. CWL, Galaxy, Nextflow, etc.) to be used as part of the workflow orchestration service.
  • Data Repository Service: A minimal API specification that will support the discovery and secured access to ELIXIR Core Data Resources and ELIXIR Node provided datasets as part of the workflow orchestration service.
  • Data and Workflow Security Protocols: Embedding security relying on ELIXIR AAI across all ELIXIR APIs to ensure secure access to data, tools and workflows to allow analysis to be performed on sensitive data.
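For illustration, a submission to a workflow orchestration service of the kind described above could be assembled as follows. The sketch assumes the run-request fields of the GA4GH Workflow Execution Service (WES) v1 API (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`) and an ELIXIR AAI access token passed as an OAuth2 bearer token; the endpoint, token and workflow URL are placeholders, and the code only builds the request parts without sending anything.

```python
import json


def build_wes_run(wes_base: str, token: str, workflow_url: str,
                  params: dict) -> tuple[str, dict, dict]:
    """Return (endpoint, headers, form_fields) for a WES-style run submission.

    The form fields would be POSTed as multipart/form-data to the endpoint.
    """
    endpoint = wes_base.rstrip("/") + "/runs"
    # Assumption: the token was obtained from ELIXIR AAI beforehand
    headers = {"Authorization": f"Bearer {token}"}
    form_fields = {
        "workflow_url": workflow_url,
        "workflow_type": "CWL",
        "workflow_type_version": "v1.2",
        "workflow_params": json.dumps(params),  # parameters travel as a JSON string
    }
    return endpoint, headers, form_fields


endpoint, headers, fields = build_wes_run(
    "https://wes.example.org/ga4gh/wes/v1",  # placeholder endpoint
    "PLACEHOLDER_TOKEN",
    "https://example.org/workflows/qc.cwl",  # placeholder workflow
    {"reads": {"class": "File", "location": "https://example.org/reads.fastq"}},
)
print(endpoint)
```

Keeping authentication in a standard header is what allows the same AAI token to be reused across the orchestration, registry and data repository APIs.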

This will be achieved by coordinating the expertise in the ELIXIR Platforms (Compute & Tools) and work taking place within the Nodes and related projects (e.g. EOSC-Life, EOSC-Hub), and will be broken down into three work packages: 

  1. Leveraging EOSC-Life workflows infrastructure
  2. ELIXIR Infrastructure for Orchestrating Containers and Workflows
  3. Coordinating ELIXIR Data Discovery and Transfer Services

and a number of Community-led Use Cases: 

  1. Human data requiring security protocols provided by AAI, and data transfer services
  2. Single cell transcriptomics workflow that could be adapted for different organisms
  3. Metabolomics workflow adopted by the PhenoMeNal project
  4. Proteomics workflow aligning with the Implementation Study on benchmarking workflows, to ensure reproducible research and potential clinical applications
EMBL-EBI, ELIXIR Spain, ELIXIR Germany, ELIXIR Finland, ELIXIR France, ELIXIR Denmark, ELIXIR Belgium, ELIXIR Sweden, ELIXIR Italy, ELIXIR Switzerland, ELIXIR UK, ELIXIR Netherlands, ELIXIR Greece
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Italy
ELIXIR France, ELIXIR Netherlands
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Germany, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Spain, ELIXIR UK

ELIXIR is about the integration of diverse resources, including tools, training materials and technical services. Within EXCELERATE, ELIXIR is building portals to collate information on tools and data services (bio.tools), training events and material (TeSS, WP11 e-learning environment), compute resources (WP4 technical service registry) and cross-linked policy, standards and databases (FAIRsharing, WP4). A focus of EXCELERATE is to set up these portals such that they can interoperate.

Currently, a scientist can use TeSS to find training events and materials and then, in separate searches, use bio.tools to find relevant tools and FAIRsharing to find standards and databases. At the moment, these ELIXIR portals provide a useful but fragmented service. Ideally, linking TeSS and bio.tools to ELIXIR’s compute resources via common workflow diagrams would enable end-users to discover and learn about the prevalent bioinformatics workflows. In this implementation study, we want to achieve the first step: linking TeSS and bio.tools via the most prevalent bioinformatics workflows, and laying the foundation to later incorporate other ELIXIR platforms, such as the compute resources, to provide an even more useful service for researchers.

The goal of this implementation study is to provide life-scientist end-users with a powerful tool to find and use ELIXIR resources across the spectrum, based on intuitive graphical diagrams of the most prevalent scientific workflows.

ELIXIR UK, ELIXIR Estonia, ELIXIR Belgium, ELIXIR Denmark, ELIXIR Switzerland, EMBL-EBI, ELIXIR Norway, ELIXIR France
ELIXIR Denmark, ELIXIR France

The Tools Platform provides services that enable findability, accessibility, interoperability, and reusability of computational tools. These tools include research software, workflows, remote digital services, and trained machine-learning (ML) models.

During the 2024-26 programme, the ELIXIR Tools Platform aims to:

  • Allow researchers (end-users) to find well-described and fit-for-purpose research software, providing alternatives when available, that can be used in heterogeneous computational setups from a laptop to a cloud installation to an HPC centre;
  • Serve as a reference point for individual research software engineers on how to develop better, more sustainable and recognizable software, by providing best practices and services supporting such an endeavour;
  • Provide scientific communities with a reference place for tackling common challenges in their own best practices for software development.

The Platform will work in five Work Packages (WPs):

Lead partners: University of Freiburg (Germany), BSC (Spain), Institut Pasteur (France)

This Work Package provides the overall project management and coordination of the ELIXIR Tools platform. It will also ensure that the Platform work plan is aligned with ELIXIR’s strategic vision, and the overall development of European research infrastructures, data spaces and regulation.

The Work Package facilitates active dialog with life-science research communities in order to understand the real needs of researchers. This is done especially by collaborating with ELIXIR Communities that represent researchers’ use cases.

Besides technology development, delivering impact on research requires that the technology can be sustained and provided to researchers. This Work Package seeks ways in which the technology can be sustained as part of operational services, both by integrating it into existing services and by establishing new services where applicable.

The landscape of European research infrastructures, data spaces and related initiatives, such as EOSC, EuroHPC and GAIA-X, is evolving, and the Work Package is actively participating in this development.

The work package will also bridge the platform activities to GA4GH and participate in the steering of GA4GH by utilising the ELIXIR and GA4GH strategic partnership.

Activity 1: Project management and coordination

  • Set up and facilitate regular platform wide meetings, both online and face-to-face.
  • Provide proper communication channels for the work.
  • Monitor and report the progress.

Activity 2: Sustainability and dissemination

  • Ensure sufficient sustainability planning for platform development activities.
  • Ensure bi-directional communication about the platform activities towards research communities, especially through ELIXIR Communities, ELIXIR Training Platform and utilising existing communications channels such as ELIXIR webinars.
  • Knowledge sharing with other ELIXIR platforms and participation in joint-platform activities.
  • Represent the Platform in EOSC, EuroHPC and the ELIXIR::GA4GH strategic partnership.
  • Dissemination for the Tools Platform.

Lead partners: Institut Pasteur (France), Leiden University (Netherlands), University of Southern Denmark (Denmark), Masaryk University (Czech Republic)

The ELIXIR Research Software Ecosystem (RSEc) collates and publishes informative descriptions of more than 30,000 software tools and services. Most of these records are provided and curated using the bio.tools registry, which is synchronised with the backend of the Ecosystem.

WP2 will further develop the RSEc as a way to enable federated access and curation of reference metadata of any research software. This activity will leverage the technical and scientific expertise of the various communities and services within and beyond ELIXIR (e.g. Galaxy and WorkflowHub, both WP5).

This development will target direct contribution mechanisms for software and workflow developers, so that metadata provided during the software publication and release process can be seamlessly consumed, and synchronised with the Ecosystem.

Additional mechanisms will enable full coverage of research software metadata harvested directly from source-code repositories, as well as contributing metadata from the Ecosystem back to them, through bidirectional synchronisation. These mechanisms will use and drive the evolution of standards such as biotoolsSchema and EDAM (WP3).
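Harvesting metadata from source-code repositories typically means reading a metadata file kept alongside the code, such as a `codemeta.json` file. The sketch below maps a few CodeMeta properties to a bio.tools-style record. The mapping and output keys are simplified assumptions for illustration, not the Ecosystem's actual synchronisation logic; EDAM annotations and richer license handling are deliberately omitted.

```python
import json


def codemeta_to_biotools(codemeta: dict) -> dict:
    """Map selected CodeMeta properties to a simplified bio.tools-like record."""
    langs = codemeta.get("programmingLanguage", [])
    if isinstance(langs, str):  # CodeMeta allows a single value or a list
        langs = [langs]
    license_value = codemeta.get("license", "")
    if not isinstance(license_value, str):  # assumption: only plain strings handled
        license_value = ""
    # Keep just the SPDX identifier when the license is given as an SPDX URL
    if license_value.startswith("https://spdx.org/licenses/"):
        license_value = license_value.rsplit("/", 1)[-1]
    return {
        "name": codemeta.get("name"),
        "description": codemeta.get("description"),
        "homepage": codemeta.get("url") or codemeta.get("codeRepository"),
        "version": [codemeta["version"]] if "version" in codemeta else [],
        "license": license_value or None,
        "language": langs,
    }


# Hypothetical codemeta.json content for an example tool
example = json.loads("""{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "mytool",
  "description": "A hypothetical example tool.",
  "version": "1.2.0",
  "codeRepository": "https://github.com/example/mytool",
  "license": "https://spdx.org/licenses/MIT",
  "programmingLanguage": "Python"
}""")
print(codemeta_to_biotools(example))
```

Synchronising in the other direction would apply the inverse mapping, which is where shared standards such as biotoolsSchema and EDAM keep the two representations consistent.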

The information available through the Tools Ecosystem will become a reference resource for (federated) reproducible analytics of the software lifecycle in research workflows and computational scientific services (involving WP5), thereby informing the formation of FAIR software stewardship practices more widely (WP3).

WP2 will strive to contribute to semi-automated curation of information related to tools and linked to the scientific literature. These efforts will build upon, and extend, the text-mining-based tools developed for automated curation of the bio.tools registry.

Activity 1: Designing a software metadata editor for the ecosystem

The registration and editing of software metadata in different parts of the ecosystem (e.g. bio.tools) rely on separate user interface components. One of the major components is the entry editing component of bio.tools.

The quality and features of this component have a major impact on the adoption of bio.tools and, by extension, of the Research Software Ecosystem. We will leverage the evolution of the ecosystem to facilitate the editing of software metadata by designing a new software metadata editor.

Activity 2: Defining the next generation of the bio.tools search UI and backend

The search function is the most important functionality of the bio.tools registry, as it usually represents the first access point to bio.tools, and even to the ecosystem itself. As such, it is essential for the findability of the referenced software.

The current solution, although functional, has some limitations. For instance, it does not exploit the reasoning capabilities provided by EDAM (e.g. taking into account the class hierarchy of EDAM concepts in the search). Also, the additional metadata provided by other resources through the ecosystem are not incorporated in the current interface.

Our goal is to design the next version of the bio.tools search interface. This involves two components: the backend capabilities of the search engine itself, and the user interface that provides access to these capabilities.
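Exploiting the EDAM class hierarchy in search can be illustrated as query expansion: a query for a concept also matches tools annotated with any of its subclasses. The hierarchy and tool annotations below are a small hypothetical fragment using EDAM-like labels, not an extract of EDAM or of bio.tools, and real reasoning over EDAM would work on concept URIs rather than labels.

```python
from collections import deque

# Hypothetical hierarchy fragment: parent concept -> direct subclasses
SUBCLASSES = {
    "Sequence analysis": ["Sequence alignment"],
    "Sequence alignment": ["Pairwise sequence alignment", "Multiple sequence alignment"],
}

# Hypothetical tool annotations: tool -> annotated concepts
TOOLS = {
    "tool_a": {"Pairwise sequence alignment"},
    "tool_b": {"Multiple sequence alignment"},
    "tool_c": {"Genome assembly"},
}


def descendants(concept: str) -> set[str]:
    """Return the concept plus all of its transitive subclasses (breadth-first)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        for child in SUBCLASSES.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(concept: str) -> list[str]:
    """Return tools annotated with the concept or any of its subclasses."""
    expanded = descendants(concept)
    return sorted(tool for tool, concepts in TOOLS.items() if concepts & expanded)


print(search("Sequence alignment"))  # ['tool_a', 'tool_b']
```

An exact-match search for "Sequence alignment" would return nothing here, since no tool carries that label directly; the expansion is what surfaces both alignment tools.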

Activity 3: Outreach

Our outreach efforts will be focused on building partnerships with organisations such as EOSC and RDA, as well as academic publishers and literature resources. We will develop, in collaboration with these organisations as well as ELIXIR communities, guidelines that promote good practices in publishing software metadata.

The provided guidelines and templates will help research software developers provide the required metadata to accompany published research, to ensure that all relevant information is captured and made available to the scientific community.

Activity 4: Expanding the Ecosystem

We recognise the importance of aligning our efforts with other initiatives in the scientific community, such as CodeMeta. To further support the advancement of science and facilitate collaboration, we are expanding our efforts beyond traditional scientific software to include machine learning models. Inclusion of reusable ML models and connecting to current efforts will improve the relevance, “market” value, and synergies of the Research Software Ecosystem.

Given the current momentum and growing interest in the field of machine learning, we will explore options for enabling Ecosystem users to find information about reusable ML models, as reusable tools, from ML model registries and platforms (e.g. Hugging Face or BioImage.io).

Lead partners: University of Bergen (Norway), BSC (Spain)

This Work Package will be responsible for guidelines, standards, and knowledge sharing dedicated to making research software FAIR.

It will promote principles of software management and best practices, deliver up-to-date standards for describing software and other research objects, and disseminate these in the form of software stewardship. The required standards include EDAM, data models and information standards for recording information ("metadata") about research software, and templates for creating software management plans (SMPs).

EDAM is widely used by most ELIXIR Platforms and Communities, as one of the common knowledge management standards. It serves findability, provenance, and interoperability. WP3 will contribute to keeping EDAM up to date, and coordinate EDAM extensions by ELIXIR Communities, Nodes, and affiliated projects, covering their needs.

SMPs play the same role as DMPs, but for software. The ELIXIR SMP is low-barrier, tailored for life scientists, and fully aligned with the FAIR Research Software principles.

A software management portal will be designed linking knowledge, training materials, and services through a single entry point. It will provide quality metrics including FAIRness assessment (connected to WP4 and the Training Platform) and the utility of machine-actionable SMPs will be investigated.

WP3 will also work together with other WPs, Platforms, and global initiatives, on delivering data models and information standards for describing software-based resources, and enhancing citations. This WP will engage with international organisations dealing with research software, such as EOSC, RDA, NIH, and Australian BioCommons. We will also work towards expanding the software best practices to further domains (such as AI/ML).

Activity 1: Best practices in research software

Building on the success of the Software Best Practices group, we will expand the effort to capture best practices across relevant efforts, such as Machine Learning (aligning also to the RDA FAIR4ML group) and web applications.

The first step will be to perform a community review of the specific domain. We will then reprise the approach taken for the software best practices, i.e. defining the minimal set of low-barrier actions (such as the 4OSS), reviewing the connection to the FAIR principles (similar to FAIR4RS), and ultimately arriving at a structured format to capture relevant information (such as the SMP).

Activity 2: Machine actionability in software

Machine actionability makes interoperability easier for machines, while also enabling metadata-based links across tools within the Tools Ecosystem and relevant efforts in other domains.

In the case of Software Management Plans, it makes it easier for different stakeholders to review and assess the information captured through the SMPs. We will refine / tune the current metadata schemas that were produced from the joint RDA/EOSC-Future / ZBMed / ELIXIR project on machine-actionable SMP, while also establishing better connections to the Bioschemas community.

Based on this effort, we will integrate the metadata schema into the Software Management Wizard, ultimately transforming the questionnaire-based information captured in the wizard into machine-actionable SMPs (maSMPs).
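The idea of turning questionnaire answers into machine-actionable metadata can be sketched as a mapping from answers to a JSON-LD document. The example below uses the schema.org `SoftwareSourceCode` type and a few of its properties. The question keys are hypothetical, and the output is not the maSMP profile defined by the joint project mentioned above, which has its own schemas.

```python
import json

# Hypothetical mapping: wizard question key -> schema.org property
QUESTION_TO_PROPERTY = {
    "software_name": "name",
    "repository_url": "codeRepository",
    "license": "license",
    "main_language": "programmingLanguage",
    "release_version": "version",
}


def answers_to_jsonld(answers: dict) -> str:
    """Convert questionnaire answers into a JSON-LD SoftwareSourceCode description."""
    doc = {"@context": "https://schema.org", "@type": "SoftwareSourceCode"}
    for question, value in answers.items():
        prop = QUESTION_TO_PROPERTY.get(question)
        if prop and value:  # skip unmapped questions and empty answers
            doc[prop] = value
    return json.dumps(doc, indent=2)


answers = {
    "software_name": "mytool",
    "repository_url": "https://github.com/example/mytool",
    "license": "https://spdx.org/licenses/MIT",
    "free_text_notes": "kept in the human-readable plan only",
}
print(answers_to_jsonld(answers))
```

Free-text answers without a schema property stay in the human-readable plan, while the mapped answers become something a reviewer's tooling can query directly.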

Finally, we will review engagement strategies to connect and align to other relevant national (e.g. NFDI4DataScience, eScience Center Netherlands) and international (e.g. RDA, SWH) initiatives.

Activity 3: Quality indicators for software

Building on existing efforts around research software quality (such as those in EOSC and NFDI), this task will map and align the various indicators to the existing and/or planned services (e.g. OpenEBench), tools (e.g. Software Observatory, FAIR-Checker) and frameworks (e.g. SMPs, Bioschemas) within the Tools Platform ecosystem.
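Mapping quality indicators onto what a service can check automatically might look like the following: each indicator is a named predicate over a tool's metadata, and the assessment reports which indicators are met. The indicator names and metadata keys are hypothetical and are not the indicator set used by OpenEBench, the Software Observatory or FAIR-Checker; the EDAM URI is used only as an example value.

```python
from typing import Callable

# Hypothetical indicators: name -> predicate over a metadata dictionary
INDICATORS: dict[str, Callable[[dict], bool]] = {
    "has_license": lambda m: bool(m.get("license")),
    "has_public_repository": lambda m: str(m.get("repository", "")).startswith("https://"),
    "has_versioned_release": lambda m: bool(m.get("versions")),
    "has_edam_annotation": lambda m: any(
        str(term).startswith("http://edamontology.org/") for term in m.get("edam_terms", [])
    ),
}


def assess(metadata: dict) -> dict[str, bool]:
    """Evaluate every indicator against one tool's metadata."""
    return {name: check(metadata) for name, check in INDICATORS.items()}


report = assess({
    "license": "MIT",
    "repository": "https://github.com/example/mytool",
    "versions": [],  # no release recorded yet
    "edam_terms": ["http://edamontology.org/topic_0121"],  # example EDAM URI
})
print(report)
```

Keeping each indicator as an independent predicate makes it straightforward to align the same indicator with several services, or to swap in a definition agreed with another initiative.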

Activity 4: Training and capacity building for software

We will continue to expand the TeSS collection of courses and materials on best practices (currently including the 4OSS and the new SMP) with existing relevant resources from within and beyond ELIXIR (such as the Carpentries, GOBLET and GTN), in a co-production model with the Peoples’ Tier CoS.

Additionally, new training material will be created to capture the research software quality aspects, connecting the respective Tools Platform and the Training Platform activities. Moreover, and in close collaboration with the Learning Paths Focus Group, we will co-develop and design dedicated learning pathways around research software.

Activity 5: Application of EDAM ontology to ELIXIR Communities and diverse ELIXIR services

Technical and community capacity building for the EDAM ontology is required to keep EDAM up to date with the evolution and challenges of the biosciences and with global issues.

Our focus is on enabling ELIXIR Communities to expand the ontology, and ensure that it encapsulates their needs. This will enable scientists to more easily find and use existing resources.

We support the development of robust and user-friendly platforms that empower researchers to leverage the wealth of resources and knowledge available in the scientific community, now also extending beyond the life sciences.

This would also entail embedding EDAM in as many step-by-step guides and training workshops as possible, and making it interoperable with other major ontologies and linked open data.

We will aim to refine EDAM for improved performance in AI applications, such as text mining to annotate and add entries to bio.tools or to enrich scientific articles, assisted workflow generation, and provenance.

Lead partners: University of Bologna (Italy), IRISA/GenOuest (France)

In the life sciences, research software is used in a distributed manner across computational infrastructures. Distributed use ranges from software used in a single place, to tools used across different installations, to federated learning. Despite the increasing complexity, these approaches share many underlying considerations. This WP will focus on three of them:

  1. Benchmarking efforts driven by research communities.
  2. The distribution and deployment of containerised software across heterogeneous systems.
  3. The reproducibility and replicability of research outcomes.

Activity 1: Defining standards for benchmarking datasets across emerging communities

While benchmarking is important to ensure that different computational approaches can be compared and evaluated, we have observed that only a few people are currently working on this. In particular, few people are defining datasets (synthetic or otherwise) that can be used in benchmarking.

To address this issue, we plan to leverage the ELIXIR communities to identify individuals who are passionate about benchmarking, and have them champion these efforts. By establishing a community of practice focused on benchmarking, we aim to support the development of standards and practices that can be adopted across the scientific community.

Activity 2: Facilitating benchmarking through containerisation

Easy and reproducible deployments are a necessity for modern data science, and they become even more important as distributed analyses are more broadly used by researchers. Containers, and especially the Bioconda-based BioContainers, have proven to be a reliable way to tackle deployment issues across clouds, HPC systems and local installations.

BioContainers are used by Nextflow, Snakemake and Galaxy, and are therefore part of state-of-the-art workflows. In this WP, we will continue our container efforts, adapt to new architectures (ARM, RISC-V) and support the workflow community in creating containers as they are needed. This includes a special set of containers for benchmarking use cases selected by the involved communities (WP4.1).
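Supporting several CPU architectures largely comes down to selecting the right image variant for each host. The sketch below builds a BioContainers-style image reference (`quay.io/biocontainers/<tool>:<version>--<build>`) and maps the machine type reported by Python's `platform.machine()` to an OCI platform string. The build string in the example is a hypothetical placeholder, and whether a given tool is actually published for `linux/arm64` or `linux/riscv64` has to be checked per image.

```python
import platform
from typing import Optional

# Machine types as reported by platform.machine() -> OCI/Docker platform strings
_ARCH_TO_PLATFORM = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
    "riscv64": "linux/riscv64",
}


def image_reference(tool: str, version: str, build: str) -> str:
    """Return a BioContainers-style image reference for a Bioconda package build."""
    return f"quay.io/biocontainers/{tool}:{version}--{build}"


def host_platform(machine: Optional[str] = None) -> str:
    """Translate a machine type (default: this host) into a container platform string."""
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_TO_PLATFORM[machine]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None


ref = image_reference("samtools", "1.17", "h1234567_0")  # hypothetical build string
print(ref, host_platform("aarch64"))
```

The resulting pair could then be passed to a container runtime, for example `docker pull --platform linux/arm64 <reference>`.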

Lead partners: University of Manchester (UK), VIB (Belgium)

To address the increasing complexity of life science analytics, ELIXIR communities need access to large scale compute capacities, often in an international context. They need to use sustainable compute resources, with the appropriate regulatory compliance, accounting and provenance.

While aspects of this have been addressed by the e-infrastructures in Europe, life scientists have additional demands for computing on sensitive data. We will identify best practices in our communities, Nodes and projects for reporting the resources (facilities, services) used, to lay the groundwork for billing or other financial accounting of services.

The Platform will identify automated, lightweight solutions to track provenance. In turn this will improve the reproducibility of scientific analysis and the recognition of relevant service providers as well as researchers at the individual and organisational level.

Activity 1: Best practices, upscaling to build FAIR workflows using WorkflowHub in Communities

We will promote best practices for workflows, with a particular emphasis on WorkflowHub, leveraged by Communities. This will be facilitated via a white paper to Communities, to kickstart their contributions to WorkflowHub and build exemplar datasets. Our efforts will follow best practices for workflow documentation (including EDAM ontology and crosslinks) and outreach.

An important aspect is the seamless invocation of workflows between WorkflowHub and Galaxy, to help our users scale up their analyses. This will build on initial work by the Workflows Community Initiative’s working group for FAIR Computational Workflows, which is gathering best practices across workflow systems, as well as on EuroScienceGateway’s approach to workflow citation in scholarly communication. APIs, such as the GA4GH TRS endpoint, will be used for further WorkflowHub integration across the Tools Ecosystem.
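As an example of such API-based integration, the GA4GH Tool Registry Service (TRS) v2 exposes workflow descriptors under paths of the form `/tools/{id}/versions/{version_id}/{type}/descriptor`. The sketch below builds such URLs using only the standard library. The base URL is an assumption about where WorkflowHub's TRS endpoint is mounted, and the tool ID and version are hypothetical.

```python
from urllib.parse import quote

TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"  # assumed endpoint location


def descriptor_url(tool_id: str, version_id: str, descriptor_type: str,
                   base: str = TRS_BASE) -> str:
    """Build a TRS v2 URL for the primary descriptor of a workflow version."""
    # IDs are percent-encoded because TRS IDs may contain reserved characters such as '#'
    return (f"{base.rstrip('/')}/tools/{quote(tool_id, safe='')}"
            f"/versions/{quote(version_id, safe='')}/{descriptor_type}/descriptor")


def versions_url(tool_id: str, base: str = TRS_BASE) -> str:
    """Build a TRS v2 URL listing the versions of a workflow."""
    return f"{base.rstrip('/')}/tools/{quote(tool_id, safe='')}/versions"


print(descriptor_url("123", "1", "GALAXY"))  # hypothetical workflow ID and version
```

A client such as a Galaxy server could fetch these URLs to import a workflow, which is what makes the TRS layer useful for invoking WorkflowHub entries from other parts of the Tools Ecosystem.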

Activity 2: Environmental impact of Galaxy and bioinformatics

We recognise that our operations have an impact on the environment, and we are committed to reducing this impact, using Galaxy, its associated ecosystem and its users as a prime use case. We will also consider the relationships with RO-Crate and relevant schemas across ELIXIR to facilitate the work.

Our first step will be to conduct a comprehensive assessment of our operations and to identify areas where we can reduce our environmental footprint. We will then communicate our findings and progress to the wider scientific community, fostering a culture of transparency and accountability around environmental impact.

We will help raise awareness of the environmental impact of research, and we will develop and implement a range of mitigation strategies. Data collected by Galaxy, and the optimisations made, will be integrated back into the Research Software Ecosystem (e.g. OpenEBench) and coordinated with the Compute Platform (WP5). All our activities will be linked with the ELIXIR Focus Group for environmental impact.
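A first-order estimate of a job's footprint, in the spirit of calculators such as Green Algorithms, multiplies runtime by the power drawn by cores and memory, scales the result by the data centre's power usage effectiveness (PUE), and converts energy into CO2-equivalent using the carbon intensity of the local grid. The sketch below implements that formula; all coefficients in the example are illustrative assumptions that a real assessment would replace with measured or site-specific values.

```python
def job_footprint(runtime_h: float, n_cores: int, watts_per_core: float,
                  core_usage: float, memory_gb: float, watts_per_gb: float,
                  pue: float, grams_co2e_per_kwh: float) -> tuple[float, float]:
    """Return (energy in kWh, emissions in g CO2e) for one job."""
    # Average power draw in watts: cores scaled by utilisation, plus memory
    power_w = n_cores * watts_per_core * core_usage + memory_gb * watts_per_gb
    # Energy in kWh, including data-centre overhead via PUE
    energy_kwh = runtime_h * power_w * pue / 1000.0
    return energy_kwh, energy_kwh * grams_co2e_per_kwh


# Illustrative values only: 2 h on 4 fully used cores with 8 GB of RAM
energy, carbon = job_footprint(runtime_h=2, n_cores=4, watts_per_core=10.0,
                               core_usage=1.0, memory_gb=8, watts_per_gb=0.3725,
                               pue=1.67, grams_co2e_per_kwh=300.0)
print(f"{energy:.4f} kWh, {carbon:.1f} g CO2e")
```

Summing such per-job estimates over the job records a Galaxy server already keeps would give a rough baseline against which optimisations can be compared.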

Activity 3: Flexible upscaling of FAIR workflows

Upscaling of workflows is important for handling large volumes of scientific data (e.g. using GPUs and HPC), but it can act as a restriction on the FAIR principles: it can limit the number of computing infrastructures on which workflows can run, and thus reduce their reusability. However, these requirements are not currently well described when defining a workflow and sharing it in repositories such as WorkflowHub.

This task will take advantage of ongoing work on optimising upscaling of workflows, in particular EuroScienceGateway and the Pulsar network for federated compute. We will reflect on changes needed for ELIXIR Cloud to fully support scalable workflows while retaining “just enough” FAIR aspects.

Implementations will enable federated upscaled computation of workflows, taking into consideration transport time (workflow sent to compute nearest data) and environmental impact (WP5.2).

ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR Denmark, EMBL-EBI, ELIXIR Estonia, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK
ELIXIR Germany
ELIXIR Germany
ELIXIR Belgium, ELIXIR Germany, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Spain, ELIXIR Sweden, ELIXIR UK
ELIXIR Denmark
ELIXIR Slovenia

The Marine Metagenomics Community has adopted the use of the Common Workflow Language (CWL) as an interoperable way to describe their analysis pipelines. One of the most complex and fully developed CWL workflows implements the EBI metagenomics analysis pipeline.

In coordination with MG-RAST, a US-based metagenomics analysis pipeline, there are now two different large-scale metagenomics CWL workflows. Each uses a different CWL execution framework (namely Toil and AWE) and runs on a different compute infrastructure. During the coming year, the Marine Use Case expects META-pipe (the ELIXIR-NO marine-specific metagenomics pipeline) and other metagenomics-related tools (e.g. ITS1 analysis from ELIXIR-IT) to adopt CWL. These additional tools can be used as alternatives to pre-existing tools or to extend the functionality of the current workflows.

This Implementation Study aims to:

  1. demonstrate the benefits of using CWL by combining different workflow components to make new workflows;
  2. extend the current CWL workflows to enable greater reuse;
  3. enhance the execution frameworks to improve both deployment and scalability;
  4. deploy a single CWL workflow on different ELIXIR cloud environments to enable parallel processing and reproducibility.

To provide an exemplar to both the ELIXIR and the broader scientific communities, we will work through a community case study and ensure that the data, analysis and results form a bona fide Research Object (RO) that complies with the FAIR principles. We will develop appropriate training materials for two key target audiences: producers (of workflows and ROs) and consumers.

This study is closely linked with the work of the Bioschemas Community.

ELIXIR France, ELIXIR UK, EMBL-EBI, ELIXIR Finland
ELIXIR Belgium, ELIXIR Germany, ELIXIR Norway

An ELIXIR implementation study started in February 2017, as a collaboration between EMBL-EBI and ELIXIR-DE. Its main objective is to develop open, robust, scalable and reproducible proteomics data analysis workflows based on OpenMS, directly connected to the PRIDE database (an ELIXIR core data resource) and to deploy these pipelines in the EMBL-EBI "Embassy Cloud" as a proof of concept.

Building on this work, we propose a follow-up project with three objectives:

  1. The inclusion of additional open tools developed by other ELIXIR nodes
  2. The improvement of the overall infrastructure supporting the implementation of proteomics data analysis pipelines
  3. The inclusion of quality control pipelines.

The overarching goal is that these tools can be deployed in other cloud infrastructures, and can be easily reused by anyone in the community, thus bringing the users closer to the tools, and the tools closer to the data.

Impact of the study

The outcome will be that an increased range of open proteomics tools will be available in an extended range of cloud infrastructures, including new quality control features based on OpenMS. The expected impact is an increased capacity for proteomics analysis across multiple cloud platforms, all with a greater degree of quality control.

ELIXIR Belgium, EMBL-EBI, ELIXIR Germany, ELIXIR France, ELIXIR Spain
ELIXIR Germany
ELIXIR Italy
ELIXIR France

The implementation study project plan of ELIXIR Italy consists of six activities that aim to boost the cooperation with existing ELIXIR activities and are expected to deepen the interaction between ELIXIR-IIB, the Joint Research Unit embodying the Italian Node, and ELIXIR. The partners involved have already established contacts with other ELIXIR Nodes and the relevant ELIXIR Platforms and Services in order to ensure an advantageous outcome for all the involved parties. The goal of the proposed activities is to create and/or reinforce collaborations based on concrete measures. With this implementation study the Italian ELIXIR Node will achieve greater integration within ELIXIR service infrastructures and data interoperability policies. The topics of the selected activities and an additional coordination task are summarized below:

  1. Integration in ELIXIR Bioschemas activities.
  2. Integration in ELIXIR Data Curation activities.
  3. Integration in ELIXIR Galaxy activities through a project on practical feasibility of creating and running large-scale Galaxy-based variant calling pipelines on microservice infrastructures.
  4. Integration in ELIXIR Human Data activities through Beacons.
  5. Integration in ELIXIR Marine Metagenomics activities through a web-service supporting ITS1-based survey of marine communities.
  6. Integration in ELIXIR Rare Diseases activities.
  7. Coordination of the Italian ELIXIR Node Implementation study project.
ELIXIR Italy


The goal of this Staff Exchange project is to accelerate the ongoing work of integrating the various components of the new Tools Platform Ecosystem, as defined in Tools Platform Task 1 of the ELIXIR 2019-23 Programme, Work Package 2 (Developing an integrated “Tools Platform ecosystem”).

As a centralised, transparent repository of information about tools and services, the new Tools Platform Ecosystem will serve as the foundation for sustainability of the diverse Tools Platform services, and for interoperability between both the essential Platform services (bio.tools, BioContainers, OpenEBench, Bioconda) and related services outside of the Tools Platform (e.g. myExperiment, IFB Catalogue, Debian Med, Galaxy, or the resources of the bioimage analysis community).

The concrete focus of the proposed Exchange project is to integrate bio.tools into the new Tools Platform Ecosystem. Initially, the project will provide the implementation and test-production experience needed for deliverable D2.1 of Tools Platform Task 1 (“Governance and process management agreement for the new tools ecosystem”; due 2020 Q1) and, over its duration, pave the way for deliverable D2.2 of the same task (“Guidelines and procedures for the prospective inclusion of new registries and data repositories into the ecosystem”; due 2020 Q3).

This Staff Exchange project will foster workload sharing and organisation between the respective ELIXIR Nodes, and provide the means for the necessary technical knowledge exchange among the personnel of the participating Nodes.

ELIXIR Norway, ELIXIR Denmark, ELIXIR France, ELIXIR Germany

The aim of this new Strategic Implementation Study (SIS) is to build on the progress made by the ongoing implementation study, enabling adoption and deployment of protocols and services by the broader ELIXIR community at scale. The SIS aims to coordinate existing efforts across ELIXIR, identify opportunities, contribute targeted and limited developments to connect relevant components, and propose mechanisms for sustaining this effort over time.

ELIXIR Belgium, ELIXIR Switzerland, ELIXIR Czech Republic, ELIXIR Germany, EMBL-EBI, ELIXIR Spain, ELIXIR Italy, ELIXIR Finland, ELIXIR France, ELIXIR Greece, ELIXIR Netherlands, ELIXIR UK

Metabolomics aims to provide novel insights into the biochemical reactions of organisms by characterising the presence and concentrations of low molecular weight compounds from biological samples. The primary analytical tools for such high-throughput data collection are mass spectrometry (MS), often preceded by chromatographic or electrophoretic separation technologies, and nuclear magnetic resonance spectroscopy (NMR).

These technologies produce relatively large and complex data sets that require the expertise of bioinformaticians, cheminformaticians, biostatisticians, data scientists and computer scientists. Together, they develop and apply a wide range of algorithms, software tools, repositories and computational resources to process, analyse, report and store the data and metadata.

Increasingly, insights from genomics, epigenomics, transcriptomics, proteomics/protein interactomics and metabolomics are combined to understand the dynamics of biological processes. Metabolomics activities are well represented within Europe and ELIXIR Nodes. The community believes that metabolite identification is the area where computational metabolomics and metabolomics data management will have maximal impact, which will benefit most from interactions with the five existing ELIXIR Platforms, and where progress will contribute most to other ELIXIR communities.

The progress through this integrative Implementation Study will benefit industry and academia alike as metabolite identification is one of the major bottlenecks in metabolomics and resolving this challenge requires a community effort.

ELIXIR Netherlands, EMBL-EBI, ELIXIR France, ELIXIR UK, ELIXIR Germany, ELIXIR Spain, ELIXIR Sweden, ELIXIR Italy, ELIXIR Estonia, ELIXIR Switzerland, ELIXIR Belgium

Biological communities work across a range of domains and use a variety of metrics to measure and evaluate their activities and performance. Metrics can be used in training, project management and software development, as well as in more specific tasks such as benchmarking bioinformatics analysis tools or evaluating data resources in the life sciences.

Although data about specific metrics can be found in publications or web resources, it is hard to find a proper description, methodology and implementation of those metrics. This makes it difficult to understand what a given metric means and hard to compare metrics provided by different resources.

This project focused on proposing a solution to help define, discover and access metrics and metric implementations. We aimed to provide a prototype framework for registering metric definitions and test-running metric implementations as a proof of concept. The aim was to enable users to improve the evaluation of a resource, assess the impact of a variety of services and inform decision making.
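To illustrate the define/discover/access idea, here is a minimal in-memory sketch of a metrics registry. Each metric definition carries a description and methodology, and an implementation can be registered against a definition and test-run on a resource. All class, field and metric names are hypothetical and do not describe the PIsCO framework's actual data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetricDefinition:
    """Human-readable definition of a metric (hypothetical fields)."""
    identifier: str
    name: str
    description: str
    methodology: str
    keywords: List[str] = field(default_factory=list)


class MetricsRegistry:
    """Registers definitions, links implementations and supports keyword discovery."""

    def __init__(self) -> None:
        self._definitions: Dict[str, MetricDefinition] = {}
        self._implementations: Dict[str, Callable[[dict], float]] = {}

    def register(self, definition: MetricDefinition) -> None:
        if definition.identifier in self._definitions:
            raise ValueError(f"metric {definition.identifier!r} already registered")
        self._definitions[definition.identifier] = definition

    def add_implementation(self, identifier: str, func: Callable[[dict], float]) -> None:
        # An implementation may only be attached to a metric that has a definition.
        if identifier not in self._definitions:
            raise KeyError(f"unknown metric {identifier!r}")
        self._implementations[identifier] = func

    def discover(self, keyword: str) -> List[MetricDefinition]:
        """Case-insensitive keyword lookup over registered definitions."""
        kw = keyword.lower()
        return [d for d in self._definitions.values()
                if kw in (k.lower() for k in d.keywords)]

    def test_run(self, identifier: str, resource: dict) -> float:
        """Run the registered implementation of a metric on one resource."""
        return self._implementations[identifier](resource)


registry = MetricsRegistry()
registry.register(MetricDefinition(
    identifier="attendance-rate",
    name="Training attendance rate",
    description="Share of registered participants who attended a course",
    methodology="attendees divided by registrants",
    keywords=["training"],
))
registry.add_implementation("attendance-rate", lambda r: r["attendees"] / r["registrants"])
print([d.identifier for d in registry.discover("Training")])  # ['attendance-rate']
print(registry.test_run("attendance-rate", {"attendees": 30, "registrants": 40}))  # 0.75
```

Keeping definitions separate from implementations reflects the problem described above: a metric's meaning can be looked up even when several resources implement it differently.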

This study is now finished; the work is summarised in the end report and in two publications:

An outline of the PIsCO framework is available on GitHub. The study also identified a relationship with Bioschemas and is looking to redefine the metrics database accordingly. 

ELIXIR UK
ELIXIR Belgium, ELIXIR Germany

Software containers are a key element of Open Science, Open Data and Open Source, which are strongly supported and advocated by ELIXIR. When described as part of scientific workflows, software containers are key to guaranteeing data provenance and an important element towards the reproducibility of results.

They also ease software installation on local computers or cluster facilities. Thus, software containers are transversal to most of the strategic lines of the ELIXIR Tools Platform for the 2022-2023 Scientific Programme.
We have divided this task into two complementary work packages around software containers.

The first work package aims to maintain and extend the work initiated in the previous ELIXIR implementation study on BioContainers. That study helped unify various initiatives in ELIXIR Nodes around software containers and brought them under a common infrastructure and metadata federation. This first work package will focus on operational maintenance of the infrastructure. Additional registry mirrors will be investigated to improve cloud/local registry availability. To do so, a partnership with Amazon (and potentially other commercial cloud providers) will be established.

While x86 architectures currently dominate scientific compute clusters and clouds (x86 being, at present, the container architecture supported by BioContainers), ARM is a mature architecture gaining traction in both server and consumer markets. With the first supercomputers now being built on ARM architectures, the BioContainers project needs to be prepared to offer ARM-based container solutions to its users. Therefore, a task of this work package will evaluate multi-architecture container solutions and add support for ARM in addition to x86. As this task requires extra physical resources, we will seek support from the ARM company and/or from ELIXIR members to build ARM-based BioContainers.

To compile software against different architectures, upstream support from tool developers will most likely be needed. Here, we will work together with the “development best practices” (Task 4) team to encourage developers to provide multi-arch support to their software and help them along the way.
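Multi-architecture images are commonly published as an OCI image index (or Docker manifest list) that lists one manifest per platform. The client then picks the entry matching its own CPU architecture. The sketch below shows that selection step, assuming a simplified index. Real index entries also carry `mediaType` and `size`, and platforms may specify variants. The digests are made up, and the architecture alias table is a partial, illustrative mapping from `platform.machine()` values to OCI architecture names.

```python
import platform
from typing import Optional

# Partial mapping from platform.machine() values to OCI architecture names (illustrative).
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def oci_arch(machine: Optional[str] = None) -> str:
    """Normalise a machine name (default: the local one) to an OCI architecture name."""
    m = (machine or platform.machine()).lower()
    return _ARCH_ALIASES.get(m, m)


def select_manifest(index: dict, arch: str, os_name: str = "linux") -> str:
    """Return the digest of the manifest matching arch/os in a simplified image index."""
    for entry in index.get("manifests", []):
        plat = entry.get("platform", {})
        if plat.get("architecture") == arch and plat.get("os") == os_name:
            return entry["digest"]
    # No build for this platform: the upstream tool may need multi-arch support first.
    raise LookupError(f"no image for {os_name}/{arch}")


# Hypothetical index for a container published for both x86 and ARM.
example_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbbb", "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}

print(select_manifest(example_index, oci_arch("aarch64")))  # sha256:bbbb
```

The `LookupError` branch corresponds to the case described above, where a tool has no ARM build until upstream developers add multi-architecture support.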

BioContainers already integrates its metadata with the central repository of the Tools Platform. Work will continue to align with the evolution of this repository, which, as a first step, was a raw aggregation of metadata from all Tools Platform services (BioContainers, OpenEBench, bio.tools, ...) and the Tools Ecosystem.

The second work package will focus on user communities. While BioContainers is widely used by the ELIXIR community, it could also reach other life science communities. Discussions with EOSC and other communities will evaluate how BioContainers could fit their needs and whether they could contribute to the BioContainers project and registry. This would provide end users with a single entry point for container management and optimise human and compute resources.

In summary, BioContainers is now well established in our community and provides a stable infrastructure for container availability and findability. Existing solutions could benefit additional life science communities. The trend towards energy-efficient ARM architectures (and possibly others in the future) should not be ignored; it is an opportunity for BioContainers to support an emerging but growing community.

ELIXIR Belgium, ELIXIR Germany, EMBL-EBI, ELIXIR Spain, ELIXIR France, ELIXIR Greece, ELIXIR Italy, ELIXIR Norway, ELIXIR UK

This study will build on recent work with software containers, which, as a key element of Open Science, Open Data and Open Source, are strongly supported and advocated by ELIXIR. Software containers are transversal to most of the strategic lines of the ELIXIR Tools Platform for the 2019-2023 Scientific Programme.

There will be three Work Packages: 

1. To maintain and extend the work initiated in the 2018 ELIXIR implementation study on BioContainers. That study helped unify various initiatives in ELIXIR Nodes around software containers and bring them under a common infrastructure. This infrastructure will be consolidated by incorporating new software containerisation technologies, and federation of the platform will be explored to facilitate its long-term sustainability.

2. To implement the evolution of the ELIXIR Tools Platform ecosystem by creating a central repository that provides metadata-rich, technology-agnostic software containers for use and deployment across sites and platforms. Initially this will integrate content from bio.tools, BioContainers, OpenEBench and Galaxy, and in time facilitate the inclusion of new data and metadata producers (e.g. Bioconda, Bioconductor) and/or new data and metadata consumers (e.g. GA4GH TRS, myExperiment).

3. To engage with existing and newly created communities of users (within and beyond ELIXIR), whose involvement is of the utmost importance to guarantee that any chosen standard and/or technology responds to users' needs. Software containers will play an important role here in ensuring that users can benefit from the ongoing efforts in the evolved Tools Platform ecosystem and in other ELIXIR Platforms such as Training, Interoperability and/or Compute.

This study will provide containerised tools and state-of-the-art benchmarked workflows available in Galaxy for scientific communities. For long-term sustainability and impact, we will ensure that all workflows and tools are curated to a high standard, rendered FAIR, and follow agreed standards within ELIXIR and by initiatives like GA4GH and EOSC.

ELIXIR Belgium, ELIXIR France, ELIXIR Italy, ELIXIR Norway, ELIXIR Spain, EMBL-EBI, ELIXIR Germany, ELIXIR Denmark


OpenEBench is designed to support benchmarking activities in terms of 1) the scientific performance of individual tools, workflows and platforms in the context of self-organised communities, and 2) technical monitoring of life sciences research software using dedicated metrics and indicators.
Indeed, one of the main goals of OpenEBench is to become an observatory of bioinformatics software quality with regard to openness. OpenEBench is strongly committed to promoting the adoption of all guidelines and technologies advocated by ELIXIR, including but not limited to ELIXIR AAI, software development and containerisation best practices, FAIR data principles for reference datasets, and the principles of Open Data, Open Source and Open Science.

ELIXIR Spain, ELIXIR Norway, ELIXIR Italy, ELIXIR Greece, ELIXIR Czech Republic, ELIXIR Switzerland

This study will support OpenEBench in its goal of becoming an observatory of bioinformatics software quality with regard to openness, including:

  • ELIXIR AAI
  • software development and containerization best-practices
  • FAIR data principles for reference datasets as well as the principles on Open Data, Open Source and Open Science.

WP1. A Core Benchmarking Service: 

Work Package 1 has the following aims:

  • To establish, consolidate and extend the core ELIXIR Service for benchmarking
  • To lower the benchmarking startup hurdle
  • To provide basic tests of tool operability
  • To alleviate reimplementation of abstractable workflows

Through four subtasks: 

  1. Execution environment for benchmarking workflows
  2. Automated data import
  3. Scientific benchmarking test case
  4. Extend visualization gallery
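Scientific benchmarking compares a tool's output against a reference dataset using community-agreed metrics. As an illustration of what one such test case might compute, here is precision, recall and F1 for a set of predicted items against a gold standard. The choice of these metrics and the example variant identifiers are assumptions for illustration, since each benchmarking community defines its own metrics and reference data.

```python
from typing import Dict, Iterable


def precision_recall_f1(predicted: Iterable[str], reference: Iterable[str]) -> Dict[str, float]:
    """Compare a tool's predictions with a reference set (order and duplicates ignored)."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # true positives: predictions found in the reference
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold-standard variant calls and one tool's output.
reference = {"chr1:100A>G", "chr1:200C>T", "chr2:50G>A", "chr3:10T>C"}
tool_output = {"chr1:100A>G", "chr1:200C>T", "chr2:999A>T"}

# precision = 2/3 (about 0.667), recall = 0.5, f1 = 4/7 (about 0.571)
print(precision_recall_f1(tool_output, reference))
```

Per-tool scores such as these are the kind of values a visualisation gallery could then plot side by side, for example as a precision/recall scatter across participating tools.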
ELIXIR Czech Republic, ELIXIR Italy, ELIXIR Norway, ELIXIR Spain, ELIXIR Switzerland

bio.tools provides persistent identifiers for 10,000+ verified tool descriptions and has attracted 20 new contributors per month over the last year. Through this Study we will:

  1. Engage scientific expertise including ELIXIR Communities and ELIXIR Nodes to improve the scientific quality and national representation.
  2. Develop tooling for domain-specific bio.tools views for communities

Through a combination of

  • best-practice guidelines,
  • extension of the bio.tools studentship scheme,
  • community-led workshops,
  • adherence to the bio.tools Tool Information Standards.

Four subtasks will establish a network of thematic editors for scientific areas and nations, to oversee improvement in the scientific quality of EDAM and bio.tools, and ensure national interests are adequately represented.  This will represent existing and emerging ELIXIR communities, e.g. Proteomics, Metabolomics and Marine Metagenomics.

ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR France, ELIXIR Norway

To raise the quality and sustainability of research software

This study will promote the production, adoption and measurement of information standards and best practices applied to the software development life cycle. We have published four simple recommendations to encourage best practices in research software and the Top 10 metrics for life science software good practices.

The next steps are to:

  • Adopt, promote and recognise these practices, by developing guidelines to help software developers adopt and comply with the 4OSS recommendations
  • Train and promote software best practices and the 4OSS
  • Measure, recognise and visualise adoption (longer-term aim)
  • Develop a software management plan template, connected to a concise description of the guidelines for open research software
  • Produce a white paper on the software development management plan for ELIXIR, which can subsequently be used to produce training material
  • Work with the newly formed Research Software Alliance (ReSA) to facilitate adoption of this plan by the wider community
ELIXIR Netherlands, ELIXIR Greece, ELIXIR Italy, ELIXIR Norway, ELIXIR Sweden, ELIXIR Belgium


The aim of this task is to raise the quality and sustainability of research software by producing, promoting, measuring and adopting best practices applied to the software development life cycle. So far, efforts of this group have led to the publication of the four simple recommendations to encourage best practices in research software and the Top 10 metrics for life science software good practices.

Additionally, we’ve produced training material to promote and increase the awareness of these practices, and delivered several workshops across ELIXIR Nodes, targeting researchers and developers. Moreover, after capturing the community practices towards managing research software, this group produced a first draft of a software management plan template, connected to a concise description of the guidelines for open research software.

Next steps include aligning the best practices with the wider Tools Platform ecosystem, implementing tools to support the application of these practices, expanding the training portfolio of the Software Best Practices (including the Software Management Plan), and ultimately creating a community around the sustainability, adoption and maintenance of the Software Management Plan effort that goes beyond the confines of the ELIXIR network and engages with relevant initiatives and groups around the globe (such as the Research Software Alliance, Australian BioCommons, Force11 and the Research Data Alliance).

ELIXIR Spain, ELIXIR Italy, ELIXIR Greece, ELIXIR Sweden, ELIXIR Netherlands, ELIXIR Germany, ELIXIR Norway

This Implementation Study aimed to prepare for the start of the ELIXIR-EXCELERATE project. To support the anticipated scaling up of the ELIXIR Tools and Data Services Registry, in terms of its content, its functionality and the community behind it, certain urgent developments were required. These included technical development of the registry, preparations for a series of hackathons, and publications following from these efforts.

The study has been completed; see the end report.

Webinar summarising the outcomes

Slides

ELIXIR Denmark

The bio.tools registry content is continuously growing as new entries are added by curators and by the community; it currently contains over 20,000 tool annotations. Given this large number of entries, the registry needs to ensure that its users can seamlessly search for and obtain tools of interest.
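As an illustration of why annotation quality and search functionality depend on each other, here is a minimal sketch that filters tool entries by EDAM operation term and free text. The entry structure is loosely modelled on the `function` → `operation` → `term` nesting used in bio.tools JSON, but it is simplified. The two example entries and the `find_tools` helper are hypothetical, and this is not the bio.tools search implementation.

```python
from typing import List, Optional


def find_tools(entries: List[dict], operation_term: Optional[str] = None,
               text: Optional[str] = None) -> List[str]:
    """Return IDs of entries matching an EDAM operation term and/or free text."""
    hits = []
    for entry in entries:
        # Collect all annotated operation terms across the entry's functions.
        ops = {
            op["term"].lower()
            for fn in entry.get("function", [])
            for op in fn.get("operation", [])
        }
        if operation_term and operation_term.lower() not in ops:
            continue
        haystack = (entry.get("name", "") + " " + entry.get("description", "")).lower()
        if text and text.lower() not in haystack:
            continue
        hits.append(entry["biotoolsID"])
    return hits


# Hypothetical, simplified entries.
entries = [
    {"biotoolsID": "example_mapper", "name": "ExampleMapper",
     "description": "Maps short reads to a reference genome.",
     "function": [{"operation": [{"term": "Read mapping"}]}]},
    {"biotoolsID": "example_aligner", "name": "ExampleAligner",
     "description": "Pairwise protein sequence alignment.",
     "function": [{"operation": [{"term": "Sequence alignment"}]}]},
]

print(find_tools(entries, operation_term="read mapping"))  # ['example_mapper']
print(find_tools(entries, text="protein"))                 # ['example_aligner']
```

An entry without an EDAM operation annotation can only be found by free text in this sketch. This mirrors the point made in this section that performant search adds little value without high-quality annotations.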

To achieve this, high-quality content is needed, along with the functionality and interfaces necessary to curate and maintain this content and to provide it to users in a manner that encourages data (re)usability, contribution and integration with other projects. Both the bio.tools registry and the EDAM ontology are essential, continuously evolving components of the ELIXIR Tools Platform that will be deeply integrated with the Tools Platform Ecosystem and other Platforms.

All the work package components described below complement each other. There is little value in having quality tool annotations without the mechanisms and interfaces to identify and use these tools; similarly, there is little value in highly performant search and complex interfaces without high-quality data. Such issues can be solved by tackling the problem from multiple angles. Consequently, the tasks described below will contribute to a well-integrated and up-to-date system that provides both valuable, high-quality data and the means to deliver this data to users in a practical and useful manner.

ELIXIR Denmark, ELIXIR Norway, ELIXIR France, ELIXIR Estonia, ELIXIR Germany, ELIXIR Spain, ELIXIR Netherlands, ELIXIR Czech Republic

The Tools Platform has initiated the development of the “Tools Platform Ecosystem”. This metadata exchange platform will coordinate the different registries and services maintained by the ELIXIR Tools Platform, using standards such as EDAM, Bioschemas, and biotoolsSchema. Building on top of the content aggregated and curated over the last years, it will open up the content and make it accessible beyond API calls. In production, this platform will serve as the central tool-metadata hub for the Tools Platform resources, and provide integration with other services and communities within and beyond ELIXIR.

ELIXIR France, ELIXIR Norway, ELIXIR Germany, ELIXIR Denmark, ELIXIR Spain, EMBL-EBI, ELIXIR UK, ELIXIR Belgium, ELIXIR Estonia, ELIXIR Italy