ELIXIR Tools Platform (2024-26)

The Tools Platform provides services that enable findability, accessibility, interoperability, and reusability of computational tools. These tools include research software, workflows, remote digital services, and trained machine-learning (ML) models.

During the 2024-26 programme, the ELIXIR Tools Platform aims to:

  • Allow researchers (end-users) to find well-described and fit-for-purpose research software, providing alternatives when available, that can be used in heterogeneous computational setups from a laptop to a cloud installation to an HPC centre;
  • Serve as a reference point for individual research software engineers on how to develop better, more sustainable, and recognizable software by providing best practices and services supporting such an endeavour;
  • Provide scientific communities with a reference place for tackling common challenges in their own best practices for software development.

The Platform will work in five Work Packages (WPs):

Lead partners: University of Freiburg (Germany), BSC (Spain), Institut Pasteur (France)

This Work Package provides the overall project management and coordination of the ELIXIR Tools Platform. It will also ensure that the Platform work plan is aligned with ELIXIR’s strategic vision and with the overall development of European research infrastructures, data spaces and regulation.

The Work Package facilitates active dialog with life-science research communities in order to understand the real needs of researchers. This is done especially by collaborating with ELIXIR Communities that represent researchers’ use cases.

Besides technology development, delivering impact on research requires that the technology can be sustained and provided for researchers. This Work Package seeks ways to sustain the technology as part of operational services. This can be done both by integrating technology into existing services and by establishing new services when applicable.

The landscape of European research infrastructures, data spaces and related initiatives, such as EOSC, EuroHPC and GAIA-X, is evolving, and the Work Package is actively participating in this development.

The work package will also bridge the platform activities to GA4GH and participate in the steering of GA4GH by utilising the ELIXIR and GA4GH strategic partnership.

Activity 1: Project management and coordination

  • Set up and facilitate regular Platform-wide meetings, both online and face-to-face.
  • Provide proper communication channels for the work.
  • Monitor and report the progress.

Activity 2: Sustainability and dissemination

  • Ensure sufficient sustainability planning for platform development activities.
  • Ensure bi-directional communication about the platform activities towards research communities, especially through ELIXIR Communities, ELIXIR Training Platform and utilising existing communications channels such as ELIXIR webinars.
  • Knowledge sharing with other ELIXIR platforms and participation in joint-platform activities.
  • Represent platform in EOSC, EuroHPC and the ELIXIR::GA4GH strategic partnership.
  • Dissemination for Tools Platform.

Lead partners: Institut Pasteur (France), Leiden University (Netherlands), University of Southern Denmark (Denmark), Masaryk University (Czech Republic)

The ELIXIR Research Software Ecosystem (RSEc) collates and publishes informative descriptions of more than 30,000 software tools and services. Most of these records are provided and curated using the bio.tools registry, which is synchronised with the backend of the Ecosystem.

WP2 will further develop the RSEc as a way to enable federated access and curation of reference metadata of any research software. This activity will leverage the technical and scientific expertise of the various communities and services within and beyond ELIXIR (e.g. Galaxy and WorkflowHub, both WP5).

This development will target direct contribution mechanisms for software and workflow developers, so that metadata provided during the software publication and release process can be seamlessly consumed, and synchronised with the Ecosystem.

Additional mechanisms will enable full coverage of research software metadata directly from the source-code repositories, as well as contributing these from the Ecosystem, through bidirectional synchronisation. These mechanisms will use and drive the evolution of standards such as biotools Schema and EDAM (WP3).

The information available through the Tools Ecosystem will become a reference resource for (federated) reproducible analytics of the software lifecycle in research workflows and computational scientific services (involving WP5), thereby informing the formation of FAIR software stewardship practices more widely (WP3).

WP2 will strive to contribute to semi-automated curation of information related to tools, and linked to scientific literature. These efforts will build upon - and extend - the text mining-based tools developed for automated curation of the bio.tools registry.

Activity 1: Designing a software metadata editor for the ecosystem

The registration and editing of software metadata in different parts of the ecosystem (e.g. bio.tools) relies on separate user interface components. One of the major components is the entry editing component of bio.tools.

The quality and features of this component have a major impact on the adoption of bio.tools and, by extension, of the research software ecosystem. We will build on the evolution of the ecosystem to make software metadata easier to edit, by designing a new software metadata editor.

Activity 2: Defining the next generation of the bio.tools search UI and backend

The search function in the bio.tools registry is its most important functionality, as it usually represents the first access point to bio.tools, and even the ecosystem itself. As such, it is an essential part for the findability of the referenced software.

The current solution, although functional, has some limitations. For instance, it does not exploit the reasoning capabilities provided by EDAM (e.g. taking into account the class hierarchy of EDAM concepts in the search). Also, the additional metadata provided by other resources through the ecosystem are not incorporated in the current interface.
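
To illustrate what taking the class hierarchy into account could mean for search, here is a minimal sketch of hierarchy-aware query expansion. It is not the bio.tools implementation: the concept labels, the `PARENTS` mapping, the tool names, and the functions `descendants` and `search` are all hypothetical and only stand in for an EDAM-like hierarchy.

```python
from collections import deque

# Hypothetical fragment of a concept hierarchy (child -> parents).
# Labels are illustrative and not copied from an EDAM release.
PARENTS = {
    "Sequence alignment": ["Alignment"],
    "Pairwise sequence alignment": ["Sequence alignment"],
    "Multiple sequence alignment": ["Sequence alignment"],
    "Structure alignment": ["Alignment"],
}

# Hypothetical tool annotations (tool name -> annotated concepts).
TOOLS = {
    "tool-a": ["Pairwise sequence alignment"],
    "tool-b": ["Structure alignment"],
    "tool-c": ["Multiple sequence alignment"],
}


def descendants(concept, parents=PARENTS):
    """Return the concept together with every concept below it."""
    # Invert the child -> parents mapping into parent -> children.
    children = {}
    for child, concept_parents in parents.items():
        for parent in concept_parents:
            children.setdefault(parent, []).append(child)

    # Breadth-first walk down the hierarchy.
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(tools, concept):
    """Return tools annotated with the concept or any of its descendants."""
    wanted = descendants(concept)
    return sorted(name for name, concepts in tools.items() if wanted & set(concepts))


print(search(TOOLS, "Sequence alignment"))
```

In this sketch, a query for the broader concept also returns tools annotated only with a narrower one, which a plain string match on the annotation would miss.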

Our goal is to design the next version of the bio.tools search interface. This involves two components: the backend capabilities of the search engine itself, and the user interface that provides access to these capabilities.

Activity 3: Outreach

Our outreach efforts will be focused on building partnerships with organisations such as EOSC and RDA, as well as academic publishers and literature resources. We will develop, in collaboration with these organisations as well as ELIXIR communities, guidelines that promote good practices in publishing software metadata.

The provided guidelines and templates will help research software developers provide the required metadata to accompany published research, to ensure that all relevant information is captured and made available to the scientific community.

Activity 4: Expanding the Ecosystem

We recognise the importance of aligning our efforts with other initiatives in the scientific community, such as CodeMeta. To further support the advancement of science and facilitate collaboration, we are expanding our efforts beyond traditional scientific software to include machine learning models. Inclusion of reusable ML models and connecting to current efforts will improve the relevance, “market” value, and synergies of the Research Software Ecosystem.

Given the current momentum and growing interest in the field of machine learning, we will explore the options for enabling the Ecosystem users to find information about reusable ML models - as reusable tools - from ML model registries and platforms (e.g. Hugging Face or BioImage.io).

Lead partners: University of Bergen (Norway), BSC (Spain)

This Work Package will be responsible for guidelines, standards, and knowledge sharing dedicated to making research software FAIR.

It will promote principles of software management and best practices, deliver up-to-date standards for describing software and other research objects, and disseminate these in the form of software stewardship. The required standards include EDAM, data models and information standards for recording information ("metadata") about research software, and templates for creating software management plans (SMPs).

EDAM is widely used by most ELIXIR Platforms and Communities, as one of the common knowledge management standards. It serves findability, provenance, and interoperability. WP3 will contribute to keeping EDAM up to date, and coordinate EDAM extensions by ELIXIR Communities, Nodes, and affiliated projects, covering their needs.

SMPs play the same role as DMPs, but for software. The ELIXIR SMP is low-barrier, tailored for life scientists, and fully aligned with the FAIR Research Software principles.

A software management portal will be designed linking knowledge, training materials, and services through a single entry point. It will provide quality metrics including FAIRness assessment (connected to WP4 and the Training Platform) and the utility of machine-actionable SMPs will be investigated.

WP3 will also work together with other WPs, Platforms, and global initiatives, on delivering data models and information standards for describing software-based resources, and enhancing citations. This WP will engage with international organisations dealing with research software, such as EOSC, RDA, NIH, and Australian BioCommons. We will also work towards expanding the software best practices to further domains (such as AI/ML).

Activity 1: Best practices in research software

Building on the success of the Software Best Practices group, we will expand the effort to capture best practices across relevant efforts, such as Machine Learning (aligning also to the RDA FAIR4ML group) and web applications.

The first step will be to perform a community review of the specific domain. We will then reprise the approach taken for the software best practices: defining the minimal set of low-barrier actions (such as the 4OSS), reviewing the connection to the FAIR principles (similar to FAIR4RS), and ultimately arriving at a structured format to capture relevant information (such as the SMP).

Activity 2: Machine actionability in software

Machine actionability makes interoperability easier for machines, while also enabling metadata-based links across tools within the Tools Ecosystem and relevant efforts in other domains.

In the case of Software Management Plans, it makes it easier for different stakeholders to review and assess the information captured through the SMPs. We will refine and tune the current metadata schemas produced by the joint RDA / EOSC-Future / ZBMed / ELIXIR project on machine-actionable SMPs, while also establishing better connections to the Bioschemas community.

Based on this effort, we will integrate the metadata schema into the Software Management Wizard, ultimately transforming questionnaire-based information captured in the wizard into machine-actionable SMPs (maSMPs).
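
As a rough illustration of turning questionnaire answers into a machine-readable record, the sketch below maps a few answers onto schema.org properties of `SoftwareSourceCode`. This is an assumption made for the example only: the questionnaire keys, the `FIELD_MAP` mapping, and the function `to_machine_actionable` are hypothetical, and the actual maSMP metadata schema may use different types and properties.

```python
import json

# Hypothetical questionnaire keys mapped to schema.org property names.
# The real wizard questions and the maSMP schema may differ.
FIELD_MAP = {
    "software_name": "name",
    "repository_url": "codeRepository",
    "licence": "license",
    "language": "programmingLanguage",
}


def to_machine_actionable(answers):
    """Build a JSON-LD style record from questionnaire answers."""
    record = {"@context": "https://schema.org", "@type": "SoftwareSourceCode"}
    for question, prop in FIELD_MAP.items():
        value = answers.get(question)
        if value:  # skip unanswered or empty questions
            record[prop] = value
    return record


# Placeholder answers, not taken from any real plan.
answers = {
    "software_name": "example-tool",
    "repository_url": "https://example.org/example-tool",
    "licence": "MIT",
    "language": "",
}

print(json.dumps(to_machine_actionable(answers), indent=2))
```

Unanswered questions are left out of the record rather than emitted as empty strings, so that downstream checks can distinguish missing information from present information.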

Finally, we will review engagement strategies to connect and align to other relevant national (e.g. NFDI4DataScience, eScience Center Netherlands) and international (e.g. RDA, SWH) initiatives.

Activity 3: Quality indicators for software

Building on existing efforts around research software quality (for example within EOSC and NFDI), this task will map and align the various indicators to the existing and/or planned services (e.g. OpenEBench), tools (e.g. Software Observatory, FAIR-Checker) and frameworks (e.g. SMPs, Bioschemas) within the Tools Platform ecosystem.

Activity 4: Training and capacity building for software

We are continuing to expand the TeSS collection of courses and materials around Best Practices (currently including the 4OSS and the new SMP) with existing relevant resources from within and beyond ELIXIR (such as Carpentries, GOBLET and GTN), and in a co-production model with the Peoples’ Tier CoS.

Additionally, new training material will be created to capture the research software quality aspects, connecting the respective Tools Platform and the Training Platform activities. Moreover, and in close collaboration with the Learning Paths Focus Group, we will co-develop and design dedicated learning pathways around research software.

Activity 5: Application of EDAM ontology to ELIXIR Communities and diverse ELIXIR services

Technical and community capacity building for the EDAM ontology is required for keeping EDAM up to date with the evolution and challenges of biosciences and global issues.

Our focus is on enabling ELIXIR Communities to expand the ontology, and ensure that it encapsulates their needs. This will enable scientists to more easily find and use existing resources.

We support the development of robust and user-friendly platforms that empower researchers to leverage the wealth of resources and knowledge available in the scientific community, now also extending beyond life sciences.

This would also entail embedding EDAM in as many step-by-step guides and training workshops as possible, and making it interoperable with other main ontologies and linked open data.

We will aim to refine EDAM to improve the performance of AI applications, such as text mining used to annotate and add entries to bio.tools or to enrich scientific articles, assisted workflow generation, and provenance.

Lead partners: University of Bologna (Italy), IRISA/GenOuest (France)

In the life sciences, research software is used in a distributed manner across computational infrastructures. Distributed use ranges from software used in a single place, to tools used across different installations, to federated learning. Despite the increase in complexity, these approaches share many underlying considerations. This WP will focus on three of them:

  1. Benchmarking efforts driven by research communities.
  2. The distribution and deployment of containerised software across heterogeneous systems.
  3. The reproducibility and replicability of research outcomes.

Activity 1: Defining standards for benchmarking datasets across emerging communities

While benchmarking is important to ensure that different computational approaches can be compared and evaluated, we have observed that only a few people are currently working on this. In particular, few people are defining datasets (synthetic or otherwise) that can be used in benchmarking.

To address this issue, we plan to leverage the ELIXIR communities to identify individuals who are passionate about benchmarking, and have them champion these efforts. By establishing a community of practice focused on benchmarking, we aim to support the development of standards and practices that can be adopted across the scientific community.

Activity 2: Facilitating benchmarking through containerisation

Easy and reproducible deployments are a necessity for modern data science. They become even more important as distributed analyses are more broadly used by researchers. Containers, and especially the Bioconda-based BioContainers, have proven to be a reliable way to tackle deployment issues across clouds, HPC systems and local deployments.

BioContainers are used by Nextflow, Snakemake and Galaxy, and are therefore part of state-of-the-art workflows. In this WP, we will continue our container efforts, adapt to new architectures (ARM, RISC-V) and support the workflow community in creating containers as they are needed. This includes a special set of containers for benchmarking use cases selected by the involved communities (WP4.1).

Lead partners: University of Manchester (UK), VIB (Belgium)

To address the increasing complexity of life science analytics, ELIXIR communities need access to large scale compute capacities, often in an international context. They need to use sustainable compute resources, with the appropriate regulatory compliance, accounting and provenance.

While aspects of this have been addressed by the e-infrastructures in Europe, life scientists have additional demands for computing sensitive data. We will identify best practices in our communities, Nodes and projects in reporting resources (facilities, services) used to lay the groundwork for billing or other financial accounting of services.

The Platform will identify automated, lightweight solutions to track provenance. In turn this will improve the reproducibility of scientific analysis and the recognition of relevant service providers as well as researchers at the individual and organisational level.

Activity 1: Best practices, upscaling to build FAIR workflows using WorkflowHub in Communities

We will promote best practices for workflows, with a particular emphasis on WorkflowHub, leveraged by Communities. This will be facilitated via a white paper to Communities, to kickstart their contributions to WorkflowHub and build exemplar datasets. Our efforts will follow best practices for workflow documentation (including EDAM ontology and crosslinks) and outreach.

An important aspect is the seamless invocation of the workflows between WorkflowHub and Galaxy, to upscale our users. This will build on initial work by the Workflows Community Initiative’s working group for FAIR Computational Workflows, which is gathering best practices across workflow systems, as well as EuroScienceGateway’s approach to workflow citation in scholarly communication. APIs, like the GA4GH TRS endpoint, will be used for more WorkflowHub integration across the Tools Ecosystem.
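
As an illustration of how a client might address a workflow through a GA4GH TRS endpoint, the sketch below builds the URL of a workflow descriptor following the path layout of the TRS v2 API (`/tools/{id}/versions/{version_id}/{type}/descriptor`). The base URL and the identifiers are placeholders, the helper name `trs_descriptor_url` is hypothetical, and the descriptor types a given registry supports should be checked against that registry.

```python
from urllib.parse import quote


def trs_descriptor_url(base_url, tool_id, version_id, descriptor_type):
    """Build the URL of a workflow descriptor on a TRS v2 style endpoint."""
    # Identifiers are percent-encoded so that characters such as "/" in an
    # ID do not change the path structure.
    return "{}/tools/{}/versions/{}/{}/descriptor".format(
        base_url.rstrip("/"),
        quote(str(tool_id), safe=""),
        quote(str(version_id), safe=""),
        quote(descriptor_type, safe=""),
    )


# Placeholder registry and identifiers, for illustration only.
url = trs_descriptor_url("https://trs.example.org/ga4gh/trs/v2/", "42", "1", "GALAXY")
print(url)
```

A workflow system or portal could fetch this URL to retrieve the workflow definition, which is one way a registry and an execution platform can be linked without custom, registry-specific code.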

Activity 2: Environmental impact of Galaxy and bioinformatics

We recognise that our operations have an impact on the environment, and we are committed to reducing this impact using Galaxy, its associated ecosystem and its users, as a prime use case. We will also consider the relationships with RO-Crate and relevant schemas across ELIXIR to facilitate the work.

Our first step will be to conduct a comprehensive assessment of our operations and to identify areas where we can reduce our environmental footprint. We will then communicate our findings and progress to the wider scientific community, fostering a culture of transparency and accountability around environmental impact.

We will help raise awareness of the environmental impact of research, and we will develop and implement a range of mitigation strategies. Data collected by Galaxy and the optimisations made will be integrated back into the Research Software Ecosystem (e.g. OpenEBench) and coordinated with the Compute Platform (WP5). All our activities will be linked with the ELIXIR Focus Group for environmental impact.

Activity 3: Flexible upscaling of FAIR workflows

Upscaling of workflows is important for handling large volumes of scientific data (e.g. using GPUs and HPC), but can act as a restriction on FAIR principles. It can limit the number of computing infrastructures the workflows can run on and thus reduce their reusability. However, these requirements are not currently well described when defining a workflow and sharing it in repositories like WorkflowHub.

This task will take advantage of ongoing work on optimising upscaling of workflows, in particular EuroScienceGateway and the Pulsar network for federated compute. We will reflect on changes needed for ELIXIR Cloud to fully support scalable workflows while retaining “just enough” FAIR aspects.

Implementations will enable federated upscaled computation of workflows, taking into consideration transport time (workflow sent to compute nearest data) and environmental impact (WP5.2).
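
One way to picture the trade-off between transport time and environmental impact is a simple cost function over candidate compute sites. The sketch below is purely illustrative: the `Site` fields, the `carbon_weight` parameter, the example numbers, and the function `pick_site` are all assumptions, not part of any ELIXIR, Galaxy or Pulsar implementation.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    transfer_hours: float     # assumed time to move the input data to this site
    carbon_g_per_kwh: float   # assumed carbon intensity of the site's electricity


def pick_site(sites, energy_kwh, carbon_weight=0.001):
    """Pick the site with the lowest combined transfer and carbon cost.

    carbon_weight converts grams of CO2 into "equivalent hours"; it is a
    hypothetical tuning knob, and 0 means only transfer time matters.
    """
    def cost(site):
        return site.transfer_hours + carbon_weight * site.carbon_g_per_kwh * energy_kwh

    return min(sites, key=cost)


# Made-up example values, not measurements.
sites = [
    Site("near-data", transfer_hours=0.1, carbon_g_per_kwh=400.0),
    Site("low-carbon", transfer_hours=2.0, carbon_g_per_kwh=50.0),
]

print(pick_site(sites, energy_kwh=10.0).name)
print(pick_site(sites, energy_kwh=10.0, carbon_weight=0.0).name)
```

With the default weight, the energy-hungry job in this example goes to the lower-carbon site despite the longer transfer; with the weight set to zero, the site nearest the data wins.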