Microbial Biotechnology services

Name Description ELIXIR Node

The microbial world provides an abundant source of biological catalysts for producing chemicals of medical and economic interest, such as pharmaceuticals and biofuels. However, a sustainable systems biology resource that integrates experimental and predicted data on microbial metabolism is still lacking. This Implementation Study defines the requirements for a European bioinformatics resource for microbial metabolism and provides a practical demonstration of how existing funded resources from ELIXIR partners could be integrated to meet those needs.

This study brings together complementary data types:

  • microbial genomes and genes
  • protein function annotation
  • enzymatic reactions
  • metabolic pathways
  • metabolic models

Linking these data types through RDF standards underpins applications in medicine and the economy, such as the development of pharmaceuticals and biofuels.

This study has now been completed; the final report is available here.

An article on the ELIXIR gateway of F1000Research will be made available shortly.

Webinar summarising the outcomes

ELIXIR Switzerland, ELIXIR France, EMBL-EBI

Over the coming decade, Europe will face critical challenges in maintaining biodiversity, ensuring food security and combating pathogens. Our 2024–28 Programme will address these issues by mobilising and integrating molecular data, using successful coordination models from human genomics. Through strategic investments and collaboration in externally funded projects, ELIXIR will enhance scientific services and support transnational research in these essential areas.

The following projects have been selected as part of the ELIXIR 2024–28 Programme’s Biodiversity, food security and pathogens Science Tier:

  • E-PAN: Enhancing pan-genome analysis in plants
  • FAIRyMAGs: Optimising Metagenomics Assembled Genomes building: workflow finalisation, training material development, real data evaluation and resource allocation tool creation
  • HARVEST: Handling and alignment of plant research FAIRification – value through the use of ELIXIR data Standards and Tools
  • Odyssey: Connecting molecular and geographical biodiversity data

With the declining cost of genome sequencing, plant researchers are shifting their focus towards characterising the wide genomic diversity present within a species. A crop pan-genome is built by sequencing, comparing and integrating multiple genomes from the same agriculturally important species, such as wheat, rice or potato. Exploiting the information encoded in these pan-genomes can support the development of new cultivars that are more resilient to emerging challenges such as increased drought and heat stress.

Multiple consortia are independently generating and integrating these pan-genomes, but little progress has been made in streamlining and harmonising these efforts. While sequence quality is no longer a major issue, the completeness of both the assembly and the subsequent gene annotation is much harder to quantify correctly, even though it is a major factor in explaining the adaptive differences between genotypes. Where efforts exist to visualise and browse pan-genomes, for example using graph representations, easy retrieval of gene presence/absence variation (PAV) or structural rearrangements is still lacking, which hampers knowledge discovery.

E-PAN aims to streamline the efforts of different research groups within the ELIXIR Plant Science Community. This encompasses the development of effective standards, computational pipelines and tutorials to assess the quality of pan-genomes and provide solutions to identified problems. We will also evaluate and integrate different approaches for data visualisation and browsing, which will be used by different partners sharing pan-genomics results. A one-day meeting and an online workshop will be organised to disseminate results and initiate new collaborative projects. These concerted efforts will lead to a standardised approach for future pan-genome projects, reduced duplication of effort across consortia, and a set of tools to visualise and mine pan-genomics results.

Nodes involved: ELIXIR Belgium, ELIXIR Germany, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
Communities: Plant Sciences

Metagenome-Assembled Genomes (MAGs) are crucial for understanding biodiversity, enhancing food security and combating pathogens because they provide insight into uncultured and unexplored genomes. This project aims to advance metagenomics research through the development, optimisation, evaluation and dissemination of robust FAIR workflows for building MAGs.

Building on the Galaxy platform, we will finalise a user-friendly, state-of-the-art Galaxy workflow for MAG construction and ensure its accessibility and reusability through integration with WorkflowHub. To support user adoption and proficiency, we will create FAIR educational materials hosted on the Galaxy Training Network (GTN), giving researchers the skills needed to use the workflow effectively.

The developed workflow will be rigorously evaluated by analysing MAGs generated from simulated and real-world data spanning diverse environments: atmospheric, marine and cow gut microbiomes. This evaluation will provide insights into the workflow's performance and its applicability across different sample types, complexities and ecosystems.

We will also investigate the computational resources required to execute the assembly step of the workflow on various input datasets, using data provided by several Galaxy servers and the MGnify team. The aim is to optimise resource allocation so that MAG construction is efficient and cost-effective. A new tool will be developed to support this process, allowing researchers to accurately estimate and allocate resources for each step of the assembly pipeline.

By addressing these objectives, the project aims to accelerate metagenomics research by providing researchers with a comprehensive and accessible framework for MAG construction. This framework will not only streamline the workflow for building MAGs but also facilitate reproducibility, collaboration and innovation within the ELIXIR Microbiome Community.

Nodes involved: ELIXIR France, ELIXIR Germany, ELIXIR Italy, EMBL-EBI
Communities: Galaxy, Microbiome

The standardisation and accessibility of plant data is a major challenge for agricultural research. MIAPPE, which was developed as part of the transPLANT and ELIXIR-EXCELERATE projects, has made a decisive contribution to unifying data capture. In addition, the FONDUE Implementation Study facilitated the integration of phenotypic and genotypic data.

Nevertheless, challenges persist in achieving full FAIRness of plant data. The guidelines and best-practice documents developed within the Commissioned Service INCREASING have improved the situation, but further enhancements are required, such as additional documentation and reference datasets.

To address these needs, it is important to assess the practical effort required to FAIRify datasets using the MIAPPE, ISA, ARC and RO-Crate standards. The idea is to provide biologist-friendly data documentation while also introducing machine-actionable formats for bioinformaticians. A further challenge is that the information is scattered, as there is no single resource in which it is all collated.

In HARVEST, we aim to address these challenges by FAIRifying datasets (DROPS, AGENT) based on the latest version of MIAPPE, which now covers more diverse and complex use cases. This process will include enriching the MIAPPE documentation, in particular with example datasets, updating training material and refining mappings to other interoperable formats such as BrAPI, Bioschemas and ISA-Tab/JSON. We will also use FAIDARE to establish links to repositories such as EMBL-EBI EVA, e!DAL-PGP, recherche.data.gouv and Zenodo, to enhance opportunities for data sharing and reuse. The RDMkit Plant Sciences pages will be extended to serve as a primary hub for information on FAIRifying plant data. Furthermore, we will consolidate resources and improve accessibility by linking directly to the original web resources and recipes, and by adding Jupyter notebooks to the FAIR Cookbook where possible.

Nodes involved: ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK, EMBL-EBI
Communities: Plant Sciences

Understanding molecular biodiversity is essential for ecological conservation and sustainable development. While a vast array of molecular data awaits exploration, its lack of connectivity with other sources of data and metadata, such as geographical references, habitat, population size and phenotypic data, often poses significant barriers to biodiversity research.

This project will develop Odyssey, a web portal with a user-friendly interface that will allow researchers, educators and citizens to navigate the world of molecular biodiversity, using Greece and Norway as case studies: two countries with a characteristic and unique wealth of biodiversity, representative of Mediterranean and Nordic ecosystems respectively.

Building on existing sources of information and prototype applications available for specific regions and taxa, the project aims to link current efforts and develop a new interface offering diverse functionalities for data exploration and analysis, such as descriptive statistics, graphs, maps, customisable data filters and dynamic visualisations. A modular design will ensure flexibility and scalability, enabling easy integration of new datasets and analytical tools in the future. The portal will also be used for training and communication, inviting traditional biodiversity research groups to use new information on the spatial patterns of biodiversity and their connection with features that are important for designing conservation measures, such as habitat connectivity, representativity, population demographics, and the dynamics of adaptation and migration.

Odyssey’s outcome will be a valuable tool for studying and, ultimately, offering a basis for managing and conserving the rich molecular biodiversity of Greece and Norway, as well as supporting the activities of the ELIXIR Biodiversity Community in the two Nodes and in Europe. This will promote collaboration, innovation and knowledge exchange in biodiversity research and beyond. 

This new tool will be developed and offered under an open-source licence, encouraging community participation and contribution to further enhance its capabilities and broaden its applications, fostering a robust network for biodiversity research in Greece and Norway.

Nodes involved: ELIXIR Greece, ELIXIR Norway
Communities: Biodiversity

ELIXIR Belgium, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK, EMBL-EBI, ELIXIR Italy

Cellular and molecular biology are fundamental to ELIXIR's mission. As part of our 2024–28 Programme, we are committed to advancing data services and software for research on nucleic acids, proteins and other biomolecules. This initiative will address new demands for multi-omics and multi-modal analyses, including imaging, by developing methods and partnerships. We will also expand expertise in reusable data and software to incorporate FAIR models, ensuring robust solutions for modelling at all scales. 

The following projects are key to connecting the latest developments with established data resources, unlocking the potential of cellular and molecular biology:

  • Advancing structural and functional ontologies of disordered proteins 
  • DBTLHub: Towards a one-stop shop for connecting databases, datasets and tools for the Design-Build-Test-Learn cycle in biotechnology 
  • Spatial2Galaxy: There is no Galaxy without Space 
  • Next level of reproducible, comparable and integrable Metabolomics

This project addresses the limitations of current ontologies in capturing the dynamic nature of disordered protein regions by pursuing several primary objectives. Firstly, novel structural and functional ontologies will be developed to accurately represent the structural heterogeneity and dynamic functional annotations of proteins. These ontologies will incorporate timescales, annotating the kinetics of structural transformations to elucidate molecular mechanisms and regulatory pathways governing protein dynamics. 

Collaborating with existing databases and consortia will ensure seamless integration of ontological resources and experimental data, fostering interoperability and accelerating discoveries. A standardised file format specification will also be developed in collaboration with the Human Proteome Organisation Proteomics Standards Initiative, facilitating the encoding of structural state transitions within disordered protein regions. This specification will enhance data interoperability and exchange among research groups and databases, providing a common language for describing structural transitions and advancing our understanding of the functional implications of protein dynamics in biological systems.

Nodes involved: ELIXIR Belgium, ELIXIR Hungary, ELIXIR Italy, EMBL-EBI
Communities: 3D BioInfo, Intrinsically Disordered Proteins

This project aims to strengthen the basis for a one-stop shop connecting databases, datasets and tools for the deployment of the engineering Design-Build-Test-Learn (DBTL) framework in biotechnology. It will do so by surveying the tools and data landscape, pinpointing gaps and opportunities, and establishing design patterns for task-specific workflows for analysis, integration and sharing of multimodal data. 

It will provide a resource that allows users to navigate the complex landscape of biotechnology tools and data, and to establish solutions that fit their specific DBTL requirements. Use cases from ongoing programmes in various communities will be used to assess and demonstrate the practical value of the solutions.

The work will be carried out through hands-on activities, dedicated workshops and hackathons, providing training and resources, as well as fostering industrial engagement. The experience of the communities and platforms involved in systems biology, industrial biotechnology, metabolic modelling, metabolomics, enzymes, bioprospecting and data management will be particularly valuable in this respect, as will their respective industrial relations. Accordingly, the project engages participants from seven ELIXIR Nodes and connects researchers and their activities across six communities.

The project outcomes will contribute to advancing the ambition of connecting the latest developments and established data resources across ELIXIR to realise the potential of cellular and molecular biology, particularly in the fields of industrial biotechnology and biomanufacturing.

Nodes involved: ELIXIR Spain, ELIXIR Greece, ELIXIR France, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
Communities: Biodiversity, Microbiome, Metabolomics, Microbial Biotechnology, Research Data Management, Systems Biology

Spatial transcriptomics (ST) was named ‘Method of the Year 2020’ by Nature Methods and was more recently featured in Nature’s ‘Seven technologies to watch in 2024’. ST has become essential for researching transcriptional pathology at the cellular and molecular levels and is now applied to multiple pathologies, including neurodegenerative disease, cancer, cardiomyopathy and kidney disease. ST is also emerging in plant and microbiome research. While a plethora of spatial analysis applications exist, they are not unified, they are hard for research scientists to manage, and they are poorly suited to delivering FAIR and reproducible results.

To address this challenge, we will implement Spatial2Galaxy (S2G), a self-contained, reproducible, scalable and FAIR spatial transcriptomics analysis platform for researchers and bioinformaticians alike. We will develop S2G by building on our success in developing Galaxy workflows, training materials, and ST and single-cell analysis pipelines.

S2G will provide state-of-the-art ST tools and workflows with proven high performance in benchmarking studies, ensuring the uptake of best practices. These tools will be demonstrated on datasets that connect various ST databases, consolidating community guidelines for integrative multi-modal single-cell omics and imaging analysis. Non-spatial single-cell sequencing was named Nature Methods’ ‘Method of the Year 2013’, yet it took six years, until 2019, before practical training and workflows for its analysis were FAIRified and made available in Galaxy. S2G aims to shorten this gap between a technology becoming relevant and the provision of FAIR resources to the life science community, in this case for ST.

Nodes involved: ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK
Communities: Cancer Data, Galaxy, Human Copy Number Variation, Single-Cell Omics

The ELIXIR Metabolomics Community relies on the development and adoption of standards, formats and data-processing solutions, but it remains challenging to ensure high-quality reported metadata, sufficiently contextualised results, and interoperable and reusable datasets, and to integrate metabolomics data with other omics data or studies.

This project is designed to address these issues by connecting key international standards with ELIXIR resources and by creating associated community guidelines and training materials.

Based on the FAIRification framework, activities in the project will: i) increase interoperability and reuse of public metabolomics datasets and workflows through enhanced and extended open data standards, resources and new semantic annotations, ii) define, ensure and establish quality control for study baselines in Metabolomics and Exposomics, and iii) facilitate metabolomic data interpretation and meta-analysis integration with multi-omics and systems biology studies. 

As a first step, the project will create a Semantic Metabolomics Data Model to standardise metadata, ensuring unambiguous reuse of metabolomics projects. This model will focus on integrating key ontologies, providing open training initiatives and enhancing the interoperability of metabolomics data by producing open guidelines for annotation steps. By linking with ELIXIR’s Deposition Databases, the ISA framework and other services, the project seeks to strengthen interconnections with ELIXIR Platforms, other ELIXIR communities (Systems Biology, Food and Nutrition, Galaxy, Proteomics, Toxicology, Research Data Alliance Focus Group ...), and the FAIR Cookbook and Bioschemas communities. Project outcomes are expected to promote the emergence of ambitious and innovative semantic solutions for comparing studies across the healthcare, clinical and plant domains.

Nodes involved: ELIXIR Czech Republic, ELIXIR Germany, ELIXIR Italy, ELIXIR Spain, ELIXIR France, ELIXIR Netherlands, ELIXIR Sweden, ELIXIR UK, EMBL-EBI
Communities: Food and Nutrition, Galaxy, Metabolomics, Proteomics, Research Data Management, Single-Cell Omics, Systems Biology, Toxicology

ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR France, ELIXIR Greece, ELIXIR Hungary, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR UK, EMBL-EBI

Advancements in biological system engineering have introduced new data challenges in biotechnology. These challenges are even more pronounced when high-throughput robotic systems are used in screening and manufacturing. While academic projects and use cases have enabled progress in data management, industrial systems face distinct challenges related to data sharing, security, and process verification. This project aims to investigate industrial data sharing challenges.

As part of the Microbial Biotechnology Community, a recent joint implementation study between the Newcastle and Manchester ELIXIR Nodes explored the data management requirements of biotechnology. This effort resulted in guidelines published in collaboration with the RDMkit team (https://rdmkit.elixir-europe.org/). The study was extended to consider an automated hybrid in-vitro/in-silico workflow designed to characterise recombinant enzymes and their protein families, with a particular focus on understanding their data characteristics in an academic setting (see figure below).

Workflow to characterise recombinant enzymes and their protein families

Prozomix Ltd., an SME specialising in enzyme-based biocatalysts, uses high-throughput systems to identify novel members of enzyme protein families, both in-house and in collaboration with academic partners. This project proposes a technology transfer process from academia to industry, involving the adaptation and deployment of the automated pipeline at Prozomix.

A data requirements analysis will address the specific data challenges faced by the industry, identifying the points at which data could be shared without undermining the company’s intellectual property. The ultimate goal is to develop generic guidelines for industrial partners to make their data FAIR (Findable, Accessible, Interoperable, and Reusable) without compromising intellectual property and data security.

The project's anticipated outcomes include fostering collaboration between ELIXIR communities and industry, encouraging academic members to explore more commercially focused use cases and industrial members to consider FAIR data. Additionally, the project aims to highlight the differences in FAIR data needs between industry and academia and to create a generic framework that facilitates data sharing in industrial biotechnology.

ELIXIR UK

ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Greece, ELIXIR Netherlands, ELIXIR Spain

A formal framework for microbial biotechnology to manage and manipulate strains, samples, knowledge, data and metadata is still lacking. The Design-Build-Test-Learn (DBTL) cycle provides a conceptual framework for developing tailor-made microbes and biological systems. The ELIXIR Microbial Community takes the DBTL cycle as the starting point for defining its four key objectives.

This Implementation Study builds upon the F1000 position paper for the Microbial Community and revolves around three closely intertwined core activities: Enzymes, Models, and Ontologies & Workflows. All three are embedded in a training framework and relate directly to the ELIXIR Data, Tools, Interoperability and Training Platforms.

ELIXIR Netherlands, ELIXIR France, ELIXIR UK, ELIXIR Greece, ELIXIR Finland, ELIXIR Germany, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Switzerland, EMBL-EBI