Federated Human Data services
Name | Description | ELIXIR Node |
---|---|---|
ELIXIR Luxembourg | ||
This study follows on from a number of earlier activities that have established the ELIXIR Beacon Project. During 2019-21 the main aims are to:
To achieve these goals it is imperative to continue a strategic partnership with the GA4GH Work Streams as a Driver Project and to increase coordination with the ELIXIR platforms. The focus is to prioritise and deliver future European requirements on Beacon and Beacon Network API development, continue to develop the overall security framework for this service type, and contribute to the GA4GH Discovery Work Stream goals towards a global genomic query language. Timeline:
|
ELIXIR Switzerland, EMBL-EBI, ELIXIR Spain, ELIXIR Finland, ELIXIR Luxembourg, ELIXIR Sweden | |
EMBL-EBI | ||
ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Netherlands, ELIXIR Spain, ELIXIR Switzerland, EMBL-EBI | ||
ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Netherlands, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, EMBL-EBI | ||
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI | ||
ELIXIR UK | ||
ELIXIR Sweden | ||
ELIXIR Spain | ||
ELIXIR France | ||
ELIXIR France | ||
ELIXIR Denmark, ELIXIR Spain, EMBL-EBI | ||
ELIXIR Spain | ||
ELIXIR Slovenia | ||
The European Genome-phenome Archive (EGA) is designed to be a repository for all types of sequence and genotype experiments, including case-control, population, and family studies. The EGA will serve as a permanent archive holding several levels of data, including the raw data (which could, for example, be re-analysed in the future by other algorithms) as well as the genotype calls provided by the submitters. Although EGA accepts data from across Europe, data regulations and other constraints make it desirable for ELIXIR Nodes to deploy and operate Local EGA instances. The aim of this project is therefore to give the ELIXIR PT Node, and other Nodes, the expertise required to deploy and operate Local EGA instances and the related data processing procedures and pipelines. We will leverage ELIXIR PT efforts on designing and deploying pipelines for genomic data processing and ELIXIR ES expertise on Local EGA deployment and on the operation and use of HPC infrastructures. |
ELIXIR Portugal, ELIXIR Spain | |
ELIXIR Belgium, ELIXIR Denmark, ELIXIR Finland, ELIXIR France, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Spain, ELIXIR Sweden, EMBL-EBI | ||
ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Netherlands, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, EMBL-EBI | ||
ELIXIR France | ||
Human data and translational research is a high priority for ELIXIR and builds on the progress made in the previous programmes by the Human Data Communities. Within the Science Tier of the ELIXIR 2024–2028 Programme, advances will focus on enabling researchers (including research clinicians) to use ELIXIR’s infrastructure for human genomic, phenotypic, imaging and demographic data, supporting discovery, analysis, innovation and the integration of research findings into the clinic and healthcare. More specifically, through these projects we will ensure that millions of human genomes are discoverable and exploited in a biomedical setting through ELIXIR-supported infrastructure and community-endorsed standards, software, workflows and analysis environments across ELIXIR Nodes. On Data Deposition:
On Federated Data Analysis:
On Linking Data:
Theme: Data Deposition
The Federated European Genome-Phenome Archive (FEGA) network is an ELIXIR-supported infrastructure for making human genomic data discoverable and accessible across ELIXIR Nodes. This project seeks to accelerate data depositions into FEGA, which will significantly increase the data flow in and from FEGA nodes. In alignment with the goals of the Human data and translational research Tier of the ELIXIR 2024–2028 programme, this project will promote seamless data integration and increase global researchers’ confidence in the data stored within FEGA, thus strengthening the network's position as a trusted resource for genomic data. It will build capacity within the FEGA Nodes and increase awareness among a wide range of stakeholders, thus altogether achieving the ultimate goal of enhancing data reuse. The project will be carried out by a strategic consortium comprising seven ELIXIR Nodes and two ELIXIR Communities. Partners represent four FEGA nodes at different levels of maturity, a member of the Cancer Data Community and both institutions managing Central EGA. The proposal is formulated around five timely coordinated tasks where all partners contribute their expertise to the final outcomes, converging in the deposition of several datasets to different nodes, testing the new tools and metadata model and blueprinting deposition of high-quality FAIR data in the future.
Nodes involved: ELIXIR Switzerland, ELIXIR Spain, ELIXIR France, ELIXIR Norway, ELIXIR Portugal, EMBL-EBI

Theme: Data Deposition
Human data, especially genomic data, is increasingly being federated across borders and institutions, with many stakeholders participating in multinational and global biomedical and health data networks, fostering collaborations and partnerships. While such international efforts are essential for the compilation and reuse of data, regulatory constraints often hinder the movement of certain data beyond organisational or national boundaries.
Centralised approaches such as the Central European Genome-Phenome Archive (CEGA) are valuable, but not all data can be centralised. The Federated European Genome-phenome Archive network (FEGA) addresses this, with early work concentrated on local collection of data with central archiving of metadata. FHDportal aims to support both federated and central submission of metadata. It will do this by providing a reusable portal for gathering and storing metadata at a national level, and submitting required metadata centrally to enable discovery of datasets via the CEGA. FHDportal complements the existing system by providing a way to explore richer metadata (for example, including detailed information on specific datasets or local funding information), while enabling a core set of metadata to be queried centrally. FHDportal will be deployed and tested on FEGA nodes, and should be of interest to the many other countries seeking to join FEGA. The need for FHDportal is based on experience during onboarding and in moving to production nodes. It will offer a common solution for local mobilisation of data and metadata, which can be adapted to local situations. During development, it will be tested on both new and well-established nodes using different technical platforms and infrastructures. The resulting software will be provided to the whole community, and will hopefully become part of the emerging toolkit for new FEGA nodes wishing to establish themselves, and to ensure their nodes meet local needs while bringing European scale benefits.
Nodes involved: ELIXIR Switzerland, ELIXIR Finland, ELIXIR Luxembourg, ELIXIR UK

Theme: Federated Data Analysis
Federated analysis (FA) revolutionises genomics research by enabling collaborative analysis across distributed datasets, while safeguarding data privacy and facilitating comprehensive insights into genetic diseases. Federated access and analysis of human datasets is part of the ELIXIR scientific program.
ELIXIR is also involved with the EUCAIM (European Cancer Imaging Initiative) project, and coordinates the European Genomic Data Infrastructure (GDI) project, which aims to provide federated access to 1+M whole genome sequences (WGS). While the GDI project explores federated solutions to analyse its data, it does not foresee deploying FA solutions for evaluation. This project seeks to implement FA across four ELIXIR Nodes, using synthetic and real, publicly accessible Genome-Wide Association Studies (GWAS) data. To maximise the impact of this proposal, we plan to leverage the developments already made in the context of the EUCAIM project, specifically the orchestration solution built around the Flower Framework, and the ongoing FA developments within the Staff Exchange BRIDGE between ELIXIR and DCEG/NIH, where the Yjs framework is the chosen solution. We also aim to represent the analysis using RO-Crates to track its provenance, following the Five Safes Framework. The proposal is built around ongoing collaborations on deploying and testing FA solutions for analysing sensitive data across different projects such as GDI, EUCAIM, BY-COVID and TRE-FX. This project aims not only to boost this interaction using the Flower Framework for FA, but also to strengthen the connection to NIH/DCEG through dataset sharing and the comparison of different FA frameworks. All Nodes involved in this project are active members of the ELIXIR Human Data Communities, especially the Federated Human Data and Cancer Data Communities. The outcomes of this project will be disseminated not only to these Communities but also to all ELIXIR projects where this topic is relevant.
Nodes involved: ELIXIR Belgium, ELIXIR Spain, ELIXIR France, ELIXIR Portugal, ELIXIR UK

Theme: Federated Data Analysis
Through the 1+Million Genomes (1+MG) initiative, Europe is scaling up efforts to build a shared framework and infrastructure to safely access and integrate clinical human data across borders, following regulatory efforts like the General Data Protection Regulation (GDPR) and the European Health Data Space (EHDS). These are pivotal in safeguarding sensitive information, while enabling authorised access for researchers, healthcare professionals and other actors. Integral to biomedical data security considerations is the European Genome-Phenome Archive (EGA), in both its Central and Federated forms, recognised as the predominant European repository for the secure storage of pheno-clinical and genomics data. Mobilising data for secure analysis in Virtual Research Environments (VREs) remains challenging, and is an active focus of ongoing projects such as the European Genomic Data Infrastructure (GDI), EOSC-ENTRUST and EOSC4Cancer. Galaxy is a popular open-source, community-driven VRE for bioinformatics analysis that represents a unique platform for developing and testing novel strategies for data analysis. A prototyping strategy for the access and processing of sensitive data was demonstrated in a previous ELIXIR implementation study (2021–2023). By adopting features of the GA4GH Crypt4GH encryption standard, we enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys. We propose expanding this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs, all interactively orchestrated by the users on a public Galaxy server.
The proposed solution offers flexibility, with different levels of enforced restrictions ranging from scenarios with no limitations on encrypted data transfer and storage to fully federated analysis scenarios, where analysis occurs near the data. Most of the required infrastructure can also be deployed independently of Galaxy, simplifying the potential implementation of these concepts in other VREs.
Nodes involved: ELIXIR Belgium, ELIXIR Germany, ELIXIR Spain, ELIXIR Norway

Theme: Linking Data
Today, research generates more data than ever, and a multitude of experimental data types. Such data types are often connected at source: perhaps generated from the same samples or as part of the same study. It is important that different data types are made available for re-use in a linked and coordinated manner, enabling full reuse of all the data in integrated analysis. Experimental data types are often siloed in varied specialised repositories, using different metadata models, so linking them is not straightforward. Also, data obtained from living humans is sensitive and shared under a controlled access model, adding an extra layer of complexity. In this project, partners will establish a strong foundation for developing solutions to integrate multi-omic sensitive data effectively among FEGA nodes, biobanks and ELIXIR Core Data Resources such as PRIDE and GWAS Catalog. Five ELIXIR Nodes and the Polish FEGA node (in-kind contribution) will be involved, drawing on two ELIXIR Communities (Federated Human Data and Proteomics) and spanning three diverse data use cases to address the challenges of this open call. The project will start by developing a comprehensive landscape analysis of current human data linkage challenges and solutions (Task 1). Based on this, concrete models and prototypes will be proposed to link sensitive proteomics data (Task 2), cohorts and biobank data (Task 3), and population cohort-derived data (Task 4) to genomics data.
Results from Tasks 2 to 4 will be used to improve the FEGA metadata model. The project will result in more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and accelerating research. To encourage broad participation, the project will engage the ELIXIR Communities through dedicated online and in-person events, where both interim and final results of the project will be disseminated.
Nodes involved: ELIXIR Finland, ELIXIR Germany, ELIXIR Spain, ELIXIR Sweden, EMBL-EBI |
ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Luxembourg, ELIXIR Norway, ELIXIR Portugal, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI | |
This project will increase interoperability between four ELIXIR resources (CATH, SWISS-MODEL, InterPro and PDBe), three of which are Core Resources, by building APIs that facilitate the import and export of data between them. The ultimate goal is to improve provision of 3D-Models for protein domain sequences via CATH, SWISS-MODEL and InterPro. Fewer than 10% of known sequences have experimentally characterised 3D structural information, yet this data is often essential for understanding a protein’s molecular function and biological role, and for determining whether residue mutations could damage the protein and lead to disease. This integration is therefore very timely, as it will enhance links between sequence and structure data. APIs will be built using well-established protocols. As well as promoting interoperability, and therefore sustainability, we will expand the data in each resource to ensure they serve a wider community of biologists. |
ELIXIR UK, ELIXIR Switzerland, EMBL-EBI | |
The implementation study project plan of ELIXIR Italy consists of six activities that aim to boost the cooperation with existing ELIXIR activities and are expected to deepen the interaction between ELIXIR-IIB, the Joint Research Unit embodying the Italian Node, and ELIXIR. The partners involved have already established contacts with other ELIXIR Nodes and the relevant ELIXIR Platforms and Services in order to ensure an advantageous outcome for all the involved parties. The goal of the proposed activities is to create and/or reinforce collaborations based on concrete measures. With this implementation study the Italian ELIXIR Node will achieve greater integration within ELIXIR service infrastructures and data interoperability policies. The topics of the selected activities and an additional coordination task are summarized below:
|
ELIXIR Italy | |
ELIXIR Finland, ELIXIR Germany, ELIXIR Spain, ELIXIR Sweden, EMBL-EBI | ||
ELIXIR Finland, ELIXIR Germany, ELIXIR Norway, ELIXIR Spain, ELIXIR Sweden | ||
ELIXIR Spain | ||
This project aims to improve the coordination between the Beyond 1 Million Genomes project and the Federated Human Data / ELIXIR CONVERGE projects. All three projects are based around Federated EGA, or its technology, to support cross-border access to controlled-access human genetic and phenotypic data.
Objective 1: Strengthen the collaboration and coordination between the Hub and ELIXIR-FI, especially the Beyond 1 Million Genomes (B1MG) project and prospective Digital Europe Genomic Data Call, and the Federated Human Data Communities, ELIXIR CONVERGE, and CINECA. Additionally, leverage ELIXIR-Hub contacts with other nodes to explore new collaboration opportunities. The project lead is a Senior Coordinator at ELIXIR-FI on the B1MG project and co-lead of the Federated Human Data Community.
Objective 2: Maximise adoption of GA4GH standards, and existing or developing services, such as Federated EGA and Beacon, throughout Europe by identifying and exploiting parallels between the deployment of Federated EGA nodes and B1MG data hubs. As Senior Coordinator for the B1MG project, the project lead is tasked with driving adoption of the proposed infrastructure and subsequent deployment of data hubs by the 1+MG signatory member states.
Objective 3: Identify and utilise additional collaborations within other European projects, such as European Health Data Space 2 (EHDS2), to ensure that both the genomic and the phenotypic or clinical data are as interoperable as possible. The project lead was on the GA4GH product review committee for version 2 of the Phenopacket standard, which enables linking of genetic and phenotypic data.
Objective 4: Align the deployment, ELSI, and governance procedures of B1MG data hubs with the B1MG Maturity Models. Ensure that these procedures are compatible with Federated EGA deployment ELSI procedures. The project lead has led the Security work package for the Beacon project since 2017. |
ELIXIR Finland, EMBL-EBI |