The ELIXIR Cloud is an emerging ELIXIR Compute-based cloud infrastructure that delivers federated analytics use cases based on Global Alliance for Genomics and Health (GA4GH) API standards.
We have recently demonstrated how researchers can leverage ELIXIR Cloud services to deliver computational workloads across heterogeneous compute infrastructures (HPC, native cloud) at different ELIXIR Nodes via the GA4GH Task Execution Service (TES) API specification and its TESK (maintained by ELIXIR Compute) and Funnel implementations.
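To make the TES-based delivery of workloads more concrete, the following minimal sketch shows what a task submission could look like. The field names (`name`, `executors`, `image`, `command`, `resources`, `cpu_cores`, `ram_gb`) follow the GA4GH TES task schema. The endpoint URL, container image and helper function names are illustrative assumptions, and the base path varies with the deployment and the TES version. The request is only prepared here, not sent.

```python
import json
import urllib.request


def build_tes_task(name, image, command, cpu_cores=1, ram_gb=2.0):
    """Return a minimal task document using field names from the GA4GH TES schema."""
    return {
        "name": name,
        "executors": [{"image": image, "command": command}],
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
    }


def build_submit_request(base_url, task, token=None):
    """Prepare (but do not send) the POST request that would create the task."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        # Bearer-token authentication is an assumption; deployments differ.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical endpoint and image, used for illustration only.
task = build_tes_task("hello-tes", "alpine:3.19", ["echo", "hello"])
request = build_submit_request("https://tes.example.org/ga4gh/tes/v1", task)
```

Because TESK, Funnel and other TES implementations expose the same API, the same task document could in principle be sent to any of them by changing only the base URL. Passing `request` to `urllib.request.urlopen` would submit the task, and a conforming server responds with a JSON document containing the new task's `id`.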
Experimental GA4GH TES backends have been implemented in various workflow engines, including cwl-tes, Nextflow, Snakemake, Cromwell and Galaxy. To “bring compute to the data”, the ELIXIR Cloud includes the proTES gateway service, which distributes the individual tasks of a workflow across a network of TES instances.
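One way to picture the distribution of tasks across a TES network is the sketch below, which ranks TES instances by how many of a task's inputs are stored close to them. It is a hypothetical illustration of the “bring compute to the data” idea, not a description of how proTES itself makes decisions. The registry `TES_INSTANCES`, all host names and the function `rank_tes_instances` are assumptions. Only the `inputs` and `url` field names are taken from the TES task schema.

```python
from urllib.parse import urlparse

# Hypothetical registry: TES endpoint -> storage hosts considered local to it.
TES_INSTANCES = {
    "https://tes.node-a.example.org": {"s3.node-a.example.org"},
    "https://tes.node-b.example.org": {"ftp.node-b.example.org"},
}


def rank_tes_instances(task, instances):
    """Order TES endpoints by the number of task inputs co-located with them."""
    input_hosts = [urlparse(item["url"]).hostname for item in task.get("inputs", [])]

    def score(endpoint):
        # Count the inputs whose storage host is local to this endpoint.
        return sum(host in instances[endpoint] for host in input_hosts)

    # sorted() is stable, so endpoints with equal scores keep registry order.
    return sorted(instances, key=score, reverse=True)


example_task = {
    "inputs": [
        {"url": "ftp://ftp.node-b.example.org/cohort/sample1.bam", "path": "/data/s1.bam"},
        {"url": "ftp://ftp.node-b.example.org/cohort/sample2.bam", "path": "/data/s2.bam"},
        {"url": "s3://s3.node-a.example.org/reference/genome.fa", "path": "/data/ref.fa"},
    ],
    "executors": [{"image": "alpine:3.19", "command": ["ls", "/data"]}],
}

ranking = rank_tes_instances(example_task, TES_INSTANCES)
```

A gateway following this approach would forward the task to the first endpoint in `ranking` and fall back to the next one if the submission fails. Real deployments would likely weigh further criteria such as access rights, queue length or cost.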
In this project, we will extend the capabilities of the ELIXIR Cloud such that ELIXIR researchers can run a wide range of workflows either entirely or partially (“hybrid cloud”) on Microsoft Azure via its open-source native TES implementation.
Researchers at ELIXIR Nodes will thus be able to scale out workloads to a state-of-the-art public cloud environment, either to meet peak compute demands or to access hardware that is currently unavailable locally or at any accessible ELIXIR Cloud node. Moreover, through the use of the TES API, as well as the GA4GH Workflow Execution Service (WES) API, which abstracts across workflow engines such as the ones mentioned above, ELIXIR services can easily integrate with the ELIXIR Cloud to make use of the hybrid cloud capabilities.
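As an illustration of how the WES API abstracts across workflow engines, the sketch below assembles the form fields of a WES run request. The field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`, `tags`) are those defined by the GA4GH WES specification. The workflow URLs, parameter values and the helper `build_wes_run_form` are hypothetical, and which workflow types a given WES deployment accepts is an assumption that has to be checked against its service info.

```python
import json


def build_wes_run_form(workflow_url, workflow_type, workflow_type_version,
                       params, tags=None):
    """Assemble the form fields of a WES run request as plain strings."""
    form = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params),
    }
    if tags:
        form["tags"] = json.dumps(tags)
    return form


# Two hypothetical workflows in different languages use the same request shape.
cwl_run = build_wes_run_form(
    "https://example.org/workflows/align.cwl", "CWL", "v1.0",
    {"reads": "sample1.fastq"},
)
wdl_run = build_wes_run_form(
    "https://example.org/workflows/align.wdl", "WDL", "1.0",
    {"align.reads": "sample1.fastq"},
    tags={"project": "hybrid-cloud-demo"},
)
```

A client would send such a form as `multipart/form-data` in a POST request to the `runs` endpoint of a WES service. From the client's perspective, only the field values change between workflow languages, while the choice of engine and of the underlying TES instances is left to the service.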
The project will also strengthen the ongoing collaboration between ELIXIR Compute and Microsoft on the development of the GA4GH Cloud standards by increasing interoperability between our respective implementations. Successful joint work on the GA4GH TES API is likely to lead to a long-term collaboration on co-developing related solutions across various dimensions, including access control, smart distribution of workloads and integration of workflow federation via the GA4GH WES API.