Notes given in the application form
Eligibility criteria
- Must be an ELIXIR Service (i.e. part of an existing ELIXIR Node’s Service Delivery Plan, or is an ELIXIR Commissioned Service), or is in the official process/commitment of becoming one. (Required)
- Must have evidence that it supports an interoperability activity, and has been deployed. (Required)
- Must indicate how it supports the FAIR Principles. (Required)
- Should fit into the EIP service framework in the ELIXIR 2019-2023 Scientific Programme for data interoperability or other activities relevant to the ELIXIR mission.
Additional notes
- Please complete this form by adding information for your Interoperability Resource in the appropriate section below. Consult the Recommended Interoperability Resource (RIR) selection criteria documentation for details on each section below.
- Where a panel/question is not relevant to your Interoperability Resource, please leave it blank or mark as “not applicable”, optionally with a brief explanation as to why.
- Word limit guidance is noted for free text fields.
- Please include URLs to external resources, where useful.
- Any questions, contact Sirarat Sarntivijai (sirarat.sarntivijai@elixir-europe.org).
1. Resource facilitation to scientific research
a. Interoperability Resource: Briefly describe the function of the Interoperability Resource
The FAIRtracks ecosystem gathers, curates, validates, integrates, and indexes the metadata of genomic track datasets, so that the tracks are annotated well enough to be considered FAIR. Genomic track data encompass files with reference genome coordinates of observed or predicted features, e.g. epigenomic marks, variants, regulatory regions, transcripts, and chromatin structure. FAIRtracks metadata annotations follow minimal requirements structured around a set of core concepts: collection, study, experiment, sample, and track. These minimal records link to existing resources through identifiers.org CURIEs and ontology term IDs. The resulting consistent metadata collection can be queried by downstream genomic clients through the TrackFind API.
b. Scope statement: describe the scope and the users of the resource. How is the Interoperability Resource positioned with respect to other similar Interoperability Resources? Include the base URL and, if relevant, the introductory or “about” page URL.
Significant investments have gone into producing the large number of genomic tracks that reside within larger consortia or elsewhere. However, it is still difficult to find and access tracks that are annotated with interoperable metadata at the level of detail and provenance required for reuse in a given research project. In brief, there is a need to better consolidate track metadata according to the FAIR principles, across species and domains. Existing metadata harmonization efforts have been more limited in scope, covering e.g. human epigenomics (IHEC), blood samples (BLUEPRINT), or cancer samples (ICGC Data Portal). The FAIRtracks ecosystem draws on experience from the aforementioned projects, as well as from tool development (GSuite HyperBrowser and EPICO), in order to solve pressing practical issues that hinder reuse of genomic tracks.
The target audiences of the FAIRtracks ecosystem (https://fairtracks.github.io/) are both providers and consumers of genomic tracks, including producers, curators, tool developers (from genome browsers and analysis tools to ad hoc scripts), as well as end-users. FAIRification of track metadata provides the minimal and crucial context needed for re-usability in new studies and meta-analyses. Thorough validation of FAIRtracks metadata ensures interoperability through the dependence on existing RIRs: Identifiers.org provides the means to resolve identifiers to reference genome assemblies (INSDC), experiment records and datasets (e.g., EGA, ENA), samples (BioSamples), and publications (DOI). The use of well-established ontologies through the Ontology Lookup Service, e.g. Cell Ontology, EFO, UBERON, and EDAM, empowers downstream applications by allowing more powerful and complex queries.
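As an illustration of how CURIEs tie a metadata record to external resources, the following minimal Python sketch turns a compact identifier into an Identifiers.org resolver URL. The record fields, the accession values, and the `curie_to_url` helper are hypothetical placeholders and are not taken from the FAIRtracks standard; the sketch only assumes the `prefix:accession` form of a compact identifier.

```python
import re

# Assumed (simplified) shape of a compact identifier: "prefix:accession".
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z0-9_.]+):(?P<accession>\S+)$")


def curie_to_url(curie, resolver="https://identifiers.org"):
    """Build a resolver URL for a CURIE, or raise ValueError if it is malformed."""
    match = CURIE_PATTERN.match(curie)
    if match is None:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{resolver}/{match['prefix']}:{match['accession']}"


# Hypothetical record: field names and accessions are placeholders only.
record = {
    "sample": "biosample:SAMEA0000001",
    "publication": "doi:10.1000/example",
}
for field, curie in record.items():
    print(field, "->", curie_to_url(curie))
```

A client could apply such a helper to every identifier field of a record before following the links, rejecting records whose identifiers are not well formed.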
c. Resource url
d. Inter-organisational recognition: does the Interoperability Resource have community recognition? (e.g. demonstrated through a collaboration, geographical diversity in the source of the submissions, international diversity of delivery partners and/or funders)
The FAIRtracks ecosystem is a joint effort undertaken by members of three ELIXIR nodes: Norway, Spain, and EMBL-EBI. This effort brings together different resources provided by these groups in the context of major projects and services like TrackFind, Track Hub Registry, EPICO, and the GSuite HyperBrowser. The main purpose of the FAIRtracks ecosystem is to work towards the FAIRification of existing tracks and the definition of standards to be used by other members of the community when producing novel genomic tracks or curating historical track collections. This joint effort has been fostered by the ELIXIR Implementation Study “FAIRification of Genomic Tracks”, initiated in 2018 and recently concluded. One of the outcomes of this implementation study has been the FAIRtracks draft standard (version 1.0) for genomic track metadata. As a first effort, we have annotated the BLUEPRINT track collection (~5,500 tracks) according to the FAIRtracks JSON schema. The stable release of the FAIRtracks draft standard, its initial utilization on the BLUEPRINT metadata, and the tooling around it will all help consolidate the initial community and serve as a reference for other efforts in this specific domain.
2. Community
a. Community impact: If applicable, provide documented evidence of community impact (e.g., publication citations, API calls, projects using the resource, etc.)
Following the “Four simple recommendations to encourage best practices in research software” by ELIXIR, every element of the FAIRtracks ecosystem is publicly available at GitHub (listed below). This will allow other interested parties to easily adopt and contribute to the development of the standards and related tools:
- The FAIRtracks standard for track metadata (JSON Schema) is maintained at https://github.com/fairtracks/fairtracks_standard/.
- The Ensembl Track Hub Registry is one of the largest genomic track metadata providers, and now it supports FAIRtracks submissions. Source code: https://github.com/Ensembl/trackhub-registry.
- The FAIRtracks validation service extends JSON Schema validation with checks for data integrity and with validation of CURIEs and ontology terms. Source code: https://github.com/fairtracks/fairtracks_validator.
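The idea of layering extra checks on top of structural validation can be sketched as follows. This is a minimal illustration and not the code of the FAIRtracks validator: the structural layer is a hand-written required-keys check standing in for JSON Schema validation, all field names (`local_id`, `sample_curie`, `experiment_ref`) are hypothetical, and ontology term lookup is omitted because it would require network access.

```python
import re

# Assumed CURIE shape: "prefix:accession" (simplified for illustration).
CURIE_RE = re.compile(r"^[A-Za-z0-9_.]+:\S+$")


def validate(doc):
    """Return a list of error messages; an empty list means the document passed."""
    errors = []
    experiments = doc.get("experiments", [])
    tracks = doc.get("tracks", [])

    # Layer 1: structural check (stand-in for JSON Schema validation).
    required = {"experiments": ("local_id", "sample_curie"),
                "tracks": ("local_id", "experiment_ref")}
    for section, keys in required.items():
        for i, item in enumerate(doc.get(section, [])):
            for key in keys:
                if key not in item:
                    errors.append(f"{section}[{i}]: missing '{key}'")

    # Layer 2: CURIE syntax check on identifier fields.
    for i, exp in enumerate(experiments):
        curie = exp.get("sample_curie")
        if curie is not None and not CURIE_RE.match(curie):
            errors.append(f"experiments[{i}]: malformed CURIE {curie!r}")

    # Layer 3: data integrity, every track must reference a known experiment.
    known_ids = {exp.get("local_id") for exp in experiments}
    for i, track in enumerate(tracks):
        ref = track.get("experiment_ref")
        if ref is not None and ref not in known_ids:
            errors.append(f"tracks[{i}]: unknown experiment {ref!r}")
    return errors


# Hypothetical document: the second track points at a non-existent experiment.
doc = {
    "experiments": [{"local_id": "exp1", "sample_curie": "biosample:SAMEA0000001"}],
    "tracks": [{"local_id": "trk1", "experiment_ref": "exp1"},
               {"local_id": "trk2", "experiment_ref": "exp9"}],
}
for message in validate(doc):
    print(message)
```

Collecting all errors in one list, rather than stopping at the first failure, lets a curator fix a submission in a single pass.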
Utility services:
- The FAIRtracks metadata augmentation service augments the minimal, validated FAIRtracks metadata with human-readable content. Source code: https://github.com/fairtracks/fairtracks_augment
- The FAIRtracks format conversion service. Source code: https://github.com/fairtracks/fairtracks_json_to_gsuite.
- TrackFind integrates all the gathered, curated and FAIRtracks-validated track metadata. It allows faceted queries on different metadata features, and is planned to be extended with support for FAIRification of metadata through batch curation features, including guided ontology matching. Source code: https://github.com/elixir-oslo/trackfind.
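To make the notion of a faceted query concrete, here is a small in-memory sketch. It does not use or reproduce the TrackFind API; the `faceted_query` function, the facet names, and the example records are assumptions made for illustration. Values listed within one facet are treated as alternatives, while different facets must all match.

```python
def faceted_query(records, facets):
    """Keep records that match every facet; values within a facet are alternatives."""
    return [
        record for record in records
        if all(record.get(name) in allowed for name, allowed in facets.items())
    ]


# Hypothetical track metadata records.
tracks = [
    {"id": "t1", "species": "Homo sapiens", "assay": "ChIP-seq", "cell_type": "monocyte"},
    {"id": "t2", "species": "Homo sapiens", "assay": "RNA-seq", "cell_type": "monocyte"},
    {"id": "t3", "species": "Mus musculus", "assay": "ChIP-seq", "cell_type": "neutrophil"},
]

hits = faceted_query(tracks, {"species": {"Homo sapiens"},
                              "assay": {"ChIP-seq", "RNA-seq"}})
print([track["id"] for track in hits])  # ['t1', 't2']
```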
In order to document interest and community impact, we intend to gather traffic information for the main contact points of the FAIRtracks ecosystem, as listed in section 3b below.
b. Potential usage: Describe other systems that could use this candidate resource, but currently do not.
Currently, we foresee the following uses of this ecosystem:
- The main usage will be track metadata search and import functionality by track analysis tools and libraries, as well as manual web search and ad hoc scripting by researchers.
- Most of the public genome browsers could improve their track searching capabilities, once their codebase is augmented to deal with the enriched genomic track metadata produced following the FAIRtracks standard.
- Services focused on linkage and/or description of tissues, cell types and cell lines could also show the known genomic data tracks in which the given tissues, cell types or cell lines are involved.
- Services such as EGA could provide references to the genomic data tracks related to a given study, dataset or experiment.
Moreover, some of the services can be re-used in other contexts, e.g. FAIRtracks validation, which can easily be used to validate other JSON-based submissions.
c. Outreach & support: Provide resource support publication(s)/user documentation(s) describing the Interoperability Resource (e.g. scientific journal publications, community preprints, resource user’s documentations etc.), resource dissemination plan (e.g. workshops, conference presentations), and other equal-opportunity research support (if applicable).
Both the data model and all the developed tools around the genomic data tracks ecosystem include documentation in their source code repositories.
An ELIXIR Webinar describing the FAIRtracks ecosystem was given in November 2019. Other dissemination work included poster presentations at the ELIXIR All Hands 2018 and 2019 meetings, and the Galaxy Community Conference 2019. A manuscript is currently in preparation that describes the decisions and outcomes from the “FAIRification of Genomic Tracks” implementation study, showing the landscape of the whole ecosystem and the tools developed around the track metadata data model. We expect to submit it in Q1 2020 to the ELIXIR channel in F1000.
After the outcomes of the implementation study have been published, we intend to engage the community more directly, including track producers, track collection maintainers, tool creators, and end-users. Possible outreach activities include presence at central conferences relevant to the different user communities as well as arranging relevant workshops. Possible workshops include "Bring your own metadata" (BYOM) workshops to help data producers and maintainers FAIRify their track metadata, and "Bring your own tools" (BYOT) workshops to help tool developers connect to relevant FAIRtracks APIs. Once the available metadata reaches a point of maturity, we could arrange end-user "Bring your own project" (BYOP) workshops focused on making use of existing track metadata in the context of individual research projects.
d. Dependency of other resources: How is this resource critical to the user(s)? Do other resources depend on the resource described here to provide downstream service? Please list, or provide a link to a diagram.
Track metadata harmonization efforts have mostly been limited to certain consortia portals. In order to find and access track files from several track providers, downstream track analysis tools or libraries would need to implement direct support for each API and metadata format. We have experience implementing such a solution in the GSuite HyperBrowser framework; the feature was very useful to researchers, but did not scale and was difficult to maintain. Historically, the availability of track files has been one of the main reasons behind the popularity of genome browsers. We thus anticipate that once the FAIRtracks metadata contents reach a certain maturity level, the FAIRtracks ecosystem will be similarly critical for downstream analysis tool frameworks, as well as for direct use by end-users.
We have implemented two clients downstream of the TrackFind API:
- The EPICO TrackFind backend allows the whole EPICO platform (http://fairtracks.bsc.es/epico) to query the available analyses associated with organisms and cell types, and to filter by them. It also helps locate similar data tracks relevant to the different representations.
- The TrackFind client tool in GSuite HyperBrowser (https://hyperbrowser.uio.no/trackfind) provides a helpful step-by-step search interface, where each selection limits the values that can be selected in the next step. The search results are returned in the tabular GSuite format, which can be used as input to the other data preparation and analysis tools.
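The step-by-step behaviour described for the GSuite HyperBrowser client, where each selection limits what can be chosen next, can be illustrated with a short sketch. This is not the client's actual code: `next_options`, the attribute names, and the records are hypothetical, and the real tool obtains its values from the TrackFind API rather than from an in-memory list.

```python
def next_options(records, selections, attribute):
    """Return the values of `attribute` still available given earlier selections."""
    remaining = [
        record for record in records
        if all(record.get(key) == value for key, value in selections.items())
    ]
    return sorted({record[attribute] for record in remaining if attribute in record})


# Hypothetical track metadata records.
records = [
    {"species": "Homo sapiens", "assay": "ChIP-seq", "target": "H3K27ac"},
    {"species": "Homo sapiens", "assay": "RNA-seq"},
    {"species": "Mus musculus", "assay": "ChIP-seq", "target": "H3K4me3"},
]

# Step 1: nothing selected yet, so every species is offered.
print(next_options(records, {}, "species"))
# Step 2: after choosing a species, only its assays remain selectable.
print(next_options(records, {"species": "Mus musculus"}, "assay"))
```

Recomputing the options from the remaining records at each step guarantees that a user can never build a query with zero results.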
Overview of dependencies in the FAIRtracks ecosystem: https://drive.google.com/open?id=1-pYAVM9MEr0mkIIApoAmQ4y2KDR1cFaD
3. Quality of resource
a. Uptime: Average percentage uptime/month during the last 12 months, response time of the resource. In case of ontology/standards production, interval of update/release, adaptability of ontology design patterns to evolving data. Provide information where applicable: uptime of resource, software release cycle (please state week/month etc), update frequency.
The FAIRtracks ecosystem is a network of independent resources (Track Hub Registry, TrackFind, EPICO, and GSuite HyperBrowser) that work asynchronously and do not require each other to continue providing their core functionalities. TrackFind only needs to import metadata from the Track Hub Registry when the metadata has changed. The validator and utility services are used in different exchanges, including the metadata curation process, which is currently very much a manual effort. Downtime of the TrackFind service will break the functionality provided by the TrackFind clients. Still, the uptime and response time requirements are currently limited. The performance of the various parts of the ecosystem needs to be measured and matched to increases in traffic.
Updates to the FAIRtracks draft standard will be provided as required by community adoption. Each update to the FAIRtracks draft standard will trigger a series of updates to the other services. We will provide and document a regular software release cycle in accordance with community needs.
b. Accessibility: what are resource retrieval mechanisms? Does the resource provide web-based user interface, application programmable interface (API), containers, and/or other channels? Please list resource access mechanism, provide URLs as applicable.
Web-based user interfaces:
- Track Hub Registry (track hub search and submission interface): https://trackhubregistry.org
- TrackFind (track metadata search interface, will be extended with curation mode): https://trackfind.elixir.no
- EPICO TrackFind client: http://fairtracks.bsc.es/epico
- GSuite HyperBrowser TrackFind client: https://hyperbrowser.uio.no/trackfind
REST APIs:
- Track Hub Registry APIs: https://trackhubregistry.org/docs/apis
- TrackFind API: https://trackfind.elixir.no/api
- FAIRtracks validation service: http://fairtracks.bsc.es/api
- FAIRtracks utility services: https://fairtracks.elixir.no/api
c. Maintenance quality: Is there a maintenance SOP or plan, reflecting sustainability and scalability? Does it align with guidelines for sustainable software development? Please include a resource commitment statement (description text or URL).
As the developers and initial contributors of the FAIRtracks ecosystem, we will rely on the guidelines put forward by the ELIXIR tools platform around the development of quality software. Moreover, we are registering each resource of this ecosystem in the relevant ELIXIR registry, e.g. bio.tools and FAIRsharing.org. OpenEBench will be used to monitor different aspects of the software quality of each resource registered in bio.tools. OpenEBench implements, refines and, for some aspects, extends the criteria established by the Software Sustainability Institute, as long as the criteria can be measured automatically. Importantly, these objective metrics will be available throughout the whole life cycle of the elements of the FAIRtracks ecosystem, including the point at which the tooling is replaced and/or decommissioned.
d. Support quality: Please list support mechanisms (e.g., point of contact, request ticketing, resource’s response time where a solution is identified, etc.), and methods to collect user feedback. If available, list tutorial documentations or tutorial materials and format, including linking on the ELIXIR’s Training Portal (TeSS) (or other training platforms) where applicable.
The FAIRtracks community has several contact points, based on the nature of the question or issue. Although there is not yet a public forum for general questions about the ecosystem, both the FAIRtracks standard (https://github.com/fairtracks/fairtracks_standard/issues) and all the different software components have issue pages in their repositories, where anyone with a valid account can post questions and bug reports.
The Track Hub Registry provides a user feedback page at http://trackhubregistry.org/help.
Documentation of the services is available from the GitHub repositories, while the REST APIs are documented at their core URLs. The main FAIRtracks landing page (http://fairtracks.github.io) lists a few tutorials and publications. We will gradually improve the training material and provide links on the ELIXIR TeSS portal where applicable.
4. Legal framework, funding, and governance
a. Legal framework: What are the resource’s license/terms of use? Can the license facilitate Open Science? Please include the url for the license the resource uses.
- FAIRtracks standard: CC-BY 4.0 (https://github.com/fairtracks/fairtracks_standard/blob/master/LICENSE.md)
- Track Hub Registry: Apache 2.0 (https://github.com/Ensembl/trackhub-registry/blob/master/LICENSE)
- FAIRtracks validator: LGPL 2.1 (https://github.com/fairtracks/fairtracks_validator/blob/master/python/L…)
- FAIRtracks validator server: AGPL 3 (https://github.com/fairtracks/fairtracks_validator/blob/master/python_s…)
- FAIRtracks augment service: Apache 2.0 (https://github.com/fairtracks/fairtracks_augment/blob/master/LICENSE)
- FAIRtracks format conversion service: MIT (https://github.com/fairtracks/fairtracks_json_to_gsuite/blob/master/LIC…)
- TrackFind: MIT (https://github.com/elixir-no-nels/trackfind/blob/master/LICENSE)
b. Privacy/Ethics policy: If applicable, is there a publicly available privacy policy in which use and security around personal data are described (e.g. the EU General Data Protection Regulation (GDPR), ELIXIR Ethics Policy, other relevant ELIXIR Policies)? Please include the url of the privacy/ethics policy, if applicable.
The Track Hub Registry, which accepts external submissions, collects user contact information for practical purposes, under EMBL’s Data Protection Policy (https://www.ebi.ac.uk/data-protection/privacy-notice/trackhub-registry-…).
TrackFind makes use of the ELIXIR AAI service for authentication, and collects basic user information under the privacy policy available at https://trackfind.elixir.no/pp. Information about user rights under GDPR is also provided: https://trackfind.elixir.no/GDPR.
c. Funding & sustainability plan: List of funding sources supporting the resource, and sustainability plan.
The FAIRtracks draft standard, FAIRtracks utility services, and TrackFind have been accepted by ELIXIR Norway leadership to be hosted as part of the existing service "The BioXSD / GTrack ecosystem" delivered by ELIXIR Norway (as described in its Service Delivery Plan). ELIXIR Norway is funded by the Research Council of Norway and the partner institutions, currently by a four-year grant that expires in 2021. The partner institutions are committed to supporting the ELIXIR Norway project, and it is assumed that core funding will continue after 2021 through an extension, possibly on a more permanent funding basis.
As stated in the ELIXIR-ES Service Delivery Plan (SDP), resources produced as part of the Spanish groups' participation in ELIXIR activities, e.g. Grants and Commissioned Services, will become part of its portfolio. The first developments around the FAIRtracks validator service were done for OpenEBench, a platform developed in the context of the ELIXIR Excelerate project, and were extended as part of the Implementation Study on “FAIRification of Genomic Tracks”. As such, this service will be maintained and extended by the BSC as part of the ELIXIR-ES SDP. This is conditional on the renewal of the ELIXIR-ES core funding, which follows a three-year funding cycle.
The Track Hub Registry is part of the Ensembl project, which is supported by a dedicated grant scheme. Importantly, Ensembl has been recognized as one of the ELIXIR Core Data Resources, which implies that there is a sustainability plan to maintain its activities.
d. Governance: Describe the Resource’s QA/QC plan that guarantees similar quality governance to that of ELIXIR. Please link SAB members, if applicable.
Governance is based on an active and open collaboration across the three groups involved in the FAIRtracks ecosystem. This is demonstrated by the joint administration of the source code repository supporting most of their activities. Each specific service will develop its own governance model or adopt the one from its parent project; for example, the Track Hub Registry, as part of Ensembl, is linked to the Ensembl governance model. The FAIRtracks JSON schema aims at becoming a standard that will be governed in an appropriately collaborative manner. This will allow the standard to evolve according to the needs of the community while maintaining internal coherence.