BridgeDb | ELIXIR

Notes given in the application form

Eligibility criteria

Must be an ELIXIR Service (i.e. be part of an existing ELIXIR Node’s Service Delivery Plan, or is ELIXIR commissioned work), or is in the official process/commitment of becoming one. (Required)
Must have evidence that it supports an interoperability activity, and has been deployed. (Required)
Must support or be forecast to support FAIR Principles. (Required)
Should fit into, or be forecast to fit into, the EIP roadmap for data interoperability or other activities relevant to ELIXIR mission.

Additional notes

Please complete this form by adding information for your Interoperability Resource in the appropriate section below. Consult with Recommended Interoperability Resource (RIR) selection criteria documentation on details for each section below.
Where a panel/question is not relevant to your Interoperability Resource, please leave it blank or mark as “not applicable”, optionally with a brief explanation as to why.
Word limit guidance is noted for free text fields.
Please include urls to external resources, where useful.
Any questions, contact Sirarat Sarntivijai (sirarat.sarntivijai@elixir-europe.org).

1. Resource facilitation to scientific research

a. Interoperability Resource: Briefly describe the function of the Interoperability Resource

BridgeDb is a platform for database identifier mapping, both simple identifiers (e.g. CHEBI:1234) and uniform resource identifiers (URIs, e.g. http://identifiers.org/chebi/CHEBI:1234). It is the workhorse of data integration and supports the essential FAIR aspect of interoperability, with recent efforts adding detailed provenance and meaning to mappings (“scientific lenses”). BridgeDb provides the glue between bioinformatics processing pipeline blocks, and has an OpenAPI-based interface. It has been integrated in data integration tools like Cytoscape and PathVisio and as a package in R. BridgeDb can be installed locally as a webservice with a Docker container.
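To illustrate the two identifier forms mentioned above, the following minimal sketch converts between a simple identifier and its identifiers.org URI. The function names are hypothetical and are not part of the BridgeDb API; the URI layout is taken only from the example in the text.

```python
from typing import Tuple

# Base of the URI form shown in the text (http://identifiers.org/chebi/CHEBI:1234).
IDENTIFIERS_ORG = "http://identifiers.org/"


def to_uri(prefix: str, identifier: str) -> str:
    """Build the URI form from a namespace prefix and a simple identifier."""
    return f"{IDENTIFIERS_ORG}{prefix}/{identifier}"


def from_uri(uri: str) -> Tuple[str, str]:
    """Split a URI of the form above back into (prefix, simple identifier)."""
    if not uri.startswith(IDENTIFIERS_ORG):
        raise ValueError(f"not an identifiers.org URI: {uri}")
    prefix, _, identifier = uri[len(IDENTIFIERS_ORG):].partition("/")
    if not prefix or not identifier:
        raise ValueError(f"URI lacks a prefix or an identifier: {uri}")
    return prefix, identifier


if __name__ == "__main__":
    uri = to_uri("chebi", "CHEBI:1234")
    print(uri)
    print(from_uri(uri))
```

This only handles the notation; finding equivalent identifiers in other databases is the mapping task that BridgeDb itself performs.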

b. Scope statement: describe the scope, and the users of the resource. How is the Interoperability Resource positioned with respect to other similar Interoperability Resources? Include the base URL and, if relevant, the introductory or “about” page URL.

The BridgeDb software library was introduced in 2010 [https://doi.org/10.1186/1471-2105-11-5]. The challenges at that time were to develop software that was not tied to a single tool in need of mappings, to be able to combine and stack mappings, and to switch easily to other, external services. The key to solving these challenges was to create a standard interface between tools and mapping services and to allow integration as a webservice. BridgeDb includes a software library intended for use by bioinformatics tool developers.

BridgeDb currently provides mappings to genes and proteins (based on Ensembl & EnsemblGenomes), metabolites (based on HMDB, ChEBI, and Wikidata), interactions (based on Rhea) and gene-variant to genes (based on Ensembl and dbSNP). Since BridgeDb provides all its code open source [https://www.bridgedb.org/development/], and we actively seek out collaboration with others (such as identifiers.org and the mapping providers) we believe we have a strong position in the bioinformatics field.

BridgeDb builds on standard unique, persistent and resolvable identifier schemas, like the MIRIAM standard supported by identifiers.org. It combines mappings provided by various curated resources, including several ELIXIR core data resources. Among online resources, DAVID from the former TIGR institute became the best known, but it lacked updates for a period of 8 years.

c. Resource url

https://bridgedb.org/ or https://bridgedb.github.io/

d. Inter-organisational recognition: does the Interoperability Resource have community recognition? (e.g. demonstrated through a collaboration, geographical diversity in the source of the submissions, international diversity of delivery partners and/or funders)

The BridgeDb software framework has already been incorporated in several bioinformatics applications, such as PathVisio [https://doi.org/10.1371/journal.pcbi.1004085, https://www.pathvisio.org/], WikiPathways [https://doi.org/10.1093/nar/gkx1064, https://www.wikipathways.org/index.php/WikiPathways], Cytoscape [https://doi.org/10.12688/f1000research.4521.1, http://www.cytoscape.org/], R/Bioconductor and Open PHACTS [https://doi.org/10.1016/j.drudis.2012.05.016, https://www.openphacts.org/]. Besides these collaborations, we also engage with other databases and offer to enable mapping to their identifier system (given that the database is registered in MIRIAM, is sustainable and trustworthy, and is able to redistribute the content of its mappings under a CC0 licence). For metabolite identifiers, this has recently led to mappings to additional databases such as KNApSAcK, LIPID MAPS, and the EPA CompTox Dashboard [https://doi.org/10.1093/nar/gkx1064]. Downstream use of the BridgeDb identifier mappings includes all use of the WikiPathways downloadable databases (multiple species and biological entity types), through identifiers normalized using BridgeDb in pathway content (export formats BioPAX, RDF, GMT, etc.).

ELIXIR (Core) Resources that are interoperable with BridgeDb include ChEBI, Ensembl, UniProt and various repositories like ArrayExpress and MetaboLights. This allows, for example, the combination of data that use different identifiers, including cross-omics applications, and the integration of WikiPathways with those resources, both by linking to them and by using pathways in them. For example, MetaboLights uses ChEBI identifiers to link metabolomics data to WikiPathways. The most current funders are mentioned at 4c.

2. Community

a. Community impact: If applicable, provide documented evidence of community impact (e.g., publication citations, API calls, projects using the resource, etc.)

The BridgeDb webservice has an average of 17,000 requests/day (more than 40% of which come from ELIXIR countries); the USA accounts for 33% of the requests. The webservice documentation has an average of 80 visits/day.
The Open PHACTS website has an additional 3.5 million hits monthly, most of which use the BridgeDb Identifier Mapping Service. The BridgeDb R package is available from Bioconductor and downloaded on average 150 times per month.
The articles related to BridgeDb include the main BridgeDb paper (cited 104 times), the scientific lenses paper (cited 5 times), the papers on the Open PHACTS platform, of which BridgeDb is a central component (cited many times, with the overview paper cited 233 times), and the BridgeDb Cytoscape app paper (cited 13 times). Citation counts are from Google Scholar.

The identifier mapping databases are downloaded frequently, and are essential to any pathway analysis with WikiPathways (cited a few hundred times overall). The size of the gene/protein mapping files is quite stable, going from 407.5 MB in Dec. 2016 to 429.9 MB in Dec. 2017. The metabolite mapping file grew substantially over time, from 288 MB on average in 2016 to 660 MB in 2018.

b. Potential usage: Describe other systems that could use this candidate resource, but currently do not.

We believe any database, particularly the ELIXIR core resources, should factor out identifier mapping for their core databases and reuse a community-curated identifier mapping service (“FAIR mappings”). An example of use in data integration is the “federated” querying of multiple databases (various SPARQL endpoints, or the international nanopublication network), where different resources can be expected to use different identifiers and a user wants to query all equivalent identifiers. Mappings provide the actual link between linkable data, and we expect uptake of Linked Data approaches to become much faster and more successful once such mappings are built in.

A friendly user interface to BridgeDb, focused on non Bio-IT experts and comparable to DAVID, is being developed. This will allow automatic ID detection, if possible, and conversion from different ID-types to a unified type. File import/export will also be supported. We tentatively gave this the name GOLIATH.

c. Outreach & support: Provide resource support publication(s)/user documentation(s) describing the Interoperability Resource (e.g. scientific journal publications, community preprints, resource user’s documentations etc.), resource dissemination plan (e.g. workshops, conference presentations), and other equal-opportunity research support (if applicable).

Scientific guidance is available via the journal articles already described under part 2a. A paper on the BridgeDb gene-variant to gene database was recently submitted to the F1000 ELIXIR channel (awaiting ELIXIR agreement). BridgeDb training material was developed as part of systems biology education at Maastricht University, where it has been used for many years. We give external courses upon request (we recently provided a workshop on identifier mapping with BridgeDb as a mapping resource at the Metabolomics Winterschool, organised by DeNBI, https://www.denbi.de/), and BridgeDb is included in various postgraduate courses we give on integrative systems biology (e.g. the 2018 NuGO nutrigenomics Winterschool in Lapland). We will use the “Helis Academy” Interreg project to develop further training material and courses in collaboration with the ELIXIR training platform, and we have started to provide training material through TeSS.

General guidance is available via online (open access) material, such as the BridgeDbR Vignette: https://bioconductor.org/packages/release/bioc/vignettes/BridgeDbR/inst/doc/tutorial.html. Resources are listed in TeSS: https://tess.elixir-europe.org/search?utf8=%E2%9C%93&q=BridgeDb.

d. Dependency of other resources: How is this resource critical to the user(s)? Do other resources depend on the resource described here to provide downstream service? Please list, or provide a link to a diagram.

Various tools depend on BridgeDb for identifier mapping, including Cytoscape, WikiPathways, PathVisio, CyTargetLinker, and Open PHACTS. If BridgeDb did not exist, there would be no framework so flexible to the needs of these tools, so active in collaboration with other databases (as mentioned under 1d), and so open in terms of both code and provenance. We see BridgeDb as the answer to several problems related to identifier mapping. Furthermore, BridgeDb continues to be developed according to the needs of its users (see parts 2a and 2b), and has an issue tracker on GitHub for all its code, allowing anyone who is willing to participate.

3. Quality of resource

a. Uptime: Average percentage uptime/month during the last 12 months, response time of the resource. In case of ontology/standards production, interval of update/release, adaptability of ontology design patterns to evolving data. Provide information where applicable: uptime of resource, software release cycle (please state week/month etc), update frequency.

Our hosted BridgeDb webservice has an uptime of 99.89%, and is monitored every 15 minutes so that we can respond quickly in case of issues (typically rare overloading). Identifier mapping database release cycles are every 3 months for gene/protein ID mappings (following Ensembl releases), every month for metabolite ID mappings (following ChEBI releases), and an irregular schedule based on user needs for other databases.
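As a rough illustration of what the stated uptime means in practice, the arithmetic below converts an uptime percentage into expected downtime. The 30-day month is an assumption made for the example, not a figure from the monitoring setup, and the function name is hypothetical.

```python
def downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Expected downtime in minutes for a given uptime percentage and period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    per_month = downtime_minutes(99.89, 30)  # assumed 30-day month
    print(f"downtime per 30 days: {per_month:.2f} minutes")
    # Number of 15-minute monitoring intervals that this downtime spans.
    print(f"monitoring intervals: {per_month / 15:.1f}")
```

Under that assumption, 99.89% uptime corresponds to roughly 48 minutes of downtime per month, which is about three of the 15-minute monitoring intervals.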

Software release updates for BridgeDb happen when a need is identified; the platform has been running stably for some time. The code repository can be found at https://github.com/bridgedb/BridgeDb. There are a handful of developers at different institutes with thorough knowledge of the code.

b. Accessibility: what are resource retrieval mechanisms? Does the resource provide web-based user interface, application programmable interface (API), containers, and/or other channels? Please list resource access mechanism, provide URLs as applicable.

The public web service (http://webservice.bridgedb.org/) is documented with OpenAPI (formerly Swagger), which enables not only human-readable documentation and API testing, but also automatic generation of client SDKs for over 20 programming languages. It is registered with bio.tools at https://bio.tools/bridgedb. Both the default (simple) identifier and IRI-based mapping services can also be installed locally using Docker images available from https://hub.docker.com/r/bigcatum/bridgedb/. The identifier mapping databases (Derby, IMS link sets) are available from http://www.bridgedb.org/mapping-databases/. For the IMS link sets, VoID headers describe the provenance.
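A minimal sketch of a client for the public web service described above follows. The `/{organism}/xrefs/{systemCode}/{identifier}` path, the system code `L`, and the tab-separated response layout are assumptions that should be checked against the OpenAPI documentation; the helper names are hypothetical and the parsed sample uses placeholder values, not real service output.

```python
from typing import List, Tuple
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as given in the text.
BASE_URL = "http://webservice.bridgedb.org"


def xrefs_url(organism: str, system_code: str, identifier: str) -> str:
    """Build a cross-reference request URL (path layout is an assumption)."""
    return (
        f"{BASE_URL}/{quote(organism)}/xrefs/"
        f"{quote(system_code)}/{quote(identifier, safe='')}"
    )


def parse_xrefs(text: str) -> List[Tuple[str, str]]:
    """Parse lines of 'identifier<TAB>data source' (layout is an assumption)."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, _, source = line.partition("\t")
        pairs.append((identifier.strip(), source.strip()))
    return pairs


def fetch_xrefs(organism: str, system_code: str, identifier: str) -> List[Tuple[str, str]]:
    """Call the service and parse the reply; requires network access."""
    with urlopen(xrefs_url(organism, system_code, identifier), timeout=30) as reply:
        return parse_xrefs(reply.read().decode("utf-8"))


if __name__ == "__main__":
    print(xrefs_url("Human", "L", "1234"))
    # Placeholder response text, used only to show the parsing step.
    print(parse_xrefs("ID_A\tSourceA\nID_B\tSourceB\n"))
```

For production use, a client SDK generated from the OpenAPI description would be the more robust route than hand-built URLs.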

c. Maintenance quality: Is there a maintenance SOP or plan, reflecting sustainability and scalability? Does it align with guidelines for sustainable software development? Please include a resource commitment statement (description text or URL).

BridgeDb is an Open Science project and maintenance is split up into various aspects, with several people taking responsibility for different parts (the software, certain identifier mapping databases, the webservice). For all aspects, documentation is available, allowing others to take over if needed. Work is ongoing to automate as many aspects as possible using Continuous Integration systems (e.g. see https://jenkins.bigcat.unimaas.nl/). The project has used modern development methods, as outlined by the guidelines for sustainable software development.

The project is committed to keeping BridgeDb alive, with both extrinsic motivation (BridgeDb is an essential component of international projects) and intrinsic motivation (BridgeDb is part of locally funded research projects). An 8-year track record with changing project funding over time shows that this actually works.

d. Support quality: Please list support mechanisms (e.g., point of contact, request ticketing, resource’s response time where a solution is identified, etc.), and methods to collect user feedback. If available, list tutorial documentations or tutorial materials and format, including linking on the ELIXIR’s Training Portal (TeSS) (or other training platforms) where applicable.

Since all the code is available (https://www.bridgedb.org/development/) with accompanying documentation, we prefer that issues are reported on GitHub, for the specific module they apply to. There is also a Google mailing list (https://groups.google.com/forum/#!forum/bridgedb-discuss), which is a more familiar format for users not familiar with GitHub. Furthermore, the contact information for each author of the mapping databases is given on their specific webpage. The resource response time can be improved tremendously when a caching server is used for duplicated requests; we are working on improving the architecture of BridgeDb to improve speed on parallel requests. For the R package (which is part of the Bioconductor library), the code and documentation with a tutorial are available at http://bioconductor.org/packages/devel/bioc/html/BridgeDbR.html. Besides this general tutorial, there is also a specific tutorial on how to use BridgeDb to perform pathway analysis on metabolomics data, at https://egonw.github.io/metawinterschool-bigcat/. ELIXIR TeSS has a few resources registered: https://tess.elixir-europe.org/search?utf8=%E2%9C%93&q=BridgeDb.
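The effect of caching duplicated requests, mentioned above, can be sketched on the client side with in-process memoization. This is only a stand-in for the idea and not the caching server or the architecture the team is working on; `make_cached_lookup` and the backend function are hypothetical names.

```python
from functools import lru_cache
from typing import Callable, Tuple

# A backend takes (system code, identifier) and returns the mapped identifiers.
Backend = Callable[[str, str], Tuple[str, ...]]


def make_cached_lookup(backend: Backend, maxsize: int = 1024):
    """Wrap a mapping backend so that repeated requests are served from memory."""

    @lru_cache(maxsize=maxsize)
    def lookup(system_code: str, identifier: str) -> Tuple[str, ...]:
        return backend(system_code, identifier)

    return lookup


if __name__ == "__main__":
    calls = []

    def slow_backend(system_code: str, identifier: str) -> Tuple[str, ...]:
        # Placeholder for a real web service call; records each invocation.
        calls.append((system_code, identifier))
        return (f"mapped:{identifier}",)

    lookup = make_cached_lookup(slow_backend)
    lookup("L", "1234")
    lookup("L", "1234")  # duplicate request, answered from the cache
    print(len(calls), lookup.cache_info())
```

A shared caching server in front of the webservice would extend the same benefit to all clients rather than to a single process.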

4. Legal framework, funding, and governance

a. Legal framework: What are the resource’s license/terms of use? Can the license facilitate Open Science? Please include the url for the license the resource uses.

BridgeDb is a 100% Open Science project. The software is available under an Apache License 2.0 and identifier mapping databases under Open licenses, CCZero where possible (e.g. for the metabolite identifier mapping databases). Further details are available from the BridgeDb website (http://bridgedb.org/).

b. Privacy/Ethics policy: If applicable, is there a publicly available privacy policy in which use and security around personal data are described (e.g. the EU General Data Protection Regulation (GDPR), ELIXIR Ethics Policy, other relevant ELIXIR Policies)? Please include the url of the privacy/ethics policy, if applicable.

The BridgeDb software and data do not involve private data, and services can be used without a user account, so we believe that this point is not applicable to an identifier mapping tool. The webservice, however, collects IP addresses to monitor web service activity for maintenance and usage reporting. A privacy policy must be developed for this.

Of course, the development model uses online collaboration platforms, requiring developers to have an account on such a system, like GitHub, where these rules apply.

c. Funding & sustainability plan: List of funding sources supporting the resource, and sustainability plan.

BridgeDb is not directly funded, but has been funded over the years as part of many projects, including: the Open PHACTS project, OpenRiskNet, EU-ToxRisk, NanoCommons, ELIXIR Implementation Studies MolData1 and MolData2 (2016 and 2017) and Metabolomics (2018) and now FAIRplus. The European Joint Project about Rare Diseases will contribute to linking protein 3D structure and function to various pathogenic and benign genetic variants, thus further extending the BridgeDb functionality. The diversity of funding significantly improves the sustainability over having a single source of funding that runs out at the end of a project.

Sustainability is further ensured by the Open Science nature and the continued dependence of other projects on BridgeDb. Furthermore, people involved continuously push a standardized, FAIR identifier mapping database in other data integration projects, encouraging broader sustainability.

d. Governance: Describe the Resource’s QA/QC plan that guarantees similar quality governance to that of ELIXIR. Please link SAB members, if applicable.

BridgeDb is steered by the main projects using it, which include WikiPathways, Cytoscape and Open PHACTS. Like most Open Source projects, it also takes input from other users. BridgeDb will be an ELIXIR-NL service, and as such will be subject to regular review by the ELIXIR-NL strategy board, which, as described in the ELIXIR-NL Service Delivery Plan, will work with the BridgeDb team to optimize 5 criteria: Openness, FAIRness, Quality, Fit (to the node goals) and Plan (for the future). We occasionally receive input from other SABs or from staff of projects in which BridgeDb is used (e.g. Open PHACTS, Cytoscape).