DisGeNET

Notes given in the application form

Eligibility criteria

  • Must be an ELIXIR Service (i.e. be part of an existing ELIXIR Node’s Service Delivery Plan, or is ELIXIR commissioned work), or is in the official process/commitment of becoming one. (Required)
  • Must have evidence that it supports an interoperability activity, and has been deployed. (Required)
  • Must support or be forecast to support FAIR Principles. (Required)
  • Should fit into, or be forecast to fit into, the EIP roadmap for data interoperability or other activities relevant to ELIXIR mission.

Additional notes

  • Please complete this form by adding information for your Interoperability Resource in the appropriate section below. Consult with Recommended Interoperability Resource (RIR) selection criteria documentation on details for each section below.
  • Where a panel/question is not relevant to your Interoperability Resource, please leave it blank or mark as “not applicable”, optionally with a brief explanation as to why.
  • Word limit guidance is noted for free text fields.
  • Please include urls to external resources, where useful.
  • Any questions, contact Sirarat Sarntivijai (sirarat.sarntivijai@elixir-europe.org).

1. Resource facilitation to scientific research

a. Interoperability Resource: Briefly describe the function of the Interoperability Resource

DisGeNET is one of the largest available collections of genes and variants involved in human diseases. It integrates data from various repositories and the scientific literature. The current release of DisGeNET (v5.0) contains 561,119 gene-disease associations (GDAs) between 17,074 genes and 20,370 diseases and traits, and 135,588 variant-disease associations (VDAs) between 83,002 variants and 9,169 diseases and phenotypes. DisGeNET was evaluated by an external panel as part of the renewal of the ELIXIR-ES Service Delivery Plan (SDP). The ELIXIR-ES SDP will be introduced once the legal framework is renewed and approved by the ELIXIR Board.

b. Scope statement: describe the scope, and the users of the resource. How is the Interoperability Resource positioned with respect to other similar Interoperability Resources? Include the base URL and, if relevant, the introductory or “about” page URL.

DisGeNET provides standardized Linked Open Data on gene-disease and variant-disease associations. Several other resources offer genes and/or variants associated with diseases; the advantages of DisGeNET are that it i) is open, ii) is standardized using community-based standards, iii) is linked to other resources, and iv) can be accessed through a variety of tools. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies (over 15 biomedical ontologies are used) to support linkage and integration with other resources. DisGeNET is part of the Linked Open Data (LOD) cloud through its RDF implementation (https://lod-cloud.net/dataset/disgenet). Resources are identified by de-referenceable IRIs. Gene-disease and variant-disease associations are integrated using the DisGeNET Association Type Ontology and semantically harmonized using SIO (http://sio.semanticscience.org) classes. More than 5 million linksets to the LOD cloud are available, connecting DisGeNET with UniProt, PubChem, DBpedia, Diseasome, Bioportal, Bio2RDF and Linked Life Data. DisGeNET appears in the latest release of the LOD cloud diagram (2018-06-28 update).

URL: http://www.disgenet.org/
About URL: http://www.disgenet.org/web/DisGeNET/menu/dbinfo
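
As an illustration of what a de-referenceable IRI means in practice, the following is a minimal Python sketch of HTTP content negotiation: the client requests the IRI with an Accept header naming an RDF serialization. The function names and the placeholder IRI are hypothetical and are not taken from the DisGeNET documentation; the media types actually served by the resource are an assumption to be checked against the RDF documentation.

```python
import urllib.request


def build_dereference_request(iri, media_type="text/turtle"):
    """Build a GET request asking for an RDF serialization of the IRI.

    The default media type (Turtle) is an assumption; a server may
    offer other serializations or ignore the header entirely.
    """
    return urllib.request.Request(iri, headers={"Accept": media_type})


def dereference(iri, media_type="text/turtle", timeout=30):
    """Fetch the IRI and return (content type, body text).

    This performs a network call and is therefore not executed here.
    """
    request = build_dereference_request(iri, media_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.headers.get_content_type(), body


# Hypothetical placeholder, NOT a real DisGeNET IRI.
PLACEHOLDER_IRI = "http://example.org/resource/placeholder"
```

Calling dereference() with a real resource IRI copied from the RDF faceted browser would return the content type chosen by the server together with the RDF description of that resource, if the server honours the Accept header.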

c. Resource url

http://www.disgenet.org/

d. Inter-organisational recognition: does the Interoperability Resource have community recognition? (e.g. demonstrated through a collaboration, geographical diversity in the source of the submissions, international diversity of delivery partners and/or funders)

DisGeNET has community recognition, as shown by the number of resources that rely on it to provide information on diseases and their associated genes and variants (see section 2.d), the number of publications that cite it (see section 2.a), and the number of users around the world (see section 2.a). DisGeNET development has received dedicated funding from the IMI project Open PHACTS, the H2020 projects MedBioinformatics and ELIXIR-EXCELERATE, and the national project INB-PRB2.
It is developed by a single group, thus there are no delivery partners or submitters.

2. Community

a. Community impact: If applicable, provide documented evidence of community impact (e.g., publication citations, API calls, projects using the resource, etc.)

Publication citations (by Google Scholar on July 6 2018):
Piñero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic acids research. 2016 Oct 19:gkw943. 130 citations
Piñero J, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database. 2015 Jan 1;2015. 261 citations 
Bauer-Mehren A, et al. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene–disease networks. Bioinformatics. 2010 Sep 21;26(22):2924-6. 125 citations 
Queralt-Rosinach N, et al. DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases. Bioinformatics. 2016 Mar 22;32(14):2236-8. 17 citations 
Queralt-Rosinach N, et al. Publishing DisGeNET as nanopublications. Semantic Web. 2016 Jan 1;7(5):519-28. 11 citations 
Bauer-Mehren A, et al. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PloS one. 2011 Jun 14;6(6):e20284. 141 citations 

Over 20 resources around the world rely on DisGeNET, among them ELIXIR Core Resources. A link to DisGeNET was incorporated into the human UniProt entries in 2016 to provide information on diseases. See for example COMT: https://www.uniprot.org/uniprot/P21964#pathology_and_biotech

Europe PMC incorporates annotations on gene-disease associations, currently containing 20,700 article citations with annotations provided by DisGeNET (http://europepmc.org/search?query=%28LABS_PUBS:%221672%22%29&page=1).

Data use statistics (June 2017-June 2018):
• DisGeNET web: 22,200 users from over 100 countries
• disgenet2r (R package): 589 users 
• RDF usage (SPARQL and faceted browser): 972 users 
• Nanopublication server: 4,084 users
• Data downloads (different subsets of the data): 2,893 users
• Cytoscape app: 2,200 users

b. Potential usage: Describe other systems that could use this candidate resource, but currently do not.

Other ELIXIR Core Resources that could use DisGeNET are IntAct and MINT (a pilot annotation of variants that disrupt PPIs has been performed, https://doi.org/10.1101/346833), PRIDE, PDBe, Human Protein Atlas, ChEMBL and BRENDA. All these resources could use DisGeNET to annotate genes, proteins or variants with diseases. In addition to ELIXIR resources, databases such as the Genome Aggregation Database (http://gnomad.broadinstitute.org/) would benefit from integrating disease annotations from DisGeNET. At the national level, we are exploring a collaboration with URDCat, a personalized medicine platform focused on rare neurological diseases.

c. Outreach & support: Provide resource support publication(s)/user documentation(s) describing the Interoperability Resource (e.g. scientific journal publications, community preprints, resource user’s documentations etc.), resource dissemination plan (e.g. workshops, conference presentations), and other equal-opportunity research support (if applicable).

Publications 
Piñero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic acids research. 2016 Oct 19:gkw943. 130 citations 
Piñero J, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database. 2015 Jan 1;2015. 261 citations 
Bauer-Mehren A, et al. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene–disease networks. Bioinformatics. 2010 Sep 21;26(22):2924-6. 125 citations 
Queralt-Rosinach N, et al. DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases. Bioinformatics. 2016 Mar 22;32(14):2236-8. 17 citations 
Queralt-Rosinach N, et al. Publishing DisGeNET as nanopublications. Semantic Web. 2016 Jan 1;7(5):519-28. 11 citations 
Bauer-Mehren A, et al. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PloS one. 2011 Jun 14;6(6):e20284. 141 citations 

Documentation
DisGeNET Web Interface tutorial: http://www.disgenet.org/ds/DisGeNET/files/DisGeNET_Web_2016.pdf
DisGeNET Cytoscape App user guide: http://www.disgenet.org/ds/DisGeNET/files/DisGeNET_Cytoscape_v5.pdf
DisGeNET RDF tutorial: http://www.disgenet.org/web/DisGeNET/menu/rdf#sparql 

DisGeNET presentations
-FEBS 2018 Advanced course: The molecular basis of disease
-Translational Bioinformatics Conference (2017, 2018)
-Network Biology/Integromics Bioinformatics - Applications towards Medicine, Norway 2017
-ECCB 2017 
-DisGeNET tutorial at the ECCB 2016 in The Hague, Netherlands (September, 2016)
-JBI in Valencia, Spain (May, 2016)
-Linking Life Science Data: Design to Implementation, and Beyond, Vienna, Austria (February, 2016)
-BioHackathon 2015 in Nagasaki, Japan (September, 2015)
-SWAT4LS 2015 in Cambridge, UK (December, 2015)
-Big Data in Biomedicine debate. Barcelona, Spain (November, 2014)

d. Dependency of other resources: How is this resource critical to the user(s)? Do other resources depend on the resource described here to provide downstream service? Please list, or provide a link to a diagram.

The following resources depend on DisGeNET to provide gene and variant-disease annotations:
1. UniProt https://www.uniprot.org/
2. Europe PMC https://europepmc.org/AnnotationsApi
3. Pharos https://pharos.nih.gov/idg/help
4. NDEx public server http://www.ndexbio.org
5. neXtProt https://www.nextprot.org/
6. Gene Organizer http://geneorganizer.huji.ac.il/ 
7. TCGA http://tcga.ir/index.php
8. DiseaseCellTypes R package https://github.com/alexjcornish/DiseaseCellTypes
9. DOSE R package https://bioconductor.org/packages/release/bioc/html/DOSE.html
10. WebGestalt http://www.webgestalt.org/option.php
11. TcoF-DB v2 http://compbio.massey.ac.nz/apps/tcof/browse/?type=anno&atype=Diseases
12. ToppCluster https://toppgene.cchmc.org/navigation/database.jsp
13. RITAN R package http://bioconductor.org/packages/release/bioc/vignettes/RITAN/inst/doc/…
14. HEDD human enhancer disease database http://zdzlab.einstein.yu.edu/1/hedd.php
15. BioCarian http://www.biocarian.com/
16. DisNOR http://disnor.uniroma2.it/
17. Tabloid Proteome http://iomics.ugent.be/tabloidproteome
18. Personal Cancer Genome Reporter (PCGR) https://github.com/sigven/pcgr
19. ERAM: encyclopedia of Rare Disease Annotation for Precision Medicine http://www.unimd.org/eram/ 
20. Target Mine http://targetmine.mizuguchilab.org/targetmine/dataCategories.do
21. OpenPhacts discovery platform http://www.openphactsfoundation.org/platform/
22. Lnc2Catlas https://lnc2catlas.bioinfotech.org/home/

3. Quality of resource

a. Uptime: Average percentage uptime/month during the last 12 months, response time of the resource. In case of ontology/standards production, interval of update/release, adaptability of ontology design patterns to evolving data. Provide information where applicable: uptime of resource, software release cycle (please state week/month etc), update frequency.

The database and the software tools are updated on a yearly basis. Small improvements and bug fixes are made continuously. A summary of the version history is provided:
-DisGeNET app version v5.1 - May, 2018
-DisGeNET-RDF 5.0 - November 19, 2017
-DisGeNET 5.0 - May 28, 2017 
All data sources updated, new data sources added (PsyGeNET, HPO), score to rank VDAs and Specificity and Pleiotropy indexes for variants
-DisGeNET 4.0 - June, 2016 
New entry point in the web interface for variants, new data available for variants (consequence type, allele frequencies, etc. )
-DisGeNET 4.0 - April 15, 2016 
All data sources updated, new ones added (Orphanet, GWAS Catalog), new association types added to the DisGeNET ontology, SPI and DPI indexes for genes, SNP-gene and SNP-disease association available
-DisGeNET 3.0 - May 15, 2015 
All data sources updated, new ones added (ClinVar), improved text mining pipeline
-DisGeNET 2.1 - May 5, 2014 
Second release of DisGeNET-RDF, new text mining information using the BeFree System
-DisGeNET 2.0 - February 5, 2014 
First release of DisGeNET-RDF, added information about rat disease models from CTD™ and RGD, new text mining information
-DisGeNET - July 20, 2012 
Added information about mouse disease models from CTD™ and MGD, changed disease identifiers from MeSH and OMIM® to UMLS® CUIs, DisGeNET web interface launched
-DisGeNET 1.0 - Sep 21st 2010 
Release of DisGeNET as a Cytoscape plugin and SQLite database

Response time of the resource (Web interface) can be found here: https://openebench.bsc.es/html/#!/tool/disgenet

b. Accessibility: what are resource retrieval mechanisms? Does the resource provide web-based user interface, application programmable interface (API), containers, and/or other channels? Please list resource access mechanism, provide URLs as applicable.

The resource is accessible through a web interface, an RDF SPARQL endpoint, a faceted browser, a nanopublication server, a Cytoscape App and an R package. Scripts in different languages are also provided to access the data. Selected data dumps and the entire database are available for download.

Web: http://www.disgenet.org/
Cytoscape App: https://apps.cytoscape.org/apps/disgenetapp 
RDF SPARQL endpoint: http://rdf.disgenet.org/sparql/ 
RDF Faceted browser: http://rdf.disgenet.org/fct/ 
DisGeNET nanopublication server: http://rdf.disgenet.org/nanopub-server/ 
disgenet2r R package: https://bitbucket.org/ibi_group/disgenet2r 
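
As a hedged illustration of programmatic access, the sketch below queries the SPARQL endpoint listed above using only the Python standard library. The query is deliberately generic (instance counts per class) so that it assumes nothing about the DisGeNET RDF schema. The function names are hypothetical, and it is an assumption that the endpoint accepts GET requests with a "query" parameter and returns SPARQL JSON results, as described in the W3C SPARQL Protocol.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in the list above.
ENDPOINT = "http://rdf.disgenet.org/sparql/"

# Generic query: the ten classes with the most instances.
# It does not rely on any DisGeNET-specific class or property.
CLASS_COUNT_QUERY = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of dicts."""
    document = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]


def run_query(endpoint=ENDPOINT, query=CLASS_COUNT_QUERY, timeout=30):
    """Send the query over the network and return the parsed rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling run_query() would return one dictionary per result row, with the keys "type" and "n" as strings. Queries over gene-disease associations would replace CLASS_COUNT_QUERY with the patterns documented in the DisGeNET RDF tutorial.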

c. Maintenance quality: Is there a maintenance SOP or plan, reflecting sustainability and scalability? Does it align with guidelines for sustainable software development? Please include a resource commitment statement (description text or URL).

The resource is maintained through competitive grants, which have supported its development since its first release in 2010. Additional funding opportunities are continuously explored to sustain its development.

d. Support quality: Please list support mechanisms (e.g., point of contact, request ticketing, resource’s response time where a solution is identified, etc.), and methods to collect user feedback. If available, list tutorial documentations or tutorial materials and format, including linking on the ELIXIR’s Training Portal (TeSS) (or other training platforms) where applicable.

We provide support via a help desk (support@disgenet.org). Responses, including solutions to the requests, are typically provided within the day. Three people are in charge of providing support at the help desk (one for the RDF and the other two for the other tools and general questions). Currently, four developers and one scientific coordinator are behind the project.

Tutorial documentation:
DisGeNET Web Interface tutorial: http://www.disgenet.org/ds/DisGeNET/files/DisGeNET_Web_2016.pdf
DisGeNET Cytoscape App user guide: http://www.disgenet.org/ds/DisGeNET/files/DisGeNET_Cytoscape_v5.pdf
DisGeNET RDF tutorial: http://www.disgenet.org/web/DisGeNET/menu/rdf#sparql 

4. Legal framework, funding, and governance

a. Legal framework: What are the resource’s license/terms of use? Can the license facilitate Open Science? Please include the url for the license the resource uses.

The DisGeNET database is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). The DisGeNET Cytoscape App is distributed under the GNU GPL 3.0 license. More information can be found here: http://www.disgenet.org/ds/DisGeNET/html/legal.html
Due to its public availability through these licenses, and especially through its availability as an RDF resource, it supports open science.

b. Privacy/Ethics policy: If applicable, is there a publicly available privacy policy in which use and security around personal data are described (e.g. the EU General Data Protection Regulation (GDPR), ELIXIR Ethics Policy, other relevant ELIXIR Policies)? Please include the url of the privacy/ethics policy, if applicable.

No personal data is present in the database.

c. Funding & sustainability plan: List of funding sources supporting the resource, and sustainability plan.

Current projects supporting the resource are funded by IMI-JU under grant agreements no. 115735 (iPiE), no. 116030 (TransQST) and no. 777365 (eTRANSAFE), resources of which are composed of financial contribution from the EU-FP7 (FP7/2007-2013) and EFPIA companies' in-kind contribution, and by the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (ELIXIR-EXCELERATE). The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII, and is supported by grant PT13/0001/0023 of the PE I+D+i 2013-2016, funded by ISCIII and FEDER.
In addition to these competitive grants, we are exploring collaboration with private companies to support the development of the resource.

d. Governance: Describe the Resource’s QA/QC plan that guarantees similar quality governance to that of ELIXIR. Please link SAB members, if applicable.

We plan to constitute a SAB in the near future.