MOLGENIS

Notes given in the application form

Eligibility criteria

  • Must be an ELIXIR Service (i.e. be part of an existing ELIXIR Node’s Service Delivery Plan, or is ELIXIR commissioned work), or is in the official process/commitment of becoming one. (Required)
  • Must have evidence that it supports an interoperability activity, and has been deployed. (Required)
  • Must support or be forecast to support FAIR Principles. (Required)
  • Should fit into, or be forecast to fit into, the EIP roadmap for data interoperability or other activities relevant to ELIXIR mission.

Additional notes

  • Please complete this form by adding information for your Interoperability Resource in the appropriate section below. Consult with Recommended Interoperability Resource (RIR) selection criteria documentation on details for each section below.
  • Where a panel/question is not relevant to your Interoperability Resource, please leave it blank or mark as “not applicable”, optionally with a brief explanation as to why.
  • Word limit guidance is noted for free text fields.
  • Please include urls to external resources, where useful.
  • Any questions, contact Sirarat Sarntivijai (sirarat.sarntivijai@elixir-europe.org).

1. Resource facilitation to scientific research

a. Interoperability Resource: Briefly describe the function of the Interoperability Resource

MOLGENIS allows researchers to process/harmonize (‘FAIRify’), analyze and publish data. Its data model can be configured for different use cases, e.g. findability of biobanks (catalogues, directories), rare disease patient registries, DNA variant databases and publication of phenome/multi-omics studies. MOLGENIS’ mapping tools help ontologize, harmonize, restructure and integrate datasets. Furthermore, MOLGENIS offers scripting (R, JavaScript, Python) and customization (‘apps’, report templates) features to enable analysis, as well as an HPC pipeline. MOLGENIS was the first data-sharing platform to be extended with a FAIR Data Point. Its combination of features partly overlaps with Galaxy, OpenClinica, TranSMART, WordPress, Microsoft Access, InterMine, RightField and Google Forms.

b. Scope statement: describe the scope and the users of the resource. How is the Interoperability Resource positioned with respect to other similar Interoperability Resources? Include the base URL and, if relevant, the introductory or “about” page URL.

MOLGENIS is a versatile online platform that can be adapted to many different use cases. It is widely used in the field of biobanking to increase the findability of biobanks through online catalogues or directories (e.g. BBMRI-ERIC, RD-Connect), as well as for rare disease patient registries (e.g. COL7A1) and variant databases (e.g. VKGL). MOLGENIS’ unique strength is that it combines flexible data modelling services with tools for data integration and visualisation, and that it supports scripting in languages such as R, Python and JavaScript. MOLGENIS is available from https://www.molgenis.org/.

MOLGENIS is open source (in contrast to e.g. Microsoft Access) and free. It is flexible in its structure and in the components that can be used: it can serve as a database only or as a catalogue, and it also offers tools for the analysis and visualisation of genetic data as well as the data management tools mentioned above. In addition, MOLGENIS Compute is designed for bioinformaticians and allows large-scale data and computational workflow management in a distributed execution environment. This combination, and the flexible use it allows, is a distinguishing feature of MOLGENIS (in contrast to e.g. Google Forms or Microsoft Excel).

In summary, MOLGENIS is a solid database system with flexible apps.

About MOLGENIS page:
https://molgenis.gitbooks.io/molgenis/content/

c. Resource url

http://molgenis.github.io/

d. Inter-organisational recognition: does the Interoperability Resource have community recognition? (e.g. demonstrated through a collaboration, geographical diversity in the source of the submissions, international diversity of delivery partners and/or funders)

MOLGENIS is used mostly in international consortia.

Examples of collaborations using MOLGENIS:

Catalogues for findability: BBMRI-ERIC biobank directory (1500+; https://directory.bbmri-eric.eu/); BBMRI-NL collection catalogue (250+); Lifelines data catalogue (167k samples, >1000 data items; https://catalogue.lifelines.nl/); PALGA public database of samples (>60 million diagnoses; >11 million patients); RD-Connect sample catalogue (https://samples.rd-connect.eu/).

Portals for data access and reuse: 500 Functional Genomics (RNA seq of >6000 individuals) (Netea et al, Nature Medicine 2016); 1000 IBD biobank (samples, clinical data, GWAS, multi-omics) (Spekhorst et al, BMJ Open); the BBMRI-NL Genome of the Netherlands project (GoNL consortium, Nature Genetics, 2014); WormQTL and WormQTL-HD for C. elegans multi-omics.

Patient registries: DEB-Central (https://www.deb-central.org/), a registry for dystrophic epidermolysis bullosa (DEB) patients and variants in the COL7A1 gene; CHD7 Database (http://chd7.org/) for CHARGE syndrome; Microvillus Inclusion Disease Patient Registry (http://www.mvid-central.org/); AIP Mutation Database (https://aip.fipapatients.org).

Apps for analysis and collaboration: VKGL pathogenicity variant consensus builder; RNA allele specific expression browser; E-Rare systemic inflammation consensus builder.

Examples of current and previous funders of MOLGENIS: Netherlands Bioinformatics Center/DTL (ELIXIR-NL); BBMRI-NL (biobank catalogue; multi-omics data integration); EU-CORBEL (interoperability; access to sensitive data); EU-RD-Connect (rare disease catalogue); BBMRI-ERIC (EU biobank catalogue); EU-BioMedBridges (interoperability; first model registry); EU-Solve-RD (rare disease analysis tools); E-Rare NSAID project; EU-GEN2PHEN (common/rare disease); EU-Panacea (C. elegans multi-omics); Target/LifeLines; NL-CTMM/TraIT (translational medicine); EU-BioSHaRE (federated analysis of obesity cohorts); EU-LifeCycle (federated analysis of birth cohorts); NL-X-omics (multi-omics); NL-NEMI (microscopy).

2. Community

a. Community impact: If applicable, provide documented evidence of community impact (e.g., publication citations, API calls, projects using the resource, etc.)

Li et al. A Functional Genomics Approach to Understand Variation in Cytokine Production in Humans. Cell 2016 (https://hfgp.bbmri.nl)

Spekhorst et al. Cohort profile: design and first results of the Dutch IBD Biobank. BMJ Open 2017. (https://1000ibd.org/)

Van Gijn et al, New workflow for classification of genetic variants … by the International Study Group for Systemic Autoinflammatory Diseases (INSAID). J Med Genet. 2018. (https://molgenis.org/said)

Holub P et al. BBMRI-ERIC Directory: 515 Biobanks with Over 60 Million Biological Samples. Biopreserv Biobank. 2016 (https://directory.bbmri-eric.eu)

van der Velde KJ et al. WormQTLHD--a web database…... Nucleic Acids Res. 2014 (https://wormqtl.org and https://wormqt-hd.org)

Van der Velde et al. An overview and online registry of microvillus inclusion disease patients .... Human Mutation 2013 (http://mvid-central.org)

Sernadela et al. Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer. Biomed Res Int. 2017:8327980. 

Bernstein MN et al. MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bioinformatics 2017: 33(18).

Lochmuller et al. RD-Connect, NeurOmics and EURenOmics: collaborative European initiative for rare diseases. Eur. J. of Human Genetics 2018. 
(https://samples.rd-connect.eu)

Galesloot et al. Cohort Profile: The Nijmegen Biomedical Study (NBS). International Journal of Epidemiology 2017.

Leu et al. NordicDB: a Nordic pool and portal for genome-wide control data. Eur. J of Human Genetics 2010. 

Engwerda et al. The phenotypic spectrum of proximal 6q deletions.... Eur J Hum Genet. 2018 ( https://www.chromosome6.org/)

b. Potential usage: Describe other systems that could use this candidate resource, but currently do not.

MOLGENIS makes data interoperable and supports the creation of sound metadata. It helps structure data properly and efficiently, which increases interoperability. Potential users are all research projects that follow the Open Science principles of transparency and data reuse. MOLGENIS enables these groups to make data FAIR and offers safe and efficient workflows for data access and reuse. (Inter)national organisations and collaborations can also use MOLGENIS to make their data findable and accessible.

MOLGENIS is well known in the “multi-omics” and genetics fields. We are currently expanding towards the “non-omics” community and broader clinical domains such as (microscopy) imaging. Other potential users are epidemiologists and social scientists, whose fields rely heavily on questionnaire-based research. MOLGENIS can be configured so that questionnaire answers are entered directly into the database.

MOLGENIS is an easy platform on which to set up beacons, Bioschemas markup, other catalogues and other FAIRifiers. It can be applied irrespective of data type and nature.

c. Outreach & support: Provide resource support publication(s)/user documentation(s) describing the Interoperability Resource (e.g. scientific journal publications, community preprints, resource user’s documentations etc.), resource dissemination plan (e.g. workshops, conference presentations), and other equal-opportunity research support (if applicable).

Although our funding is still growing, we should do better in outreach to maximize impact. For example, we currently do not hold dedicated user meetings, although we have hired a community manager (we hope ELIXIR can help here). We currently participate in consortium meetings, conferences and workshops, and we invest in publications and documentation.

MOLGENIS documentation: https://molgenis.gitbook.io/

MOLGENIS is described in >20 publications including: 

Swertz et al. Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet. 2007 

Swertz et al. The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button. BMC Bioinformatics. 2010; 

Pang et al. BiobankUniverse: automatic matchmaking between datasets for biobank data discovery and integration. Bioinformatics. 2017; 

Pang et al. MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks. Bioinformatics. 2016; 

Pang et al. SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data. Database (Oxford). 2015; 

Pang et al. BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing. J Am Med Inform Assoc. 2015.

Presentations/demonstrations:

FAIR Data Point in MOLGENIS. Health Research Infrastructure (HealthRI) annual meeting 2017, Netherlands.

Apps and portals to promote (re)use of biobanking and health data. HealthRI annual meeting 2017. 

Webinar: RD-Connect Sample Catalogue:
https://www.youtube.com/watch?v=j6rAjBmeh9s

Workshop core variable catalogue for LifeCycle, Lifecycle meeting, 2018, Oulu, Finland.

Poster: Bioschema implementation in MOLGENIS catalogues: 
-ELIXIR All Hands Meeting 2018 
-annual meeting Netherlands Bioinformatics and Systems Biology research school, 2018.

MOLGENIS YouTube channel:
https://www.youtube.com/results?search_query=molgenis

d. Dependency of other resources: How is this resource critical to the user(s)? Do other resources depend on the resource described here to provide downstream service? Please list, or provide a link to a diagram.

In general most MOLGENIS users are service providers (e.g. making a catalogue available for researchers) using their MOLGENIS installation to provide their service to clients.

MOLGENIS is an integral part of the BBMRI-ERIC CS-IT infrastructure. The Negotiator service depends on MOLGENIS for the findability of biobank resources through the Directory.

MOLGENIS is an integral part of the RD-Connect infrastructure. The Registry & Biobank Finder gathers sample statistics for the biobanks from the RD-Connect sample catalogue.

MOLGENIS is an integral part of the BBMRI-NL infrastructure. It is the starting point for requests for samples or data.

Similarly, MOLGENIS is an integral part of the other catalogues and databases mentioned above.

MOLGENIS is part of the long-term infrastructure of the University of Groningen (research IT and personalized health program).

MOLGENIS is an integral part of the 1000 IBD infrastructure (samples, clinical data, GWAS and multi-omics analysis of inflammatory bowel disease patients).

MOLGENIS Compute is instrumental for building bioinformatics pipelines for HPC analysis of genetics and omics data and is used for clinical diagnostics and patient care sequencing analysis in the department of Genetics, UMCG.

For more examples of projects/consortia that depend on MOLGENIS, see also section 1d: “Inter-organisational recognition”.

3. Quality of resource

a. Uptime: Average percentage uptime/month during the last 12 months, response time of the resource. In case of ontology/standards production, interval of update/release, adaptability of ontology design patterns to evolving data. Provide information where applicable: uptime of resource, software release cycle (please state week/month etc), update frequency.

MOLGENIS is open source and can be found on GitHub (https://github.com/molgenis/molgenis).

Average uptime of MOLGENIS: 98.5%. This number includes a planned maintenance interval of 15 minutes/week.
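
As a rough illustration of what these figures imply, the sketch below converts the stated uptime percentage and the weekly maintenance window into minutes per month. The 30-day month is an assumption made only for this calculation, and the function and variable names are illustrative.

```python
# Back-of-the-envelope downtime budget implied by the stated uptime.
# Assumption: a 30-day month. The 98.5% uptime and the 15 minutes/week
# maintenance window are the figures quoted in the text.

MINUTES_PER_WEEK = 7 * 24 * 60
MINUTES_PER_MONTH = 30 * 24 * 60  # assumed 30-day month


def downtime_budget(uptime_percent: float, planned_minutes_per_week: float) -> dict:
    """Split the monthly downtime into planned and unplanned minutes."""
    total_down = (1 - uptime_percent / 100) * MINUTES_PER_MONTH
    planned = planned_minutes_per_week * MINUTES_PER_MONTH / MINUTES_PER_WEEK
    return {
        "total_downtime_minutes": total_down,
        "planned_minutes": planned,
        "unplanned_minutes": total_down - planned,
    }


budget = downtime_budget(98.5, 15)
for name, minutes in budget.items():
    print(f"{name}: {minutes:.1f}")
```

Under these assumptions the total downtime budget is about 648 minutes (roughly 10.8 hours) per month, of which about 64 minutes is planned maintenance.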

We have:
- 72 production servers
- our core IT infrastructure hosted by the University of Groningen Center for Information Technology, the second largest in NL
- three-weekly sprints using the Scrum methodology (https://www.scrumguides.org)

MOLGENIS contains several (ontology-based) tools (BiobankConnect, SORTA and MOLGENIS/connect; for references see section 2c) that are specifically designed to find the required data items in different datasets/biobanks and to transform them semi-automatically into standard data models suited for integrated analysis. These tools are flexible.

b. Accessibility: what are resource retrieval mechanisms? Does the resource provide web-based user interface, application programmable interface (API), containers, and/or other channels? Please list resource access mechanism, provide URLs as applicable.

MOLGENIS publishes multiple releases per year to the Maven repository, where they can be discovered by machines:
https://mvnrepository.com/artifact/org.molgenis/molgenis-app

Downloads/release notes: 
https://github.com/molgenis/molgenis/releases

MOLGENIS deployment documentation: 
https://molgenis.gitbooks.io/molgenis/content/

We publish a docker-compose repository for quickly trying out MOLGENIS:
https://mvnrepository.com/artifact/org.molgenis/molgenis-app

MOLGENIS has a rich web-based user interface through which users can access all functionality.

Data is also exposed through a REST API.

Clients for the REST API are available in R, Python, JavaScript and Java, along with a download tool for exports (CSV, Excel and RDF formats).
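
To give a feel for programmatic access, the following is a minimal sketch of how a client might build a paged request URL and read the rows from a JSON response. It is not the official client: the `/api/v2/` path, the `num` and `start` parameters and the `items` key are assumptions that should be checked against the API documentation of the deployed MOLGENIS version, and the host name and entity ID are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def build_collection_url(base_url: str, entity_id: str, num: int = 100, start: int = 0) -> str:
    """Build a paged collection URL (path and parameter names are assumptions)."""
    query = urlencode({"num": num, "start": start})
    return f"{base_url.rstrip('/')}/api/v2/{quote(entity_id)}?{query}"


def extract_items(response_text: str) -> list:
    """Return the rows from a JSON response body (assumes an 'items' list)."""
    payload = json.loads(response_text)
    return payload.get("items", [])


# Hypothetical server and entity ID, used only for illustration.
url = build_collection_url("https://molgenis.example.org/", "example_biobanks", num=10)
print(url)

# A hand-written response body standing in for a real server reply.
sample_body = '{"items": [{"id": "bb-1", "name": "Example biobank"}]}'
print(extract_items(sample_body))
```

The sketch deliberately stops short of sending the request, so it can be run without access to a server; the resulting URL can be passed to any HTTP library.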

For several catalogues/datasets, Bioschemas markup in MOLGENIS aids findability.
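
As an illustration of this kind of markup, the sketch below builds a schema.org `Dataset` description as JSON-LD and wraps it in a script tag for embedding in a catalogue page. It does not reproduce the MOLGENIS implementation: the helper names and all field values are hypothetical, and a real Bioschemas profile may require additional properties.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, identifier: str) -> dict:
    """Describe a dataset with a few core schema.org Dataset properties."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
    }


def as_script_tag(markup: dict) -> str:
    """Wrap JSON-LD in the script tag that web crawlers look for."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical catalogue entry, used only for illustration.
markup = dataset_jsonld(
    name="Example sample collection",
    description="Illustrative collection record for a biobank catalogue.",
    url="https://catalogue.example.org/collections/example-1",
    identifier="example-1",
)
print(as_script_tag(markup))
```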

Users can: 
- configure a FAIR Data Point to share their metadata.
- publish genomic datasets using a Beacon.

c. Maintenance quality: Is there a maintenance SOP or plan, reflecting sustainability and scalability? Does it align with guidelines for sustainable software development? Please include a resource commitment statement (description text or URL).

MOLGENIS has been developed continuously in an open and agile process since 2004. We have adopted Scrum and are now in sprint 124 (three-weekly sprints). The UMCG team has grown steadily from 2 developers in 2007 to 9 developers, 4 data managers and 3 project/team managers today, and we welcome third-party pull requests.

Normal defects and external feature requests are handled in the GitHub issue tracker; security issues and ‘stories’ are handled in the UMCG internal issue tracker. Adaptations and solutions that we build on request for our customers also benefit the stability of MOLGENIS for other customers.
We strive for programming excellence and maintain a developer guideline.

The project uses a combination of manual and automated tests (unit testing, API testing, integration testing and static code analysis) and a continuous integration service to ensure high-quality software. MOLGENIS has a test suite of 3500 cases that we run for each commit. All releases are versioned using semantic versioning. We include automated code quality checks such as Sonar.
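
For readers less familiar with semantic versioning, the sketch below shows how an administrator's script might compare two MAJOR.MINOR.PATCH version strings to flag a potentially breaking upgrade. It is a simplified illustration: pre-release and build suffixes are stripped rather than ordered, and the version numbers are hypothetical and do not refer to actual MOLGENIS releases.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH', ignoring any '-prerelease' or '+build' suffix."""
    core = text.lstrip("v").split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return Version(major, minor, patch)


def is_breaking_upgrade(current: str, candidate: str) -> bool:
    """Under semantic versioning, a higher MAJOR number signals incompatible changes."""
    return parse_version(candidate).major > parse_version(current).major


# Hypothetical version numbers, used only for illustration.
print(is_breaking_upgrade("7.4.2", "7.5.0"))
print(is_breaking_upgrade("7.4.2", "8.0.0"))
```

Because the versions are parsed into integer tuples, ordinary comparison operators order them numerically, so `7.10.0` sorts after `7.9.3` as intended.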

Deployable Docker images are available from our GitHub page, and pre-built, signed WAR files can be downloaded from Maven.

Installation and configuration instructions are part of the manual. 

We aim to offer self-service to our end-users as well as to admins and contributing developers.

d. Support quality: Please list support mechanisms (e.g., point of contact, request ticketing, resource’s response time where a solution is identified, etc.), and methods to collect user feedback. If available, list tutorial documentations or tutorial materials and format, including linking on the ELIXIR’s Training Portal (TeSS) (or other training platforms) where applicable.

General MOLGENIS support is available for free from UMCG (Groningen) via different email addresses, depending on the user’s needs. We have 4 data managers providing support. CTMM/TraIT (SURF Amsterdam) runs a first-line helpdesk for MOLGENIS users. If required, additional second-line support is provided by the MOLGENIS team at UMCG. Limited support for MOLGENIS is also available via BBMRI-ERIC (Graz/Milano) and RD-Connect (CNAG Barcelona). MOLGENIS can be, and is, also run locally.

For a small fee, depending on the question/service asked, every user can obtain additional support. We are able to support users with general advice on MOLGENIS use, data modelling, uploading of new datasets, harmonizing of datasets, and building ontologies.

Support or training material can be found on the MOLGENIS wiki pages:

https://molgenis.gitbooks.io/molgenis/content/

http://www.molgenis.org/wiki/WikiStart

The MOLGENIS YouTube channel contains some instructional videos:
https://www.youtube.com/channel/UCiVR-YZFcBQe0i6RUwE9kyg

4. Legal framework, funding, and governance

a. Legal framework: What are the resource’s license/terms of use? Can the license facilitate Open Science? Please include the url for the license the resource uses.

The MOLGENIS team is committed to open science. The MOLGENIS software code is freely available under the GNU Lesser General Public License version 3 (LGPLv3). The documentation source code is released under the MIT License and the documentation contents under CC BY 3.0.

https://www.gnu.org/licenses/lgpl-3.0.en.html

MOLGENIS can be installed from source code (see http://github.com/molgenis), downloaded as a precompiled WAR file (for your own server), set up inside a Docker container (see http://molgenis.github.io), or requested as a Software-as-a-Service solution. A public demo instance and complete installation instructions can be found at http://molgenis.org/research.

b. Privacy/Ethics policy: If applicable, is there a publicly available privacy policy in which use and security around personal data are described (e.g. the EU General Data Protection Regulation (GDPR), ELIXIR Ethics Policy, other relevant ELIXIR Policies)? Please include the url of the privacy/ethics policy, if applicable.

The MOLGENIS team is hosted at the GCC of UMCG and is ISO/IEC 27001:2013 certified.

We are currently setting up federated AAI (ELIXIR, BBMRI, SURF) with 2-factor authentication, as well as adapting to the current GDPR requirements.

We use Google Analytics to review the usage of our websites and improve our services. To optimally protect the privacy of our visitors we have signed the Data Processing Amendment, masked parts of their IP address and disabled data sharing with other Google services. Visitors have the possibility to opt-out of Google Analytics.

c. Funding & sustainability plan: List of funding sources supporting the resource, and sustainability plan.

MOLGENIS has had long-term support from the Genomics Coordination Center since 2009. Currently 9 scientific programmers, 4 data managers/support staff, 3 team/project leads, 1 community manager, 5 bioinformatics analysts and 4 PhD students actively support the MOLGENIS code base.

In addition we receive input and contributions from the community. Development of MOLGENIS software and innovation is driven by projects (and funding) from the community. 

We aim for every feature to meet the needs of a range of MOLGENIS users (otherwise the feature will die when its project ends). In that sense MOLGENIS functions as a cooperative. In addition, we deliver ‘software as a service’, where researchers can pay for long-term hosting and support beyond the end of their projects.

As also written under 1d: Examples of current and previous funders of MOLGENIS: Netherlands Bioinformatics Center/DTL (ELIXIR-NL); BBMRI-NL (biobank catalogue; multi-omics data integration); EU-CORBEL (interoperability; access to sensitive data); EU-RD-Connect (rare disease catalogue); BBMRI-ERIC (EU biobank catalogue); EU-BioMedBridges (interoperability; first model registry); EU-Solve-RD (rare disease analysis tools); E-Rare NSAID project; EU-GEN2PHEN (common/rare disease); EU-Panacea (C. elegans multi-omics); Target/LifeLines; NL-CTMM/TraIT (translational medicine); EU-BioSHaRE (federated analysis of obesity cohorts); EU-LifeCycle (federated analysis of birth cohorts); NL-X-omics (multi-omics); NL-NEMI (microscopy).

While we are mostly project funded, we have long-term commitments from BBMRI-ERIC/NL and ELIXIR-NL to include MOLGENIS in their national and international funding acquisition strategies, and from the University Medical Center Groningen and University of Groningen as part of their local research IT infrastructure. 

d. Governance: Describe the Resource’s QA/QC plan that guarantees similar quality governance to that of ELIXIR. Please link SAB members, if applicable.

As an ELIXIR-NL service, MOLGENIS is subject to regular review by the ELIXIR-NL strategy board, who, as described in the ELIXIR-NL Service Delivery Plan, work with the MOLGENIS team to optimize 5 criteria: Openness, FAIRness, Quality, Fit (to the node goals) and Plan (for the future). 

In addition, the MOLGENIS team coordinates weekly with other ELIXIR-NL teams via DTL, and there is also a weekly meeting between the ELIXIR-NL CTO and the MOLGENIS product owner.

We do not formally have a scientific advisory board. However, several members of the MOLGENIS team in Groningen have regular one-on-one meetings with representatives of different user communities at both national and international level (such as, but not limited to, ELIXIR-NL, DTL and BBMRI-ERIC). In these meetings we collect feedback on how to proceed with MOLGENIS, both at a strategic level and for the development of the software.