InterMine | ELIXIR

Notes given in the application form

Eligibility criteria

Must be an ELIXIR Service (i.e. be part of an existing ELIXIR Node’s Service Delivery Plan, or is ELIXIR commissioned work), or is in the official process/commitment of becoming one. (Required)
Must have evidence that it supports an interoperability activity, and has been deployed. (Required)
Must support or be forecast to support FAIR Principles. (Required)
Should fit into, or be forecast to fit into, the EIP roadmap for data interoperability or other activities relevant to ELIXIR mission.

Additional notes

Please complete this form by adding information for your Interoperability Resource in the appropriate section below. Consult with Recommended Interoperability Resource (RIR) selection criteria documentation on details for each section below.
Where a panel/question is not relevant to your Interoperability Resource, please leave it blank or mark as “not applicable”, optionally with a brief explanation as to why.
Word limit guidance is noted for free text fields.
Please include urls to external resources, where useful.
Any questions, contact Sirarat Sarntivijai (sirarat.sarntivijai@elixir-europe.org).

1. Resource facilitation to scientific research

a. Interoperability Resource: Briefly describe the function of the Interoperability Resource

InterMine is a platform for integrating life sciences data. Like Galaxy, operators deploy it to their own systems to give end users access to integrated data. It provides loaders for many common data resources and an API for operators to write their own. Using its web interface, end users can search, perform queries, analyse via lists and structured “templates”, and visualize data with integrated third-party tools. Data and analysis results are accessible through the web interface or through InterMine’s RESTful webservices. InterMine also provides data access libraries written in R, Python, Perl and other languages.

b. Scope statement: describe the scope, and the users of the resource. How is the Interoperability Resource positioned with respect to other similar Interoperability Resources? Include the base URL and, if relevant, the introductory or “about” page URL.

Introductory information is found at the base URL: http://intermine.org. InterMine can integrate any selection of datasets which cross-reference common identifiers. Its life sciences object model can easily be reconfigured and extended, and so can be applied to any selection of datasets that require integration. At InterMine’s core is a data warehouse, with the strengths of ease of use, data consistency and efficient search.

Federation with other systems is enabled through RESTful web services, which allow query and download in various formats. Ongoing funded work is increasing its FAIRness through mechanisms such as RDF and Linked Data support, identifiers.org and FAIRsharing integration, Bioschemas embedding, ISA tools integration, increased Galaxy integration and cloud deployment.
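
As a rough illustration of how a script might address these web services, the sketch below builds (but does not send) a query URL using only the Python standard library. The service base URL, the `/query/results` path, the `query` and `format` parameter names and the example query XML are all assumptions made for illustration; the web services documentation of a given installation should be consulted for the endpoints it actually exposes.

```python
from urllib.parse import urlencode

# Assumed, illustrative service root; real installations publish their own.
BASE_URL = "https://www.humanmine.org/humanmine/service"

# Hypothetical query: two Gene columns, constrained by gene symbol.
QUERY_XML = (
    '<query model="genomic" view="Gene.symbol Gene.primaryIdentifier">'
    '<constraint path="Gene.symbol" op="=" value="EXAMPLE1"/>'
    "</query>"
)


def build_query_url(base_url, query_xml, fmt="json"):
    """Return a GET URL that asks the service for query results in `fmt`.

    The endpoint path and parameter names are assumptions of this sketch.
    """
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{base_url.rstrip('/')}/query/results?{params}"


url = build_query_url(BASE_URL, QUERY_XML, fmt="csv")
print(url)
```

Changing the `fmt` argument is how such a script would request a different download format for the same query; the request itself could then be sent with any HTTP client.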

There are three types of user: 1) operators, who deploy the platform and integrate it with other systems; 2) end users, who access data via the web application and have a workspace where they can save and publish analyses and searches; and 3) machines and user scripts, which access data through the web services and integrate it with other systems, workflows and tools. We are assessing work to support authentication and access via ELIXIR AAI and workflow integration via the Common Workflow Language.

InterMine has a large number of existing operators and users from institutions around the world, with a concentration in model organism communities. It is a service within the ELIXIR-UK node Service Delivery Plan. We are not aware of similar current ELIXIR services that perform this type of generic data integration.

c. Resource url

http://intermine.org

d. Inter-organisational recognition: does the Interoperability Resource have community recognition? (e.g. demonstrated through a collaboration, geographical diversity in the source of the submissions, international diversity of delivery partners and/or funders)

We are aware of 30 public InterMine deployments worldwide, including those run by most of the main model organism databases (MODs), such as arabidopsis, mouse, rat, budding yeast, nematode and zebrafish, which serve large end-user communities. It has also been deployed in other contexts, such as TargetMine, an InterMine installation for drug discovery, and IndigoMine, an InterMine instance for exploring newly sequenced microbial genomes. Please see the list and links at http://registry.intermine.org, although this does not include all installations. InterMine installations are geographically diverse, including in Europe, Asia and North America.

Contributors, collaborators and delivery partners over various consortia and projects are also diverse, including the Joint Genome Institute, NCGR, Lawrence Berkeley National Laboratory, Stanford University, the Jackson Laboratory, the University of Oregon, the Ontario Institute for Cancer Research, the National Institutes of Biomedical Innovation in Japan, Imperial College and Newcastle University. A three-year Wellcome Trust-funded collaboration has just started with the University of Oxford to integrate the ISA Tools ELIXIR Interoperability Resource with InterMine.

Current funders are the Biotechnology and Biological Sciences Research Council (BBSRC), the Wellcome Trust and Innovate UK. InterMine development has also been funded by the Engineering and Physical Sciences Research Council (EPSRC), the U.S. National Institutes of Health and the National Science Foundation.

2. Community

a. Community impact: If applicable, provide documented evidence of community impact (e.g., publication citations, API calls, projects using the resource, etc.)

As InterMine is a deployable resource (one that operators download, customise and deploy on their own systems, much like Galaxy), metrics that apply to centrally hosted services are not available to us. However, we know of many InterMine deployments around the world - please see http://registry.intermine.org/ for a list. Since registration is voluntary, this represents a subset of InterMine instances.

We can give some numbers from associated tools. The InterMine R package (http://bioconductor.org/packages/stats/bioc/InterMineR/), existing since Nov 2017, has been downloaded 801 times to date. The Bioconda InterMine Python package (https://anaconda.org/bioconda/intermine) has been downloaded more than 11,000 times.

InterMine-related publications from the Micklem lab have been cited ~400 times (assessed using PubMed). However, we note that online and software resources tend to be under-cited.

b. Potential usage: Describe other systems that could use this candidate resource, but currently do not.

Any system that needs to perform data integration and make it available on the web and through web services for download, query, visualisation and analysis could use InterMine. The most extensive use has been for model organism data, taking data from one or from a range of model organisms to create an integrated resource, such as the modENCODE project resource modMine (http://intermine.modencode.org). However, InterMine’s flexible data model has enabled other application areas, such as drug discovery and synthetic biology, and in principle it could be extended even outside biology.

To date, most InterMine installations involve large-scale data integration but we see the potential for much more widespread use in smaller-scale projects. Addressing this potential is part of our just-starting Wellcome Trust-funded collaboration with the Sansone group (University of Oxford), which will exploit cloud-based technology to allow end users to easily deploy InterMine by themselves.

c. Outreach & support: Provide resource support publication(s)/user documentation(s) describing the Interoperability Resource (e.g. scientific journal publications, community preprints, resource user’s documentations etc.), resource dissemination plan (e.g. workshops, conference presentations), and other equal-opportunity research support (if applicable).

InterMine has an operator manual (http://intermine.readthedocs.io/en/latest) for those who install, integrate and publish data.

A separate manual for end users (http://flymine.readthedocs.io/en/latest) details how to search for biological entities (genes, proteins, pathways, etc.), run sophisticated queries over integrated data, analyse query results and lists (for example, perform Gene Ontology or publication enrichment analysis) and download or interact with data programmatically (via scripts that InterMine can generate in Python, R and other languages).

There is also an end user tutorial at https://doi.org/10.6084/m9.figshare.4737313.v4, along with a list of video tutorials created by the InterMine community at http://intermine.org/tutorials for specific installations such as FlyMine, YeastMine and MouseMine.

All these resources are in ELIXIR TeSS at https://tess.elixir-europe.org/content_providers/intermine.

Journal papers describing the architectural details of the InterMine system include:

InterMine: extensive web services for modern biology. Kalderimis et al (2014) Nucleic Acids Research doi:10.1093/nar/gku301 PMCID: PMC4086141

InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Smith et al (2012) Bioinformatics. Dec 1;28(23):3163-5

We disseminate knowledge about InterMine through workshops (e.g. BOSC 2018), conference presentations (e.g. Integrative Bioinformatics 2018 and SWAT4LS 2017), training sessions (e.g. https://bioinfotraining.bio.cam.ac.uk/postgraduate/services/bioinfo-int…) and ELIXIR programmes (e.g. the ELIXIR Industry programme).

d. Dependency of other resources: How is this resource critical to the user(s)? Do other resources depend on the resource described here to provide downstream service? Please list, or provide a link to a diagram.

As InterMine is a deployable platform for building integrated data resources, a number of data resources depend on it to provide services to their end user communities. For the larger resources (e.g. the larger model organism databases and also PhytoMine), InterMine installations provide their “advanced search and analysis” functionality, while for smaller resources their InterMine installation provides the core means by which end users access their data. Thus in both cases InterMine represents a key part of the infrastructure. For a list of deployments, please see http://registry.intermine.org.

3. Quality of resource

a. Uptime: Average percentage uptime/month during the last 12 months, response time of the resource. In case of ontology/standards production, interval of update/release, adaptability of ontology design patterns to evolving data. Provide information where applicable: uptime of resource, software release cycle (please state week/month etc), update frequency.

Uptime: Not applicable. InterMine is a platform provided for operation by other organisations rather than being centrally hosted by ourselves. Hence, percentage uptime and response metrics for the InterMine organisation itself are not useful for measuring service use, although the GitHub site used to host the project has very high uptime. Uptime and response metrics would be relevant for individual installations of InterMine, but these numbers are not available to us. Qualitatively, though, operators report good uptime and few issues attributable to InterMine itself. We believe the most common cause of downtime is hardware or infrastructure (e.g. network) problems beyond InterMine.

Releases: InterMine typically makes releases every month or two. There has been a pause since last September whilst its codebase is migrated to the Gradle build system as part of its continuing evolution. This process is almost complete. Please see https://github.com/intermine/intermine/releases for more information on releases.

Adaptability: As InterMine is based on a flexible data model configurable by the operator, it is highly adaptable to evolving data and requirements without needing software updates.

b. Accessibility: what are resource retrieval mechanisms? Does the resource provide web-based user interface, application programmable interface (API), containers, and/or other channels? Please list resource access mechanism, provide URLs as applicable.

InterMine provides a web-based UI (e.g. http://www.humanmine.org) with results downloadable in CSV, JSON and XML, and RESTful web services accessible through Python, R, Perl, Ruby, Java and JavaScript (http://intermine.readthedocs.io/en/latest/web-services). The UI can automatically generate code for script use. InterMine can export data to Galaxy instances, and ongoing work will support import from Galaxy to InterMine. InterMine is hosted at https://github.com/intermine and is also available from Bioconda (https://bioconda.github.io/recipes/python-intermine/README.html) and Bioconductor (https://bioconductor.org/packages/release/bioc/html/InterMineR.html).
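
To give a flavour of how a user script might consume results downloaded in JSON, the following minimal sketch turns a row-oriented payload into one dictionary per row. The payload here is a hypothetical, hand-written sample with placeholder values, and its shape (a `views` list of column paths plus a `results` list of rows) is an assumption of this sketch rather than a statement of the exact response format, which should be checked against the web services documentation.

```python
import json

# Hypothetical sample payload with placeholder values (not real data).
# Assumed shape: "views" names the columns, "results" holds one list per row.
SAMPLE_RESPONSE = """
{
  "views": ["Gene.symbol", "Gene.primaryIdentifier"],
  "results": [
    ["geneA", "ID0001"],
    ["geneB", "ID0002"]
  ]
}
"""


def rows_as_dicts(payload_text):
    """Pair each result row with the column paths listed in `views`."""
    payload = json.loads(payload_text)
    views = payload["views"]
    return [dict(zip(views, row)) for row in payload["results"]]


rows = rows_as_dicts(SAMPLE_RESPONSE)
for row in rows:
    print(row["Gene.symbol"], row["Gene.primaryIdentifier"])
```

Keying each value by its column path keeps downstream code independent of column order, which can be convenient when the same script is pointed at queries with different views.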

c. Maintenance quality: Is there a maintenance SOP or plan, reflecting sustainability and scalability? Does it align with guidelines for sustainable software development? Please include a resource commitment statement (description text or URL).

SOPs for developing the InterMine platform are at http://intermine.readthedocs.io/en/latest/about/get-involved. These include SOPs for developing a feature and for making a new software release.

SOPs for integrating data in a particular InterMine installation are in the main body of the operating manual, where https://intermine.readthedocs.io/en/latest/database/database-building takes an operator through the entire process.

InterMine aligns well with the guidelines for sustainable software development. We make continued efforts to keep it as simple as possible to understand, build and install, with comprehensive documentation at http://intermine.readthedocs.io/en/latest. In some cases, new operators of InterMine download, build and deploy the system and publish journal articles about it without ever contacting us.

We believe that we also meet the sustainability and maintainability guidelines, including strong community engagement and code contributions, as detailed in part 3d of this form, a test suite (http://intermine.readthedocs.io/en/latest/get-started/intermine-tests), continuous integration (https://travis-ci.org/intermine/intermine) and ongoing work on interoperability, through support for common life sciences data resources, diverse output formats, RESTful endpoints and access libraries. With our BBSRC-funded work, we are maintaining and increasing InterMine’s adherence to the FAIR principles.

Resource commitment statement: we intend to stay in operation for as long as InterMine delivers value to users. We have been funded for almost 20 years from diverse sources and current grants run to 2020 and 2021, with the intention of continuing to seek further funding.

d. Support quality: Please list support mechanisms (e.g., point of contact, request ticketing, resource’s response time where a solution is identified, etc.), and methods to collect user feedback. If available, list tutorial documentations or tutorial materials and format, including linking on the ELIXIR’s Training Portal (TeSS) (or other training platforms) where applicable.

InterMine has several points of contact. The most general one is the info@intermine.org contact email shown on the intermine.org website. Beyond this, there is a developer mailing list (http://intermine.readthedocs.io/en/latest/support/mailing-list) where questions are answered promptly by the core team, a blog (https://intermineorg.wordpress.com/), an active Twitter presence (https://twitter.com/intermineorg) and live support via a Discord server (http://chat.intermine.org). There is also a fortnightly teleconference that anyone can attend (e.g. https://lists.intermine.org/pipermail/dev/2018-July/004123.html). Request tickets, bug reports and code contributions are discussed and handled through GitHub at https://github.com/intermine.

For individual installations, the InterMine web application provides a “Questions? Comments? Click here!” feedback link at the bottom of each web page (e.g. see http://www.mousemine.org/mousemine/begin.do) for end users to get in contact with the hosting organisation. The InterMine team makes use of a user-centred design process in which iterative discussion with end users informs development of the UI. We make a particular effort to obtain feedback during InterMine training sessions and workshops.

The InterMine core team and our community provide tutorials and videos, both for InterMine in general and, in the case of video tutorials, for specific InterMine installations. These materials are available through TeSS at https://tess.elixir-europe.org/content_providers/intermine.

4. Legal framework, funding, and governance

a. Legal framework: What are the resource’s license/terms of use? Can the license facilitate Open Science? Please include the url for the license the resource uses.

The InterMine software is licensed under the LGPL, an Open Source licence compatible with Open Science.

For individual InterMine installations, the data they provide usually comes from many sources. Sources are generally poor at providing structured licence metadata, and in some cases do not provide any information at all. This constrains the extent to which InterMine can automatically present information to the user on how integrated data can be reused. In co-operation with the life sciences community we’re working on solutions to this problem, in initiatives such as Bioschemas that could provide standardised licence metadata.

b. Privacy/Ethics policy: If applicable, is there a publicly available privacy policy in which use and security around personal data are described (e.g. the EU General Data Protection Regulation (GDPR), ELIXIR Ethics Policy, other relevant ELIXIR Policies)? Please include the url of the privacy/ethics policy, if applicable.

InterMine installations collect minimal personal data for users who choose to create accounts, in order to provide continuity between sessions and their persistent workspace. Users do not need to create an account to use the InterMine system without these features. The InterMine privacy policy is at http://intermine.readthedocs.io/en/latest/about/#privacy-policy.

As InterMine is a deployable platform rather than a centrally hosted one, ultimately ethics and privacy issues are the responsibility of the organisations hosting the instances. These vary from country to country, both for non-ELIXIR and ELIXIR members.

c. Funding & sustainability plan: List of funding sources supporting the resource, and sustainability plan.

InterMine has been funded since 2002 by the Wellcome Trust with the current grant extending until 2021. Other current funding is from the BBSRC (until 2020) and Innovate UK (until 2021). Past funders have included EPSRC, NIH and NSF. InterMine’s adaptability and large distributed community have allowed us to win continuous funding for almost twenty years, and thus we are optimistic about our future ability to raise funding.

d. Governance: Describe the Resource’s QA/QC plan that guarantees similar quality governance to that of ELIXIR. Please link SAB members, if applicable.

Day to day software quality governance is ensured by the InterMine core developers, the software development community, its operators and its end users, through a process of unit and functional testing, continuous integration, code review, and actions on bug reports and other user feedback on production and beta releases. We believe this is in line with similar ELIXIR resources.

Two scientific advisory boards provide governance, one for each current grant (BBSRC and Wellcome Trust).

The BBSRC SAB members are Prof. John Colbourne (University of Birmingham), Dr Alan Robinson (MRC MBU, U. Cambridge), Dr David Goodstein (DoE Joint Genome Institute), Prof. Chris Elsik (University of Missouri), Prof. Michel Dumontier (Maastricht University), Jerven Bolleman (Swiss Institute of Bioinformatics), Dr Joel Richardson (The Jackson Laboratory), Rafael Jimenez (ELIXIR), Simon Jupp (EMBL-EBI), Dr Helen Parkinson (EMBL-EBI) and a BBSRC Representative.

The Wellcome Trust SAB members are Prof. Bertie Gottgens (CIMR, U. Cambridge), Prof. Winston Hide (Sheffield Institute of Translational Neuroscience), Dr Thomas Lemberger (EMBO Press), Dr Helen Parkinson (EMBL-EBI), Steve Taylor (MRC Weatherall Institute, Oxford), a Cambridge Research Data Facility representative, H Noble (Research Technology Services, U. Oxford), Dr Lincoln Stein (Ontario Institute for Cancer Research), Dr Chris Wallace (MRC Biostatistics Unit, U. Cambridge), S Edmunds (GigaScience), H Murray (F1000) and a Wellcome Trust Observer.

The Wellcome Trust grant also provides for a User Advisory Group who will provide input to and feedback on the features added to InterMine.