g:Profiler | ELIXIR

Notes given in the application form

Eligibility criteria

Must be an ELIXIR Service (i.e. part of an existing ELIXIR Node’s Service Delivery Plan, or ELIXIR-commissioned work), or be in the official process of becoming one. (Required)
Must have evidence that it supports an interoperability activity, and has been deployed. (Required)
Must support or be forecast to support FAIR Principles. (Required)
Should fit into, or be forecast to fit into, the EIP roadmap for data interoperability or other activities relevant to the ELIXIR mission.

Additional notes

Please complete this form by adding information for your Interoperability Resource in the appropriate section below. Consult the Recommended Interoperability Resource (RIR) selection criteria documentation for details on each section.
Where a panel/question is not relevant to your Interoperability Resource, please leave it blank or mark as “not applicable”, optionally with a brief explanation as to why.
Word limit guidance is noted for free text fields.
Please include URLs to external resources, where useful.
For any questions, contact Sirarat Sarntivijai (sirarat.sarntivijai@elixir-europe.org).

1. Resource facilitation to scientific research

a. Interoperability Resource: Briefly describe the function of the Interoperability Resource

g:Profiler is a web server that includes tools for characterising and manipulating gene lists. In addition to the gene set enrichment analysis service g:GOSt, we provide two widely used interoperability services that rely on Ensembl data. Firstly, g:Convert provides a convenient and seamless service for translating identifiers of genes, proteins, probe sets and many other namespaces. We provide at least 40 types of IDs for more than 50 species, and a minimum of 13 for each of the 200+ supported species. Secondly, g:Orth maps homologous genes between species and provides cross-references between all organisms in g:Profiler.

b. Scope statement: describe the scope, and the users of the resource. How is the Interoperability Resource positioned with respect to other similar Interoperability Resources? Include the base URL and, if relevant, the introductory or “about” page URL.

The users of the interoperability services highlighted in this application (g:Convert and g:Orth) are life scientists working mostly with high-throughput genomics data. The major problem these services help to solve is the need to transform platform-specific identifiers (e.g. Illumina IDs) into an alternative namespace required by a specific analysis tool or web tool (e.g. UniProt IDs). This is usually a tedious and time-consuming task that the majority of life science researchers have experienced first hand. The problem can be specific to a single data set, in which case it can be solved with the web application each time it occurs, or it can be a recurring problem that needs a programmatic solution. With the g:Profiler toolkit we provide both solutions for a variety of data analysis platforms (web application, web service, Python and R packages, Galaxy component etc.). g:Convert is capable of translating identifiers for all 213 species represented in g:Profiler. For each individual species it supports between 13 and 103 identifier namespaces. These namespaces cover the majority of identifiers used by various databases, technical platforms and analysis tools. Thus g:Convert supports workflow construction, where identifier mappings are often needed to combine different tools into a single analysis pipeline. g:Orth facilitates translating knowledge from one organism to another. For example, knowledge from well-studied model organisms can be transferred to less-studied systems by using the g:Orth service to find and match the appropriate orthologous gene pairs.
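As an illustration of the workflow use case described above, the following is a minimal, self-contained sketch of identifier translation as a pipeline step. It does not call g:Convert: the mapping table, the namespace names (PROBE, PROTEIN), the identifiers and the function names are all hypothetical placeholders, standing in for mappings that a real workflow would obtain from the service.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical, hand-written mapping table for illustration only.
# A real pipeline would obtain these mappings from g:Convert instead.
TOY_MAPPING: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("PROBE", "PROTEIN"): {
        "probe_0001": ["prot_A"],
        "probe_0002": ["prot_B", "prot_C"],  # one-to-many mapping
    },
}


def translate_ids(
    ids: Iterable[str],
    source: str,
    target: str,
    table: Dict[Tuple[str, str], Dict[str, List[str]]] = TOY_MAPPING,
) -> List[Tuple[str, Optional[str]]]:
    """Return (input_id, target_id) rows; target_id is None when unmapped."""
    mapping = table.get((source, target), {})
    rows: List[Tuple[str, Optional[str]]] = []
    for identifier in ids:
        targets = mapping.get(identifier)
        if not targets:
            # Keep unmapped identifiers visible instead of silently dropping them.
            rows.append((identifier, None))
        else:
            rows.extend((identifier, t) for t in targets)
    return rows


def mapped_only(rows: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Target IDs for the next tool: drop unmapped rows, de-duplicate, keep order."""
    seen: Dict[str, None] = {}
    for _, target in rows:
        if target is not None:
            seen.setdefault(target, None)
    return list(seen)


rows = translate_ids(["probe_0001", "probe_0002", "probe_9999"], "PROBE", "PROTEIN")
next_tool_input = mapped_only(rows)
```

The sketch keeps unmapped and one-to-many identifiers explicit, since both cases typically need a decision from the analyst before the translated list is passed to the next tool.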

c. Resource URL

https://biit.cs.ut.ee/gprofiler/page.cgi?welcome

d. Inter-organisational recognition: does the Interoperability Resource have community recognition? (e.g. demonstrated through a collaboration, geographical diversity in the source of the submissions, international diversity of delivery partners and/or funders)

With more than 30 000 unique users annually from 150 countries, g:Profiler is a well-established set of services for the life science community. While g:Profiler’s most widely known functionality, gene set enrichment analysis, is one of the external tools suggested by Gene Ontology (http://geneontology.org/page/go-enrichment-analysis), the other services (g:Convert, g:Orth and g:SNPense) are also used by thousands of users each year.

Most of the development, maintenance and promotion of g:Profiler is done at ELIXIR Estonia and supported by its European partners (for example through EMBL-EBI training courses). However, g:Profiler has also been actively promoted outside Europe, in Canada, since one of the initial key developers of the tool, Dr. Jüri Reimand, relocated there after graduating from the University of Tartu. For example, g:Profiler is taught regularly at the Canadian Bioinformatics days and is included in the EnrichmentMap analysis workflow (http://enrichmentmap.readthedocs.io/en/latest/).

2. Community

a. Community impact: If applicable, provide documented evidence of community impact (e.g., publication citations, API calls, projects using the resource, etc.)

According to the Google Analytics profile, g:Profiler is actively used across the world, with visitors from 130 countries during the last 12 months (more than 32 000 unique users per year and more than 180 user requests per day). The 10 countries with the most users of the web application during the last 12 months were the USA, China, the UK, Canada, Germany, India, France, Estonia, Spain and Italy. Since 2007, g:Profiler has been cited by a total of 1094 publications (according to Google Scholar, July 2018).

b. Potential usage: Describe other systems that could use this candidate resource, but currently do not.

In principle, all web services, R packages and Galaxy workflows that deal with gene, protein or probe set identifiers could benefit from using g:Convert as a service to translate user-provided identifiers into the namespace the service requires. This would relieve service developers of the burden of keeping their own identifier mapping tables up to date, and would give users the flexibility to supply identifiers in whichever form suits them best, without having to translate them manually.

c. Outreach & support: Provide resource support publication(s)/user documentation(s) describing the Interoperability Resource (e.g. scientific journal publications, community preprints, resource user’s documentations etc.), resource dissemination plan (e.g. workshops, conference presentations), and other equal-opportunity research support (if applicable).

The g:Profiler tool set has been described in three Nucleic Acids Research Web Server Issue articles, in 2007, 2011 and 2016, with a further update article to be submitted early in 2019 (https://doi.org/10.1093/nar/gkm226, https://doi.org/10.1093/nar/gkr378, h…). The g:Profiler tool set is used by hundreds of researchers daily. The three NAR Web Server Issue papers (2007, 2011, 2016) have been cited by a total of 1094 publications (according to Google Scholar, July 2018). Since the launch of the tool set, g:Profiler has been actively taught at workshops locally in Estonia and internationally (City of Hope, USA; Oxford, UK; Cambridge, UK; EMBL-EBI, UK; Institut Pasteur, France; Toronto, Canada).

d. Dependency of other resources: How is this resource critical to the user(s)? Do other resources depend on the resource described here to provide downstream service? Please list, or provide a link to a diagram.

The core interoperability functionality of g:Convert is used by various ELIXIR Estonia services included in the Service Delivery Plan to provide a seamless and flexible user experience by supporting whatever identifier the user might have. These include the network analysis tool GraphWeb, the global expression similarity tool MEM and the pathway visualisation service KEGGAnim.

Beyond the existing ELIXIR Estonia SDP services, two emerging services also rely heavily on the g:Convert service to provide the same seamless user experience: funcExplorer, for extracting the most significant biological features of a dataset, and PAWER, a tool for analysing protein microarray data.

3. Quality of resource

a. Uptime: Average percentage uptime/month during the last 12 months, response time of the resource. In case of ontology/standards production, interval of update/release, adaptability of ontology design patterns to evolving data. Provide information where applicable: uptime of resource, software release cycle (please state week/month etc), update frequency.

g:Profiler relies heavily on data provided by the Ensembl database and follows its release cycle of approximately three months. Since 2015, every Ensembl release from #78 onwards has been followed by a g:Profiler data release; the latest supported release is currently Ensembl #91, and release #92 is in the data update process. To increase the reproducibility of scientific findings, g:Profiler also provides archives of all previous releases since 2011 (https://biit.cs.ut.ee/gprofiler/archives/). The archive functionality allows researchers to compare results obtained with previous releases and to refer studies to their original data context. It is important to note that while all the namespace information, gene descriptions and the majority of Gene Ontology annotations are fetched automatically from Ensembl through the BioMart service, additional enrichment analysis files are obtained from other data sources (KEGG, Reactome, Transfac etc.) and updated at the same time as the Ensembl data.

The g:Profiler web server has not experienced any significant downtime in recent years, and its uptime is expected to be more than 99.9%. The downtime of less than 0.1% (less than 10 hours per year) also includes the planned downtime for the data copying needed to push each release live (from a few minutes to a few tens of minutes per release).
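The relationship between the uptime percentage and the downtime budget quoted above can be checked with simple arithmetic. The sketch below assumes a non-leap year of 365 days and a 30-day month; the function name is illustrative and not part of any g:Profiler tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime in hours that is compatible with an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100.0 - uptime_percent) / 100.0


yearly = max_downtime_hours(99.9)                         # downtime budget per year
monthly = max_downtime_hours(99.9, period_hours=30 * 24)  # downtime budget per 30-day month
print(f"{yearly:.2f} h/year, {monthly * 60:.1f} min/month")
# prints "8.76 h/year, 43.2 min/month"
```

Under these assumptions, 99.9% uptime allows about 8.76 hours of downtime per year, which is consistent with the stated bound of less than 10 hours.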

b. Accessibility: What are the resource retrieval mechanisms? Does the resource provide a web-based user interface, an application programming interface (API), containers, and/or other channels? Please list the resource access mechanisms and provide URLs as applicable.

g:Profiler provides several means of data retrieval, suitable both for regular web users and for users who work programmatically. The majority of current users access g:Profiler services either through the web application (https://biit.cs.ut.ee/gprofiler) or through the R package (http://cran.r-project.org/web/packages/gProfileR/). Additional access points are an official Python package (https://pypi.python.org/pypi/gprofiler-official), a BioJS component (https://www.npmjs.org/package/biojs-vis-gprofiler), a web service (documentation at https://biit.cs.ut.ee/gprofiler/help.cgi?help_id=55), a Galaxy component (https://toolshed.g2.bx.psu.edu/repository?repository_id=2d3d786121020d7a) and a Chipster component (https://biit.cs.ut.ee/gprofiler/misc/gprofiler-chipster.tar.gz). All these options are listed for users on the https://biit.cs.ut.ee/gprofiler/page.cgi?apis page.

c. Maintenance quality: Is there a maintenance SOP or plan, reflecting sustainability and scalability? Does it align with guidelines for sustainable software development? Please include a resource commitment statement (description text or URL).

All g:Profiler code has been kept under version control since development began in 2005. The g:Profiler data update process, including the data download from Ensembl needed for the interoperability services g:Convert and g:Orth, follows an internal protocol to ensure that all files relevant to a service release are downloaded, linked and indexed properly. Everyday maintenance and development of g:Profiler services is driven by a team of professional software developers and researchers who are committed to guaranteeing the continuation of the g:Profiler service at its established level and to sustaining the core functionality first published in 2007.

d. Support quality: Please list support mechanisms (e.g., point of contact, request ticketing, resource’s response time where a solution is identified, etc.), and methods to collect user feedback. If available, list tutorial documentations or tutorial materials and format, including linking on the ELIXIR’s Training Portal (TeSS) (or other training platforms) where applicable.

The main point of contact for g:Profiler is the online contact form on the website or the biit.support@ut.ee email address. The full development and management team of the BIIT services receives the emails and provides user support. No ticketing system is currently in use, as the average number of user support requests is below ten per month and the majority of requests are answered within a few hours by one of the team members. The most frequently asked questions are also answered on the website (https://biit.cs.ut.ee/gprofiler_beta/page.cgi?faq).

A dedicated questionnaire was used during 2018 to collect user feedback and ideas for further development, and the collected information is being used to provide additional functionality in the next software release, due later in 2018. Additional feedback is collected by the team during training events.

4. Legal framework, funding, and governance

a. Legal framework: What are the resource’s license/terms of use? Can the license facilitate Open Science? Please include the URL for the license the resource uses.

All usage, whether via the web or programmatically, is free for everyone, including the academic and private sectors. The majority of the data is available for users to download, with the exception of KEGG and Transfac data, which are subject to licensing restrictions. There is currently no licence statement beyond the statement “g:Profiler tool is freely available through web application and various programmatic access points.” provided on the Welcome page https://biit.cs.ut.ee/gprofiler/page.cgi?welcome.

b. Privacy/Ethics policy: If applicable, is there a publicly available privacy policy in which use and security around personal data are described (e.g. the EU General Data Protection Regulation (GDPR), ELIXIR Ethics Policy, other relevant ELIXIR Policies)? Please include the URL of the privacy/ethics policy, if applicable.

g:Profiler does not have user accounts and by default does not collect any user-specific information, with the exception of Google Analytics cookies and query parameters (no full queries are kept). The data is collected to improve the usability and usefulness of the service. This information is provided as part of the “Terms of Use” on the FAQ page.

c. Funding & sustainability plan: List of funding sources supporting the resource, and sustainability plan.

The development of the g:Profiler tool set has been continuously supported by various research and infrastructure grants since it started in 2005. Currently, g:Profiler is part of the ELIXIR Estonia core services, and its development and maintenance are funded by the research infrastructure grant “Estonian Life Science Infrastructure for Biological Information” until August 2022.

d. Governance: Describe the Resource’s QA/QC plan that guarantees similar quality governance to that of ELIXIR. Please link SAB members, if applicable.

g:Profiler is maintained by a team of professional software developers, bioinformatics researchers and consulting statisticians. The team is led by Prof. Jaak Vilo, professor of bioinformatics. All decisions regarding the g:Profiler tool set are made by members of the team. If needed, external experts are consulted, including members of the ELIXIR Estonia SAB. g:Profiler does not have its own SAB.