Representatives from ELIXIR UK, EMBL-EBI, the ELIXIR Hub, bioCADDIE, BioSchemas and the W3C HCLS Community Profile met with Google last week in London to discuss how to describe datasets using schema.org.
Finding datasets and understanding their content is challenging for humans and cannot currently be automated. Schema.org is an initiative from the major web search engines to help with the discovery of web resources. There are multiple parallel activities in the life-sciences community to develop systems for publishing metadata about datasets.
These parallel efforts reflect the wide variety of use cases that dataset descriptions need to satisfy, including data discovery, data citation, and provenance tracking. bioCADDIE carried out an extensive analysis of these use cases and of what is needed to satisfy them.
The meeting with Google focused on the findability of datasets, with an emphasis on data citation. The next steps are to develop pilot projects that both publish and use dataset descriptions for discovery and citation.
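
To give a sense of what such a dataset description might look like, here is a minimal sketch in Python that builds a schema.org Dataset record as JSON-LD and wraps it for embedding in a web page. It is an illustration only, not an outcome of the meeting: the helper names (describe_dataset, to_script_tag) and every value under example.org are hypothetical, and the choice of properties is an assumption about what discovery and citation would need.

```python
import json


def describe_dataset(name, description, url, identifier, creator,
                     license_url, download_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper: the selection of properties is an assumption,
    not a profile agreed by any of the groups mentioned above.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # A persistent identifier is what makes the dataset citable.
        "identifier": identifier,
        "creator": {"@type": "Organization", "name": creator},
        "license": license_url,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": "text/csv",
        },
    }


def to_script_tag(record):
    """Wrap the record in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are placeholders for illustration.
record = describe_dataset(
    name="Example expression dataset",
    description="Placeholder description of a life-sciences dataset.",
    url="https://example.org/datasets/42",
    identifier="https://example.org/id/dataset-42",
    creator="Example Institute",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/datasets/42/data.csv",
)
tag = to_script_tag(record)
print(tag)
```

A search engine crawling the page could read the embedded record to index the dataset, while the identifier and creator fields carry the information a citation would need.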