Development of the PROCOGNATE database

The functional annotation of enzymes is an interesting but nontrivial task that requires experimental data and manual curation by scientists to give optimal results. As the amount of structural and sequence data grows, case-by-case analysis becomes increasingly impractical, and there is a high demand for automated solutions. One of the first attempts to collect such data was the PROCOGNATE database (Bashton et al., 2008, DOI: 10.1093/nar/gkm611). It was followed by the Transform-MinER tool (Tyzack et al., 2018, DOI: 10.1093/bioinformatics/bty394), which searches for reactants and products in the KEGG database and matches them to the structures of protein-ligand complexes in the PDB. The current dataset contains around 150,000 cases across nearly 13,000 unique PDB entries.

The usefulness of the current dataset for researchers is limited mainly by two factors: 1) the database contains only basic information about the mapping, and 2) it is available only as a CSV file. The first limitation will be addressed by enriching the original dataset with multiple structural features directly related to the binding and unbinding of the ligands, such as pockets, tunnels, and interactions. These calculations are already under way and are expected to finish in the coming months. The second limitation will be addressed by developing a web user interface that presents the complete data together with 3-D visualizations of the structural features.

The main aim of this project is to kick off the database development by: 1) acquiring the pipeline used to construct the PROCOGNATE dataset, merging it with the pipeline for structural feature assessment, and preparing it for regular automated updates; 2) designing the database structure and importing all the data; and 3) designing the user interface of the database. Once these stages are finished, development of the user interface will begin and will continue until approximately Q2 2022.

Platform/Community