This collaborative project focuses on two issues related to bioinformatics.
The first issue concerns the accessibility of the virAnnot pipeline (doi:10.1094/PBIOMES-07-19-0037-A), which was developed by the Virology team (INRAE UMR 1332) for the CATI BARIC community and the team's collaborators. Although intended for all users, the pipeline is in practice used only by bioinformaticians, because it can only be run from the command line.
The first objective of this project is therefore to integrate this pipeline into the Galaxy environment, which is widely used by the scientific community for the analysis of high-throughput sequencing (HTS) data in viral metagenomics. The Bioinformatics group of Wageningen University is a major hub of the European programme for the provision of bioinformatics resources in life sciences and is involved in Galaxy training. The mission will make it possible to attend the training given on site and to interact with the trainers.
At the end of the mission, the pipeline will be implemented in Galaxy and included in the analysis offering of the Galaxy platform hosted on the Genouest cluster in Rennes, France, which is part of the French Institute of Bioinformatics (IFB), the French node of ELIXIR. This will make viral metagenomics analyses easily accessible to the CATI BARIC community and to all users of the Galaxy environment.
The second issue concerns the accurate identification of the genes underlying quantitative trait loci (QTL). A QTL region typically spans a rather large genomic interval containing many genes, which makes it difficult to identify the true causal gene. The causal gene is often sought through experimental fine mapping, but this approach is laborious.
The second objective of this project is therefore to develop a method for prioritizing the genes of a QTL region, in order to identify the causal gene affecting the trait more precisely. An in silico method will be developed to improve the resolution of a QTL analysis. This method will draw on prior knowledge from the literature and from databases, accessed via Semantic Web technologies, and will take the form of a data query tool.
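As an illustration only, the prioritization idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method that will be developed: the gene identifiers, coordinates, annotation terms and function names are all hypothetical, and the prior knowledge that the tool would retrieve through Semantic Web queries is replaced here by small in-memory sets of annotation terms. Genes overlapping the QTL interval are ranked by the number of annotation terms they share with the trait of interest.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    """A gene with its position and prior-knowledge annotation terms (all hypothetical)."""
    gene_id: str
    chromosome: str
    start: int
    end: int
    annotations: frozenset = field(default_factory=frozenset)


def genes_in_interval(genes, chromosome, start, end):
    """Return the genes that overlap the QTL interval [start, end] on the given chromosome."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


def prioritize(genes, trait_terms):
    """Rank genes by the number of annotation terms shared with the trait.

    Ties are broken alphabetically by gene identifier so the ranking is deterministic.
    """
    terms = frozenset(trait_terms)
    scored = [(len(g.annotations & terms), g.gene_id) for g in genes]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(gene_id, score) for score, gene_id in scored]


# Hypothetical toy data standing in for knowledge retrieved from databases.
GENES = [
    Gene("geneA", "chr1", 1000, 2000, frozenset({"virus resistance", "defense response"})),
    Gene("geneB", "chr1", 2500, 3500, frozenset({"photosynthesis"})),
    Gene("geneC", "chr1", 4000, 5000, frozenset({"defense response"})),
    Gene("geneD", "chr2", 1000, 2000, frozenset({"virus resistance"})),
]

# Hypothetical QTL interval on chr1 and trait-related terms.
candidates = genes_in_interval(GENES, "chr1", 900, 4500)
ranking = prioritize(candidates, {"virus resistance", "defense response"})
print(ranking)
```

In this toy case, geneD is excluded because it lies on another chromosome, and the three remaining candidates are ordered by how many trait-related terms they carry. A real tool would presumably weight evidence sources differently rather than simply counting shared terms.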
The tools developed by Dr. Harm Nijveen's workgroup are based on Semantic Web and Linked Open Data (LOD) technologies. The new developments will build on two existing platforms, WormQTL2 (doi:10.1093/database/baz149) and AraQTL (doi:10.1111/tpj.13457), which are populated with data on C. elegans and A. thaliana. These databases can then be reused and enriched with the fruit tree data held by the Virology team.
This work will enhance interactions between the Virology team and the Bioinformatics group and will strengthen the skills of both partners in the Semantic Web and data interoperability.