Towards analysis of SMiLE-seq raw data, with the ultimate goal of identifying binding sites of poorly characterized transcription factors

SMiLE-seq is a new and effective experimental method for inferring transcription factor (TF) binding site sequences. Still, some TFs remain challenging to analyze. We hope to improve the method by applying modern statistical and deep learning approaches to both the experimental design and the subsequent data analysis.

Deliverables:

  • a tool for inferring binding motifs that cover the sequence space representatively
  • a GUI for running the analysis and refining it
  • “denoisifier” – a denoising tool to apply before the HMM-based analysis

Milestones:

  • become familiar with the SMiLE-seq method and the current workflow in detail (mainly the HMM-based analysis workflow)
  • prototype a tool for motif inference
  • test the tool for designing sequences
  • identify noise sources (or rather, estimate their number) – candidate approaches: statistical methods, autoencoder architecture?
  • prototype “real” TF binding site extraction