Understanding the composition, interactions, and evolution of microbial communities is key to balancing entire ecosystems and the health of the planet (plants, soil, and animals). Genomics plays a crucial role in understanding microbiomes. The challenge is that a microbiome (for example, the human gut) contains a large variety of microbial taxa (e.g., bacteria), so when we collect a gene from a sample, we don't know exactly which taxon it comes from.
Traditional methods rely heavily on correctly classifying these gene segments, which in turn requires reference tables and human intervention. However, reference sequences remain unknown for the vast majority of microbial biodiversity, which forces scientists to discard unidentified sequences that cannot be properly categorized, precluding holistic metagenomic analyses.
Our research investigates models and methods (such as mixture matrix completion) to simultaneously classify, assemble, and align sequences without the need for reference genomes.