21 Mar 2023 14:00

Bayesian nonparametric prediction of the taxonomic affiliation of DNA sequences

Sala Riunioni B (Zeta building)

Speaker: Tommaso Rigon, University of Milano-Bicocca

Predicting the taxonomic affiliation of DNA sequences collected from biological samples is a fundamental step in biodiversity assessment. This task is performed by leveraging existing databases of reference DNA sequences that carry a taxonomic identification. However, environmental sequences may come from organisms that are either unknown to science or lack available reference sequences. Thus, the taxonomic novelty of a sequence needs to be accounted for during classification. We propose Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow unobserved taxa to be discovered at each taxonomic rank. Using a simple product multinomial likelihood with conjugate Dirichlet priors at the lowest rank, we develop a highly flexible supervised algorithm that provides a probabilistic prediction of the taxonomic placement of each sequence at each rank. We run our algorithm on a carefully annotated library of Finnish arthropods (FinBOL). To assess the ability of BayesANT to recognize novelty and to predict known taxonomic affiliations correctly, we test it on two training-test splitting scenarios, each with a different proportion of taxa unobserved in training. Our algorithm attains accurate predictions and reliably quantifies classification uncertainty, especially when many sequences in the test set are affiliated with taxa unknown in training.

Bio Sketch
Tommaso Rigon is Assistant Professor of Statistical Science in the Department of Economics, Management and Statistics (DEMS) at the University of Milano-Bicocca. He is a member of the Datalab at the University of Milano-Bicocca, the BayesLab at the Bocconi Institute for Data Science and Analytics (BIDSA), and the MIDAS Complex Data Modeling Research Network. In 2021 he received the Savage Award (Theory and Methods) from the American Statistical Association and the International Society for Bayesian Analysis. His research focuses on Bayesian methods for analyzing complex data, covering theoretical, applied, and computational aspects.


The event will be held in English.


Dipartimento di Scienze Ambientali, Informatica e Statistica - Gruppo Statistica
