Efficient Algorithms to Find Statistically Significant Patterns

Fabio Vandin

Università di Padova

4 May 2018, 2:00 pm

Abstract

The extraction of patterns displaying significant association with a target feature is a key data mining task with wide application in many domains, including social network analysis and cancer genomics. The identification of such patterns poses several computational and statistical challenges, due to the massive size of modern datasets and to the exponential number of patterns to be considered. In this talk I will present our recent work that addresses some of these challenges. First, I will present an algorithm to mine the top-k significant patterns while rigorously controlling the family-wise error rate of the output. Our algorithm enables the extraction of the most significant patterns from large datasets that could not be analyzed by state-of-the-art methods. Second, I will present our work on extracting high-quality approximations of the most significant patterns using a random sample of a transactional dataset, according to various interestingness measures, while providing rigorous guarantees on the quality of the approximation. Our algorithm vastly speeds up the discovery of subgroups with respect to analyzing the whole dataset.

Bio-sketch

Fabio Vandin is an Associate Professor in the Department of Information Engineering at the University of Padova, Italy. His research interests are in efficient and rigorous algorithms for the extraction of useful information from large amounts of data. He has applied his methods mostly to problems in computational biology and biomedicine, but his work has found application in a variety of areas, including social network analysis and wireless networks. He received his PhD in Information Engineering from the University of Padova (Italy), and he has held research positions with various titles at the Department of Computer Science at Brown University (USA) and at the Department of Mathematics and Computer Science at the University of Southern Denmark. In 2016 he was a Research Fellow at the Simons Institute for the Theory of Computing at the University of California, Berkeley (USA).

Last update: 07/09/2018