In many scientific fields, recent advances in technology have made it possible to gather data sets characterized by a very large number of variables. The sample size of these data sets can be small compared to the number of variables, and frequently only a few of the variables are relevant to the study. In several research contexts, binary variables are used to express the presence or absence of particular elements or features. In this seminar we contribute to the development of a new procedure for estimating high-dimensional regression models with this particular structure. The procedure is derived by combining the class of penalized regression models with clustering techniques for binary variables. We will illustrate this novel procedure with a study concerning a particular aspect of drug discovery design.
Valentina Mameli is currently a post-doctoral research fellow at the European Centre for Living Technology. She is working on a research project that aims to develop statistical models for reducing the dimensionality of chemical datasets (BLOOM project). Previously, she worked as a post-doctoral research fellow in Statistics at the Department of Environmental Sciences, Informatics and Statistics of the Ca' Foscari University of Venice, at the Department of Statistical Sciences of the University of Padova, and at the Department of Mathematics and Computer Science of the University of Cagliari. She received a PhD in Mathematics and Scientific Computing from the University of Cagliari in 2012. In 2010 she was a visiting PhD student at the Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge (UK).