cafoscariNEWS

Agenda

27 Apr 2026 12:15

Seminar/lecture

Data Thinning and Beyond

Room EPSILON 1 - EPSILON Building | Scientific Campus

Speaker:
Daniela M. Witten, University of Washington

Abstract:
Contemporary data analysis pipelines often involve the use and reuse of data. For instance, a scientist may explore a dataset to select an interesting hypothesis, and then wish to test this hypothesis with the same data. From a statistical perspective, this double use of data is highly problematic: it induces dependence between the hypothesis generation and testing stages, which complicates inference. Failure to account for this dependence renders classical inference techniques invalid. I will present "data thinning", a set of strategies for obtaining independent training and test sets so that the former can be used to select a hypothesis, and the latter to test it. Data thinning enables valid selective inference in settings for which no solutions were previously available. However, it is also restrictive, in the sense that it requires strong distributional assumptions. Therefore, I will also present two strategies inspired by data thinning that enable valid post-selection inference without such assumptions. One strategy considers thinning summary statistics of the data, rather than the data itself, in order to take advantage of asymptotic properties of the summary statistics. The second strategy involves generating training and test sets that are not independent, and then orthogonalizing the latter with respect to the former in order to conduct valid inference.
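The abstract describes data thinning only in general terms. As a minimal sketch of the idea, the Python snippet below uses one well-known case, assuming Poisson-distributed counts. If X ~ Poisson(λ), and we draw X_train ~ Binomial(X, ε) and set X_test = X - X_train, then X_train and X_test are independent, with X_train ~ Poisson(ελ) and X_test ~ Poisson((1 - ε)λ). The helper name `poisson_thin`, the value ε = 0.5, and the simulated data are illustrative assumptions, not the speaker's code. The example requires NumPy.

```python
import numpy as np


def poisson_thin(x, eps, rng):
    """Hypothetical helper: binomially thin Poisson counts into two parts.

    Assumes x ~ Poisson(lam). Then x_train ~ Binomial(x, eps) and
    x_test = x - x_train are independent, with
    x_train ~ Poisson(eps * lam) and x_test ~ Poisson((1 - eps) * lam).
    """
    x = np.asarray(x)
    x_train = rng.binomial(x, eps)  # split each count element-wise
    x_test = x - x_train            # the two parts sum back to x
    return x_train, x_test


rng = np.random.default_rng(0)
lam = 10.0
x = rng.poisson(lam, size=200_000)  # simulated Poisson counts

x_train, x_test = poisson_thin(x, eps=0.5, rng=rng)

# x_train could be used to select a hypothesis and x_test to test it.
print("train mean:", x_train.mean())  # close to eps * lam = 5
print("test mean:", x_test.mean())    # close to (1 - eps) * lam = 5
print("correlation:", np.corrcoef(x_train, x_test)[0, 1])  # close to 0
```

The independence here depends on the Poisson assumption: applying the same binomial split to overdispersed counts would leave the two parts positively correlated, which illustrates the kind of distributional restriction the abstract mentions.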

Bio sketch:
Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning. Daniela is the recipient of an NIH Director's Early Independence Award, a Sloan Research Fellowship, an NSF CAREER Award, and a Simons Investigator Award in Mathematical Modeling of Living Systems. She received the Presidents’ Award from the Committee of Presidents of Statistical Societies (COPSS), awarded annually to a statistician under age 41 in recognition of outstanding contributions to the field of statistics. She also received the Spiegelman Award from the American Public Health Association, given to a statistician under age 40 who has made outstanding contributions to statistics for public health, and the Leo Breiman Award for contributions to the field of statistical machine learning. She is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics, and an Elected Member of the International Statistical Institute. Daniela is a co-author (with Gareth James, Trevor Hastie, and Rob Tibshirani) of the widely used textbook "An Introduction to Statistical Learning". She has served as an Associate Editor for Biometrika, the Journal of Computational and Graphical Statistics, and the Journal of the American Statistical Association, and as an Action Editor for the Journal of Machine Learning Research. Since 2023, she has served as Joint Editor of the Journal of the Royal Statistical Society, Series B. She completed a BS in Math and Biology with Honors and Distinction at Stanford University in 2005, and a PhD in Statistics, also at Stanford, in 2010.

Language

The event will be held in English.

Organizer

Statistics Group

Attachments

Flyer

1902 KB
