Statistics Methodology and Applications (STREAM)
STatistics REsearch: Applications and Methods

The research group develops statistical methods and applies them across a variety of disciplines.
Methodological research focuses on Bayesian and frequentist inference, in both parametric and nonparametric settings. Specific methodological contributions have addressed categorical data, control charts, meta-analysis, extreme values, time series, spatial data, and sensitivity analysis.
The group also works on computational statistics and on the development of statistical software, in particular with the R language.
The group is particularly active in applying statistical methods through multidisciplinary collaborations with researchers and professors at Università Ca’ Foscari and at other research institutes. Recent applications have included, for example, climatology, economics, environmental sciences, epidemiology, hydrology, medicine, sociology, and sport.


Seminars

All seminars can be followed on Zoom: https://unive.zoom.us/j/82776377762
Meeting ID: 827 7637 7762
Passcode: SanMarco1

18/05/2021, 14:00
Laura Sangalli (Politecnico di Milano)
Title: Functional and complex data - new methods merging statistics, scientific computing and engineering

Abstract

Recent years have seen an explosive growth in the recording of increasingly complex and high-dimensional data. Classical statistical methods are often unfit to handle such data, whose analysis calls for new methods that merge ideas and approaches from statistics, applied mathematics and engineering. This seminar focuses in particular on functional and spatial data defined over complex multidimensional domains, including curved two-dimensional domains and non-convex three-dimensional domains. I will present an innovative class of methods based on regularizing terms involving partial differential equations. The proposed methods make use of advanced numerical techniques, such as finite element analysis and isogeometric analysis. An application to the analysis of neuroimaging data is presented. In this application domain, the proposed methods offer important advantages over state-of-the-art techniques, making it possible to correctly account for the complex anatomy of the brain.


25/05/2021, 14:00
John Aston (University of Cambridge)
Title: Functional Data on Constrained Spaces


01/06/2021, 14:00
Michael Fop (University College Dublin)
Title: A composite likelihood approach for model-based clustering of high-dimensional data


08/06/2021, 14:00
Riccardo Rastelli (University College Dublin)
Title: A time-continuous extension of the latent position network model for instantaneous interactions


Past seminars

23/03/2021, 14:00
Giada Adelfio (Università degli Studi di Palermo)
Title: Some properties of local weighted second-order statistics for spatio-temporal point processes

Abstract

Spatial, temporal, and spatio-temporal point processes, and in particular Poisson processes, are stochastic processes that are largely used to describe and model the distribution of a wealth of real phenomena.
When a model is fitted to a set of random points, observed in a given multidimensional space, diagnostic measures are necessary to assess the goodness-of-fit and to evaluate the ability of that model to describe the random point pattern behaviour. The main problem when dealing with residual analysis for point processes is to find a correct definition of residuals. Diagnostics of goodness-of-fit in the theory of point processes are often considered through the transformation of data into residuals as a result of a thinning or a rescaling procedure. We alternatively consider here second-order statistics coming from weighted measures. Motivated by Adelfio and Schoenberg (2010) for the spatial case, we consider here an extension to the spatio-temporal context in addition to focussing on local characteristics.
Then, rather than using global characteristics, we introduce local tools, considering individual contributions of a global estimator as a measure of clustering. Generally, the individual contributions to a global statistic can be used to identify outlying components measuring the influence of each contribution to the global statistic.
In particular, our proposed method assesses goodness-of-fit of spatio-temporal models by using local weighted second-order statistics, computed after weighting the contribution of each observed point by the inverse of the conditional intensity function that identifies the process.
Weighted second-order statistics apply directly to the data without assuming homogeneity or transforming the data into residuals, thus eliminating the sampling variability due to the use of a transforming procedure. We provide some characterisations and show a number of simulation studies.


07/04/2021, 15:00
Stefano Castruccio (University of Notre Dame)
Title: Model- and Data-Driven Approximation of Space-Time Systems. A Tale of Two Approaches

Abstract

In this talk I will discuss two different approaches to approximating space-time systems. The first one is model-driven and loosely inspired by physics: it assumes that the system is locally diffusive through a stochastic partial differential equation, and it can be efficiently approximated with a Gaussian Markov random field. This approximation will be used to produce a stochastic generator of simulated multi-decadal global temperature, thereby offering a fast alternative to the generation of large climate model ensembles.
The second approach is instead data-driven, and relies on (deep) neural networks in time. Rather than using traditional machine learning methods aimed at inferring an extremely large parameter space, we rely on fast, sparse and computationally efficient echo state network dynamics on an appropriately dimensionally reduced spatial field. The computational time saved is then used to produce an ensemble and probabilistically calibrate the forecast. The approach will be used to produce air pollution forecasts from a citizen science network in San Francisco and to forecast wind energy in Saudi Arabia.


13/04/2021, 14:00
Alan Agresti (University of Florida)
Title: Simple Ways to Interpret Effects in Modeling Binary and Ordinal Data

Abstract

Probability-based effect measures for models for binary and ordinal response variables can be simpler to interpret than logistic (and probit) regression model parameters and their corresponding effect measures, such as odds ratios.
For describing the effect of an explanatory variable while adjusting for others in modeling a binary response, it is sometimes possible to employ the identity and log link functions to generate simple effect measures.
When such link functions are inappropriate, one can still construct analogous effect measures from a logistic regression model fit, based on average differences or ratios of the probability modeled or on average instantaneous rates of change for the probability.
Simple measures are also proposed for interpreting effects in models for ordinal responses based on applying a link function to cumulative probabilities.
The measures are also sometimes applicable with nonlinear predictors, such as in generalized additive models.
The methods are illustrated with examples and implemented with R software.

Parts of this work are joint with Maria Kateri, Claudia Tarantola, and Roberta Varriale.
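
As a rough illustration of the probability-based measures mentioned in this abstract, the sketch below computes the average difference in fitted probabilities for a binary explanatory variable x, adjusting for one covariate z, and prints it next to the odds ratio. It is only a minimal sketch, not the speaker's implementation: the function names, the coefficient values and the covariate values are all hypothetical, chosen for illustration.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def average_probability_difference(intercept, beta_x, beta_z, z_values):
    """Average over the sample of P(y=1 | x=1, z_i) - P(y=1 | x=0, z_i).

    The coefficients are assumed to come from a logistic regression fit
    with a binary explanatory variable x and one adjusting covariate z.
    """
    diffs = [
        inv_logit(intercept + beta_x + beta_z * z) - inv_logit(intercept + beta_z * z)
        for z in z_values
    ]
    return sum(diffs) / len(diffs)


# Hypothetical fitted coefficients and observed covariate values (illustration only).
intercept, beta_x, beta_z = -1.0, 0.8, 0.5
z_values = [-1.0, 0.0, 0.5, 1.5, 2.0]

avg_diff = average_probability_difference(intercept, beta_x, beta_z, z_values)
odds_ratio = math.exp(beta_x)

print(f"odds ratio for x: {odds_ratio:.3f}")
print(f"average difference in probability: {avg_diff:.3f}")
```

The average difference is expressed on the probability scale, which is the sense in which such measures can be easier to read than an odds ratio.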


20/04/2021, 14:00
David Firth (University of Warwick)
Title: Schedule-adjusted league tables during the football season

Abstract

In this talk I will show how to construct a better football league table than the official ranking based on accumulated points to date.  The aim of this work is (only) to produce a more informative representation of how teams currently stand, based on their match results to date in the current season; it is emphatically not about prediction.  A more informative league table is one that takes proper account of "schedule strength" differences, i.e., differing numbers of matches played by each team (home and away), and differing average standings of the opponents that each team has faced.

This work extends previous "retrodictive" use of Bradley-Terry models and their generalizations, specifically to handle 3 points for a win, and also to incorporate home/away effects coherently without assuming homogeneity across teams.  Playing records that are 100% or 0%, which can be problematic in standard Bradley-Terry approaches, are incorporated in a simple way without the need for a regularizing penalty on the likelihood. A maximum-entropy argument shows how the method developed here is the mathematically "best" way to account for schedule strength in a football league table.

Illustrations will be from the English Premier League, and the Italian Serie A.
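
For readers unfamiliar with the Bradley-Terry models that this abstract builds on, the sketch below fits the standard model (win/loss outcomes only) with a simple minorization-maximization iteration. It is not the method of the talk: it ignores draws, the 3-points-for-a-win rule and home/away effects, and it assumes every team has at least one win and one loss. The function name and the results matrix are hypothetical.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate standard Bradley-Terry strengths by an MM iteration.

    wins[i][j] is the number of times team i beat team j.
    Returns strengths normalised to sum to 1, so that
    P(i beats j) = strengths[i] / (strengths[i] + strengths[j]).
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Matches between i and j, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Hypothetical results for three teams (illustration only).
results = [
    [0, 3, 2],  # team 0 beat team 1 three times and team 2 twice
    [1, 0, 3],
    [2, 1, 0],
]
strengths = fit_bradley_terry(results)
for team, s in enumerate(strengths):
    print(f"team {team}: strength {s:.3f}")
```

Because the estimated strengths depend on whom each team has played and not only on how often it has won, a model of this kind can adjust for schedule differences that a table of accumulated points ignores.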


04/05/2021, 14:00
Manuele Leonelli (IE University Madrid)
Title: Untangling complex dependencies in categorical data using staged trees

Abstract

The dependence structure of a categorical random vector is often studied by means of a probabilistic graphical model. The most commonly used model is the so-called Bayesian network, which provides an intuitive and efficient framework to assess (causal) dependencies. One of the major drawbacks of these models is that they can only explicitly represent symmetric dependencies, which, in practice, may not give a complete description of the data dependence structure. Staged trees are a flexible class of graphical models which can explicitly represent and model a wide array of non-symmetric dependencies.
In this talk, I will provide an overview of this model class and its application to a wide array of datasets. I will also discuss a number of ongoing developments for staged trees, including efficient structural learning, causal discovery, manipulations of the graphs and the new stagedtrees R package.
The talk is based on joint work with Gherardo Varando (University of Valencia), Federico Carli and Eva Riccomagno (University of Genova).


Last update: 11/05/2021