High Dimensional Data, Networks, and Beyond
Workshop 5/12/2025, Treviso
Workshop
The High-Dimensional Data, Networks, and Beyond workshop aims to bring together researchers working at the intersection of Bayesian statistics, network modeling, and high-dimensional econometrics.
Particular emphasis will be given to applications in economics, finance, and the social sciences, with connections to key themes of the PNRR GRINS project.
Programme
Keynote speakers
Full programme
Workshop programme 5/12/2025 (371 KB)
The programme features two keynote talks and eight presentations in two contributed sessions dedicated to network and high-dimensional data, offering a balanced mix of theoretical, methodological, and applied perspectives.
- 9.00 Welcome remarks
- 9.10 Keynote lecture: Daniele Durante, Bocconi University - “Bayesian Modelling of Criminal Networks”
- 10.00 Coffee break
- 10.30 Session 1: Network Data
- Ovielt Baltodano Lopez, Ca’ Foscari University of Venice - “A Dynamic Stochastic Block Model for Multidimensional Networks”
- Matteo Iacopini, Luiss University - “A Dynamic Latent Space Model for Matrix-valued Count Time Series”
- Antonio Peruzzi, Ca’ Foscari University of Venice - “Media Bias and Polarization through the Lens of a Markov Switching LS Network Model”
- Lorenzo Schiavon, Ca’ Foscari University of Venice - “Informed Order-Invariant Identifiability for Bayesian Factor Models”
- 12.30 Lunch break
- 14.30 Session 2: High Dimensional Data
- Angela Andreella, Ca’ Foscari University of Venice - “Inference on Multiple Quantiles in Regression Models by a Rank-score Approach”
- Mattia Stival, Ca’ Foscari University of Venice - “The Perceived Influences of Environment on Health in Italy: a Penalized Ordinal Regression Approach”
- Emanuele Aliverti, University of Padua - “Scalable Approximate Inference for Cumulative Regression Models”
- Qing Wang, Ca’ Foscari University of Venice - “Compressed Bayesian Tensor Regression”
- 16.30 Coffee break
- 17.00 Keynote lecture: Antonio Canale, University of Padua - “Adaptive Partition Factor Analysis”
- 17.50 Concluding remarks
Abstracts
Keynote lectures
Joint work with Elena Bortolato
Factor analysis has traditionally been used across diverse disciplines to extract latent traits that influence the behavior of multivariate observed variables. Historically, the focus has been on analyzing data from a single study, neglecting the potential study-specific variations present in data from multiple studies. Multi-study factor analysis has emerged as a recent methodological advancement that addresses this gap by distinguishing between latent traits shared across studies and study-specific components arising from artifactual or population-specific sources of variation. In this talk, we extend the current Bayesian methodologies by introducing novel shrinkage priors for the latent factors, thereby accommodating a broader spectrum of scenarios, from the absence of study-specific latent factors to models in which factors pertain only to small subgroups nested within or shared between the studies. For the proposed construction we provide conditions for identifiability of factor loadings and guidelines to perform straightforward posterior computation via Gibbs sampling. Through comprehensive simulation studies, we demonstrate that our proposed method exhibits competitive performance across a variety of scenarios compared to existing methods, while providing richer insights. The practical benefits of our approach are further illustrated through applications to bird species co-occurrence data and ovarian cancer gene expression data.
Europol recently defined criminal networks as a modern version of the Hydra mythological creature, with covert and complex structure. Indeed, connectivity data among criminals are subject to measurement errors and structured missingness patterns, and exhibit a sophisticated combination of an unknown number of core-periphery, assortative and disassortative structures that may encode key architectures of the criminal organization. The coexistence of these noisy, and possibly multiscale, block patterns limits the reliability of community detection algorithms routinely used in criminology, thereby leading to overly simplified and possibly biased reconstructions of organized crime architectures. In this talk, I will present a number of Bayesian model-based solutions which aim at covering these gaps via a combination of stochastic block models and priors for random partitions arising from Bayesian nonparametrics. These include Gibbs-type priors, and random partition priors driven by the urn scheme of a hierarchical normalized completely random measure. Product-partition models to incorporate criminals’ attributes, and zero-inflated Poisson representations accounting for weighted edges and security strategies, will also be discussed. Results are illustrated in an application to an Italian Mafia network, where the proposed models unveil a structure of the criminal organization mostly hidden to state-of-the-art alternatives routinely used in criminology. I will conclude the talk by introducing an innovative phylogenetic latent space model that learns nested modular hierarchies among criminals from the formation process of the corresponding latent connectivity features.
Session 1
The availability of relational data can offer new insights into the functioning of the economy. Nevertheless, modeling the dynamics in network data with multiple types of relationships is still a challenging issue. Stochastic block models provide a parsimonious and flexible approach to network analysis. We propose a new stochastic block model for multidimensional networks, where layer-specific hidden Markov-chain processes drive the changes in community formation. The changes in the block membership of a node in a given layer may be influenced by its own past membership in other layers. This allows for clustering overlap, clustering decoupling, or more complex relationships between layers, including settings of unidirectional, or bidirectional, non-linear Granger block causality. We address the overparameterization issue of a saturated specification by assuming a Multi-Laplacian prior distribution within a Bayesian framework. Data augmentation and Gibbs sampling are used to make the inference problem more tractable. Through simulations, we show that standard linear models and the pairwise approach are unable to detect block causality in most scenarios. In contrast, our model can recover the true Granger causality structure. As an application to international trade, we show that our model offers a unified framework, encompassing community detection and Gravity equation modeling. We found new evidence of block Granger causality of trade agreements and flows and core-periphery structure in both layers on a large sample of countries.
A new dynamic latent space eigenmodel is proposed for time series of count-valued networks. Bayesian inference is performed exploiting an improved auxiliary mixture sampler that performs two steps of data augmentation and allows the design of conditionally conjugate prior distributions for the model’s parameters. The collection of latent features across all nodes is assumed to evolve over time according to a vector autoregressive process that accounts for possible contemporaneous dependence across nodes and features. A novel MCMC algorithm is designed, which relies on sampling the entire path of the latent variables jointly and without a loop, reducing computing time and enhancing the mixing of the chain. Finally, a novel fully Bayesian method is developed to estimate the latent space dimension relying on a Laplace approximation of a partial marginal likelihood, which avoids trans-dimensional samplers. The resulting algorithm is a very general partially collapsed Gibbs sampler that can be applied to static and dynamic settings, as well as count, categorical and binary observables, under a slight modification.
News outlets are now more than ever incentivized to provide their audience with slanted news, while the intrinsic homophilic nature of online social media may exacerbate polarized opinions. Here, we propose a new dynamic latent space model for time-varying online audience-duplication networks, which exploits social media content to conduct inference on media bias and polarization of news outlets. We contribute to the literature in several directions:
- our model provides a novel measure of media bias that combines information from both network data and text-based indicators;
- we endow our model with Markov-Switching dynamics to capture polarization regimes while maintaining a parsimonious specification;
- we contribute to the literature on the statistical properties of latent space network models.
The proposed model is applied to a set of data on the online activity of national and local news outlets from four European countries in the years 2015 and 2016. We find evidence of a strong positive correlation between our media slant measure and a well-grounded external source of media bias. In addition, we provide insight into the polarization regimes across the four countries considered.
We propose a unified Bayesian framework for sparse factor models that reconciles identifiability, order-invariance, and interpretability in the decomposition of high-dimensional covariance structures. Conventional identification schemes rely on arbitrary variable orderings or restrictive triangular constraints, often compromising exchangeability and obscuring substantive interpretation. Our approach defines an order-independent prior on the factor loadings matrix that ensures identifiability through probabilistic structure rather than fixed parameterization. The prior combines rank selection mechanisms with covariate-induced global-local shrinkage, allowing sparsity and interpretability to emerge from the data and, when available, from exogenous information. This formulation flexibly matches identifiability constraints with the structural regularities informed by domain knowledge, without imposing them deterministically. The resulting posterior is identifiable up to signed permutations and supports efficient inference through Gibbs or variational approximations. Simulated and financial applications illustrate how the proposed model recovers coherent, economically meaningful latent structures, providing a principled route to identifiable and interpretable Bayesian factor analysis.
Session 2
This paper tackles the challenge of performing multiple quantile regressions across different quantile levels and the associated problem of controlling the familywise error rate, an issue that is generally overlooked in practice. We propose a multivariate extension of the rank-score test and embed it within a closed-testing procedure to efficiently account for multiple testing. Theoretical foundations and simulation studies demonstrate that our method effectively controls the familywise error rate while achieving higher power than traditional corrections, such as Bonferroni.
Understanding how individuals perceive their living environment is a complex task, as it reflects both personal and contextual determinants. In this paper, we address this task by analyzing the environmental module of the Italian nationwide health surveillance system PASSI (Progressi delle Aziende Sanitarie per la Salute in Italia), integrating it with contextual information at the municipal level, including socio-economic indicators, pollution exposure, and other geographical characteristics. Methodologically, we adopt a penalized semi-parallel cumulative ordinal regression model to analyze how subjective perceptions are shaped by both personal and territorial determinants. The approach balances flexibility and interpretability by allowing both parallel and non-parallel effects while regularizing estimates to address multicollinearity and separation issues. We use the model as an analytical tool to uncover the determinants of positivity and neutrality in environmental perceptions, defined as factors that contribute the most to improving perception or increasing the sense of neutrality. The results are diverse.
- First, results reveal significant heterogeneity across Italian territories, indicating that local characteristics strongly shape environmental perception.
- Second, various individual factors interact with contextual influences to shape perceptions.
- Third, hazardous environmental factors, such as higher PM2.5 levels, appear to be associated with poorer environmental perception, suggesting a tendency among respondents to recognize specific environmental issues.
Overall, the approach demonstrates strong potential for application and provides useful insights for environmental policy planning.
Ordinal categorical data are common in many applied domains. Cumulative link models offer a principled way to model such outcomes, connecting cumulative response probabilities to covariates through a shared linear predictor. Yet, as data size grows, traditional sampling-based Bayesian inference becomes computationally demanding. In this talk, I will present scalable and accurate methods for approximate Bayesian inference in cumulative probit regression models, using Variational Inference and Expectation Propagation. The methods are applied to study the structure of a criminal network through a social-relations regression framework.
To address the common problem of high dimensionality in tensor regressions, we introduce a generalized tensor random projection method that embeds high-dimensional tensor-valued covariates into low-dimensional subspaces with minimal loss of information about the responses. The method is flexible, allowing for tensor-wise, mode-wise, or combined random projections as special cases. A Bayesian inference framework is provided featuring the use of a hierarchical prior distribution and a low-rank representation of the parameter. Strong theoretical support is provided for the concentration properties of the random projection and posterior consistency of the Bayesian inference. An efficient Gibbs sampler is developed to perform inference on the compressed data. To mitigate the sensitivity introduced by random projections, Bayesian model averaging is employed, with normalizing constants estimated using reverse logistic regression. An extensive simulation study is conducted to examine the effects of different tuning parameters. Simulations indicate, and the real data application confirms, that compressed Bayesian tensor regressions can achieve better out-of-sample predictions while significantly reducing computational costs compared to standard Bayesian tensor regressions.
Venue
- Palazzo San Leonardo (room B), Riviera Garibaldi 13, 31100 Treviso (Italy)
- Zoom link to participate online in the seminar
Set in the welcoming atmosphere of Treviso, the event will offer an ideal environment for exchange between senior academics, young researchers, and PhD students.
A hybrid format (in-person and online) will ensure broad accessibility and flexibility for participants and speakers alike.
For those arriving at Treviso train station
It takes about 10 minutes on foot. From the train station, head toward Via Roma, then turn right onto Riviera Santa Margherita and continue until you reach the University Bridge (Ponte dell’Università). The venue is located directly in front of you on the opposite side of the bridge.
For those arriving by car
The University has a reserved car park located close to Palazzo San Paolo. The pass to open the automatic gate can be requested from the Campus staff.