HEALTH DATA SCIENCE

Academic year
2024/2025
Official course title
HEALTH DATA SCIENCE
Course code
EM1413 (AF:506439 AR:293001)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
SECS-S/05
Period
4th Term
Course year
1
Where
VENEZIA
This course introduces one of the main application areas of data science: health data.

It covers the most common statistical tools for analyzing health data:

- statistical methods for analyzing categorical data
- statistical methods for analyzing healthcare costs
- survival models.

Statistical techniques will be explored theoretically and practically through laboratory activities (in R, analyzing real data from different health surveillance systems).

Expected learning outcomes
- Knowledge of health information systems.
- Analytical skills in applying, comparing, and interpreting statistical methods for health data, both in theory and in practice (in R).

Prerequisites
Students should have taken at least the Statistical Learning for Data Science course. Some concepts from that course are covered in the first week, but a good working knowledge of R and R Markdown is needed to follow the course.

Contents
1. Introduction to health data:
- Types of health data (ref. Etzioni)
- Surveys: the American Behavioral Risk Factor Surveillance System (BRFSS) and the Italian Progressi delle Aziende Sanitarie per la Salute in Italia (PASSI) as case studies
- Health data and health policies

2. Statistical models for health data:

2.a: Categorical data analysis (ref. Agresti):
- Analyzing contingency tables and comparing proportions: relative risk, odds ratio, and the chi-squared test of independence.
- Logistic regression: interpretation, evaluation, and model selection; categorical predictors and aggregated data.
- Poisson and negative binomial regression: interpretation, evaluation, and model selection.
- Multi-category logit models (for nominal and ordinal data): interpretation, evaluation, and model selection.
- Generalized linear models: defining logistic regression as a generalized linear model.
- Generalized linear mixed models (logistic-normal model): interpretation, evaluation, and model selection.
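
As a minimal illustration of the contingency-table measures listed above, the sketch below computes the relative risk, the odds ratio, and Pearson's chi-squared test of independence for a 2×2 table using only the Python standard library. The function name `two_by_two_summary`, the table layout (rows = exposure, columns = outcome), and the cell counts are illustrative assumptions; the course labs themselves use R.

```python
import math


def two_by_two_summary(a, b, c, d):
    """Summarize a 2x2 table with rows = exposure, columns = outcome:

                    outcome yes   outcome no
        exposed          a             b
        unexposed        c             d

    Returns (relative risk, odds ratio, Pearson chi-squared, p-value).
    The chi-squared statistic has no continuity correction; the p-value uses
    1 degree of freedom. Assumes every cell count is positive.
    """
    n = a + b + c + d

    # Relative risk: ratio of the outcome proportions in the two rows
    relative_risk = (a / (a + b)) / (c / (c + d))

    # Odds ratio: cross-product ratio
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-squared: sum of (observed - expected)^2 / expected,
    # with expected = row total * column total / n under independence
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi_squared = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_squared += (observed[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-squared(1) variable: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_squared / 2))
    return relative_risk, odds_ratio, chi_squared, p_value


# Hypothetical counts, for illustration only
print(two_by_two_summary(30, 70, 15, 85))
```

With these hypothetical counts the risks are 0.30 and 0.15, so the relative risk is 2.0, the odds ratio is (30 × 85)/(70 × 15) ≈ 2.43, and the chi-squared statistic is about 6.45. R's `chisq.test()` applies Yates' continuity correction to 2×2 tables by default (`correct = TRUE`), so its statistic would be smaller unless `correct = FALSE` is set. With a single binary exposure, the fitted logistic regression coefficient for the exposure equals the log of this sample odds ratio.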

2.b: Healthcare costs (ref. Etzioni):
- Log cost models and the lognormal distribution
- Gamma models for right-skewed cost outcomes
- Mixture models
- Other models for skewed data
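
For the log cost models listed above, one step that is easy to get wrong is retransformation: if log costs are normal with mean μ and variance σ², the mean cost is exp(μ + σ²/2), whereas exp(μ) is only the median. The sketch below compares the naive back-transformed value, the lognormal mean with plug-in estimates, and the arithmetic mean on a small cost sample. The function name and the cost values are hypothetical, and the plug-in estimator is an illustrative choice rather than a method prescribed by the course texts.

```python
import math
import statistics


def lognormal_mean_estimates(costs):
    """Compare estimates of mean cost for strictly positive, right-skewed costs.

    Assumes log(cost) is approximately normal. Returns a dict with:
      naive     : exp(mean of log costs), which estimates the median, not the mean
      lognormal : exp(mu_hat + sigma2_hat / 2), the lognormal mean (plug-in)
      sample    : the ordinary arithmetic mean, for reference
    """
    logs = [math.log(c) for c in costs]
    mu_hat = statistics.mean(logs)
    sigma2_hat = statistics.variance(logs)  # sample variance (n - 1 denominator)
    return {
        "naive": math.exp(mu_hat),
        "lognormal": math.exp(mu_hat + sigma2_hat / 2),
        "sample": statistics.mean(costs),
    }


# Hypothetical annual costs per patient (illustrative numbers, not real data)
costs = [120, 250, 310, 480, 900, 1500, 4200, 12000]
print(lognormal_mean_estimates(costs))
```

Because σ² > 0, the lognormal estimate always exceeds the naive exp(μ̂), which is why back-transforming predictions from a log cost regression without a correction tends to understate mean costs. A gamma GLM with a log link (in R, `glm(y ~ x, family = Gamma(link = "log"))`) models the mean cost on the original scale directly and avoids this retransformation step.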

2.c: Modelling survival data (ref. Collett):
- Introduction to survival analysis.
- Survival function, hazard function, and censoring.
- Proportional hazards models and Cox regression (brief overview).
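
The survival function and censoring can be made concrete with the Kaplan-Meier estimator, a standard nonparametric estimate of S(t) that accommodates right-censored observations; choosing it here is an illustrative assumption, not a statement of the course's exact coverage. The sketch assumes each subject is recorded as a follow-up time plus an event indicator (1 = event observed, 0 = right-censored); the function name and the data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.

    At each event time t: S(t) = S(t-) * (1 - d_t / n_t), where d_t is the
    number of events at t and n_t the number still at risk just before t.
    Subjects censored at t are counted as at risk at t (the usual convention).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # subjects leaving the risk set at each time
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]  # Counter returns 0 for times with no events
        if d > 0:
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data (months), illustrative only
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

For these data the estimate steps down to 0.875, 0.750, 0.600, 0.400, and 0.000 at times 2, 3, 5, 8, and 12; the censored subjects at times 3, 6, and 9 shrink the risk set without producing a step. In R, the survival package's `survfit(Surv(time, status) ~ 1)` computes the same estimator.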

3. Health data analysis lab

- Case studies and practical applications with R.

Textbooks
Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons. (ch. 2, 3, 4, 6, 10)
Collett, D. (2015). Modelling survival data in medical research. CRC Press. (ch. 2, 3)
Etzioni, R. (2020). Statistics for health data science. (ch. 1, 6)

Assessment methods
1. Attending students: a group project in which students analyze a dataset, write a report, and present their results in class. The report must include at least the motivation for the analysis, an exploratory data analysis, and a statistical model, all clearly documented, interpreted, and explained.

2. Non-attending students: the exam consists of 4-5 questions about a dataset, similar to the examples given during the in-class labs. The exam is open-book, and students may use R and R Markdown, or any other software, throughout. Students who wish to improve their written exam grade may take an oral exam on the day the written exams are reviewed.

Teaching methods
Lectures (theory and practice in R)
Teaching language
English
Type of exam
Written and oral
This programme is provisional and there could still be changes in its contents.
Last update of the programme: 21/03/2024