PREDICTIVE ANALYTICS

Academic year
2022/2023
Official course title
ANALISI PREDITTIVA
Course code
CT0429 (AF:339919 AR:180748)
Modality
On campus classes
ECTS credits
6
Degree level
Bachelor's Degree Programme
Educational sector code
SECS-S/01
Period
1st Semester
Course year
3
Where
VENEZIA
This course belongs to the interdisciplinary educational activities of the Data Science curriculum of the Bachelor in Informatics. The course is designed to give a panoramic view of several tools available for predictive modelling, at an intermediate level.
The course covers the main concepts of linear models and generalized linear models, and possibly further extensions of these modelling frameworks. The focus is on providing the main insights into the statistical and mathematical foundations of the models and on showing how to implement the methods effectively with statistical software. This is achieved through a mixture of theory and reproducible code. Real data examples and case studies are also introduced.
* General competences

Identify the most appropriate data analysis techniques for each problem and know how to apply them to the analysis, design and solution of the problem.
Apply data processing techniques to real data of (possibly) large size using suitable statistical software.
Be able to generate new ideas (creativity) and anticipate new situations in the contexts of data analysis and decision making.

* Specific competences

Use classic results of inference and regression as a basis for advanced methods of prediction and classification.
Be able to interpret and present the results of a statistical analysis.
Identify and select the appropriate software tools for the treatment of data.
Correctly identify the type of statistical problem corresponding to certain objectives and data, as well as the most appropriate methodologies to apply to the given objectives and data.
Know how to design specific data processing systems for a given type of statistical problem (classification, estimation, prediction, etc.).
Apply linear algebra knowledge in methods for analysing data.
Students are assumed to have reached the learning objectives of the following courses, although having passed the corresponding examinations is not formally required:
Calculus 1
Calculus 2
Algebra
Probability and Statistics
Data Analysis
1. Introduction
1.1 Course overview
1.2 What is predictive modelling?
1.3 General notation and background

2. Linear models I: simple and multiple linear model
2.1 Model formulation and least squares
2.2 Assumptions of the model
2.3 Inference for model parameters
2.4 Prediction
2.5 ANOVA
2.6 Model fit

3. Linear models II: model selection, extensions, and diagnostics
3.1 Model selection
3.2 Use of qualitative predictors
3.3 Nonlinear relationships
3.4 Model diagnostics
3.5 Potential critical issues in regression models

4. Generalized linear models
4.1 Model formulation and estimation
4.2 Inference for model parameters
4.3 Prediction
4.4 Deviance
4.5 Model selection
4.6 Model diagnostics

The programme might be slightly modified during the semester. Students are encouraged to ask for specific statistical questions of interest to be covered in the course.
Faraway, Julian J. 2014. Linear Models with R, Second Edition. Chapman and Hall/CRC.
Faraway, Julian J. 2016. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Second Edition. Chapman and Hall/CRC.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer.
The exam will take place in the IT lab and is composed of two parts: a written part and an R-based part. Both parts consist of exercises that aim to evaluate:
1. the theoretical knowledge of the course topics;
2. the ability to apply them to solve real data problems;
3. the ability to use R and interpret its output to solve real data problems.
The lessons consist of a mixture of theory (description of the methods) and practice (implementation and practical use of the methods). The methods are implemented in the statistical language R. Students are encouraged to bring their own laptops and gain first-hand experience with the code during some parts of the lessons.
Italian
written
Definitive programme.
Last update of the programme: 06/05/2022