STATISTICAL INFERENCE AND LEARNING

Academic year
2026/2027
Official course title
STATISTICAL INFERENCE AND LEARNING
Course code
CM0471 (AF:577103 AR:323997)
Teaching language
English
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Academic Discipline
SECS-S/01
Period
1st Semester
Course year
2
Where
VENEZIA
Statistical Inference and Learning is an elective course in the Master's degree in Computer Science and Information Technology. It gives students advanced tools for statistical data analysis with a strong computational orientation. The objective is to develop the skills needed to address inference and prediction problems across a wide range of scientific and technological fields, with particular emphasis on the algorithmic implementation of the techniques studied.
Attendance, participation in the course activities and individual study will enable students to achieve the following learning outcomes:
1. Knowledge and understanding:
- know and understand the fundamental principles of statistical inference in both the frequentist and the Bayesian framework
- know and understand advanced statistical learning methods for prediction
2. Ability to apply knowledge and understanding:
- independently implement statistical inference methods for estimation and learning
- apply statistical simulation techniques (bootstrap, jackknife, Monte Carlo methods) for inference and model evaluation
- independently use the R language to analyse datasets, including high-dimensional ones
3. Judgement:
- form independent judgements about the validity and feasibility of different statistical techniques and understand their impact on the results of an analysis
- make an informed choice between frequentist and Bayesian approaches depending on the problem and the data available
Students are assumed to have basic competencies in descriptive statistics and probability acquired during their bachelor's degree. In particular, students should be fully familiar with the basic concepts of probability calculus and random variables. If needed, a review of the first two chapters of Bontempi G., Statistical Foundations of Machine Learning: The Handbook, Machine Learning Group, ULB, is recommended.
The course covers the following topics:
1. Principles of statistical inference: point and interval estimation from a frequentist perspective
2. Hypothesis testing: classical hypothesis tests, error control in multiple testing
3. Linear regression: estimation, inference, diagnostics and computational implementation
4. Non-parametric methods: random forests, gradient boosting and other non-parametric estimation methods
5. Simulation in statistical inference: bootstrap, jackknife and Monte Carlo methods
6. Elements of Bayesian statistics: basic concepts; prior distribution, likelihood, posterior distribution; estimation and diagnostics
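
As an illustration of the error control in multiple testing mentioned under topic 2, the following is a minimal sketch in base R. It assumes two standard adjustments, Bonferroni and Benjamini-Hochberg, which the syllabus does not name explicitly. The function names and the p-values are hypothetical and chosen only for illustration.

```r
# Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1.
bonferroni_adjust <- function(p) {
  pmin(1, p * length(p))
}

# Benjamini-Hochberg adjustment, written out by hand.
bh_adjust <- function(p) {
  m <- length(p)
  ord <- order(p, decreasing = TRUE)   # indices from largest to smallest p-value
  ranks <- m:1                         # rank (in increasing order) of each sorted p-value
  # Scale each p-value by m / rank, then enforce monotonicity with a running minimum.
  adj_sorted <- pmin(1, cummin(p[ord] * m / ranks))
  adj <- numeric(m)
  adj[ord] <- adj_sorted               # map back to the original order
  adj
}

p_values <- c(0.001, 0.008, 0.039, 0.041, 0.20)  # hypothetical p-values
p_bonf <- bonferroni_adjust(p_values)
p_bh <- bh_adjust(p_values)
```

Both hand-written functions can be checked against the built-in `p.adjust(p_values, method = "bonferroni")` and `p.adjust(p_values, method = "BH")`, which should return the same values.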

The course takes a strongly computational approach: the techniques are implemented from first principles in R (www.r-project.org), using RStudio, and the use of pre-built packages is kept to a minimum so that students program the methods themselves.
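
To give a sense of what implementing a method from first principles can look like, here is a minimal sketch of the nonparametric bootstrap and the jackknife in base R, without add-on packages. It is not taken from the course material: the function names `bootstrap` and `jackknife_se`, the simulated data, the number of replicates and the choice of the sample mean as the statistic are all assumptions made for illustration.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Nonparametric bootstrap: recompute a statistic on B resamples drawn
# with replacement from the observed data.
bootstrap <- function(x, statistic, B = 2000) {
  n <- length(x)
  replicates <- numeric(B)
  for (b in seq_len(B)) {
    resample <- x[sample.int(n, n, replace = TRUE)]
    replicates[b] <- statistic(resample)
  }
  replicates
}

# Jackknife standard error: recompute the statistic leaving out one
# observation at a time.
jackknife_se <- function(x, statistic) {
  n <- length(x)
  loo <- vapply(seq_len(n), function(i) statistic(x[-i]), numeric(1))
  sqrt((n - 1) / n * sum((loo - mean(loo))^2))
}

x <- rexp(50, rate = 1)                 # simulated data (illustrative only)
boot_means <- bootstrap(x, mean)        # bootstrap distribution of the mean
se_boot <- sd(boot_means)               # bootstrap standard error
ci_percentile <- quantile(boot_means, c(0.025, 0.975))  # 95% percentile interval
se_jack <- jackknife_se(x, mean)        # jackknife standard error
```

For the sample mean, the jackknife standard error coincides with the usual `sd(x) / sqrt(length(x))`, which gives a simple check of the implementation. The bootstrap standard error should be close to it, up to Monte Carlo error.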
• Bontempi G., Statistical Foundations of Machine Learning: The Handbook, Machine Learning Group, ULB (https://dipot.ulb.ac.be/dspace/bitstream/2013/325210/3/syl.pdf): main reference text for topics 1–5
• James G., Witten D., Hastie T., Tibshirani R. (2023), An Introduction to Statistical Learning, 2nd ed., Springer (https://www.statlearning.com/): reference for error control in hypothesis testing with multiple comparisons
• Gelman A. et al. (2013), Bayesian Data Analysis, 3rd ed., Chapman & Hall (https://sites.stat.columbia.edu/gelman/book/BDA3.pdf): reference for Bayesian statistics (topic 6)
• Additional readings and supplementary materials distributed during the course via the Moodle platform
Assessment is based on a written exam consisting of two parts. Each part comprises two theoretical/methodological questions and one practical exercise. The total duration of the exam is 2 hours. Each student will analyze a dataset provided by the instructor and answer questions of a methodological nature.
The assessment is designed to measure:
- (1) knowledge of the theoretical content of the course
- (2) quality and correctness of the statistical analyses performed
- (3) appropriate use of technical terminology
- (4) correctness and consistency of the conclusions drawn

The maximum score for each part is 16 points and the final score is the sum of the scores obtained in the two parts. To pass the exam, a score of at least 9 points must be achieved in each part. If the first part is not passed, the second part will not be graded. A total score above 30 points corresponds to the highest distinction (cum laude).
During the exam, students are permitted to use a formula sheet provided by the instructor and the R/RStudio software. The exam is closed-book: the use of textbooks, notes, or any other reference material is not permitted.
An oral examination may be required to confirm the final grade.

Important:
A midterm test will be held halfway through the course, corresponding to the first part of the exam (two theoretical/methodological questions and one practical exercise). If the midterm is passed with a score of at least 9 points, the student may sit only the second part of the exam, but only at the first exam session. In this case, the final score will be the sum of the midterm score and the score obtained in the second part of the exam at the first session.
Type of exam: written

The lecturer has a duty to ensure that the rules regarding the authenticity and originality of exam tests and papers are respected. Therefore, if there is suspicion of irregular conduct, an additional assessment may be conducted, which could differ from the original exam description.

The exam result is graded as follows:
- satisfactory (18–22 points), if the student demonstrates an adequate knowledge and understanding of the course methods, is able to apply and interpret them appropriately, and uses technical terminology correctly;
- fair (23–25 points), if the student shows a good knowledge and understanding of the course methods, applies and interprets them convincingly, and uses technical terminology with reasonable accuracy;
- good (26–28 points), if the student possesses a solid knowledge and understanding of the course methods, applies and interprets them in a fully convincing manner, and uses technical terminology accurately;
- excellent (29–30 points), if the student demonstrates an excellent knowledge and understanding of the course methods, applies and interprets them in an outstanding manner, and uses technical terminology with a very high level of accuracy.

Honors are awarded to students who, in addition to achieving an excellent result, demonstrate exceptional commitment in carrying out and presenting the project, providing original contributions or ideas.
Conventional theoretical lectures complemented by exercises, discussion of case studies and computer labs.
Teaching material prepared by the teacher will be distributed during the course through the Moodle platform.
The statistical software used in the course is R (www.r-project.org).
Definitive programme.
Last update of the programme: 16/04/2026