STATISTICAL LEARNING FOR DATA SCIENCE - 2

Academic year
2022/2023
Official course title
STATISTICAL LEARNING FOR DATA SCIENCE - 2
Course code
EM1401 (AF:383278 AR:208302)
Modality
On campus classes
ECTS credits
6 out of 12 of STATISTICAL LEARNING FOR DATA SCIENCE
Degree level
Master's Degree Programme (DM270)
Educational sector code
SECS-S/01
Period
2nd Term
Course year
1
Where
VENEZIA
The objective of the course is to develop statistical skills for analysing high-dimensional data and for solving forecasting and classification problems that occur in a wide variety of fields, including business, economics and technology.
Regular and active participation in the teaching activities offered by the course and in independent research activities will enable students to:
1. (knowledge and understanding)
- acquire knowledge and understanding of advanced statistical learning methods for synthesis, prediction and classification, including with data that have complex structure and high dimensionality
2. (applying knowledge and understanding)
- pre-process a dataset and prepare it for further analysis
- autonomously apply advanced statistical methods to synthesize information and to make predictions and classifications using high-dimensional data
- autonomously use statistical software for the analysis of high-dimensional data
3. (making judgements)
- make autonomous judgements about the validity and feasibility of different statistical techniques and understand the effects of these on the outcomes of the analyses
- present the results in a clear and concise manner, using tools for reproducible reports and
The course will make use of basic mathematical and statistical concepts such as functions, integrals, derivatives, matrices, distributions, estimation and hypothesis testing. Students are expected to possess knowledge of statistics at STAT-100 level.
The course is divided into two parts. The first part introduces tools for reproducible research. These tools are then applied in the second part of the course, which covers statistical learning.



Tools for data science and reproducible research
- Introduction to R and RStudio
- Writing reports using R Markdown

Data wrangling, data tidying, data visualization

Statistical Inference
- Sampling
- Estimation
- Hypothesis testing

Statistical learning
- Linear regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Nonlinear models
James G, Witten D, Hastie T, Tibshirani R (2015). An Introduction to Statistical Learning. 6th version. Springer. Chapters 1-7 ( http://www-bcf.usc.edu/~gareth/ISL/ )
Ismay C, Kim AY (2019). Statistical Inference via Data Science: A ModernDive into R and the tidyverse. CRC Press ( https://moderndive.com/ )
Xie Y (2019). bookdown: Authoring Books and Technical Documents with R Markdown. CRC Press ( https://bookdown.org/yihui/bookdown/ )
The final written test contains 3 exercises designed to measure:
1. your theoretical knowledge of the topics covered in the course
2. your ability to apply the methods you have learnt to solve real problems.

The maximum mark for the written examination is 33 points. The use of books, notes and electronic resources is not permitted during the final examination. Only material found on Moodle may be consulted.

The final mark is the sum of the marks obtained in the papers and the mark for the final examination. A mark of 33 or above will be awarded with honours.
The course combines conventional theoretical classes, focused on the description of methods, with practice sessions on implementing and applying those methods to real problems. Methods will be implemented in the statistical language R ( www.r-project.org ). Students are encouraged to bring their own laptops (no tablets!) and to experiment with the code during the course.
English
This is the second module of a 12 credit course. The information refers to the whole course.

Students should register on the course page of the university e-learning platform, moodle.unive.it.
written
Definitive programme.
Last update of the programme: 11/02/2023