LAB OF INFORMATION SYSTEMS AND ANALYTICS
|Academic year||2020/2021|
|Official course title||LAB OF INFORMATION SYSTEMS AND ANALYTICS|
|Course code||ET7008 (AF:304951 AR:170894)|
|Modality||For teaching methods (in presence/online) please check the timetable|
|Degree level||Bachelor's Degree Programme|
|Educational sector code||INF/01|
In particular, students should be able to exploit modern AI approaches to extract meaningful information from raw data of various kinds.
Such competences need a strong theoretical and practical knowledge of data analysis.
The goal of this course is to teach students methods and technologies for effective data analysis, discussing the fundamental techniques for predictive and descriptive analysis of data.
During the lectures, several tools and techniques will be presented from both theoretical and practical perspectives, so that students will be able to compare them and extract knowledge from the datasets presented.
The results of these analyses then serve as a starting point for further decisions and considerations.
Students should also be able to produce a comparative analysis report, including data representation.
Students will achieve the following learning outcomes, divided into three main areas:
1. Knowledge and understanding:
- understanding the theoretical bases of the main algorithms presented during lectures;
- understanding principles and differences of unsupervised learning algorithms;
- understanding principles and differences of supervised learning algorithms.
2. Applying knowledge and understanding in practical situations:
- being able to apply proper supervised and unsupervised analysis techniques to data;
- being able to use data analysis software tools used during lectures (e.g., scikit-learn);
- being able to compare and correctly interpret the results produced by different algorithms;
- reporting a comprehensive comparative analysis of different data analysis methods;
- being able to present results with appropriate figures and diagrams.
2. Similarity Search in Text
- Text representation; Tokenization, Stemming, Lemmatization; Vector space; Similarity measures;
3. Collaborative Filtering (content-based, item-based)
- Centroid-based clustering; Hierarchical clustering; Agglomerative clustering; Density-based clustering; Quality evaluation;
5. Supervised Learning
- Model training, validation and tuning; Classification; Regression; Feature Engineering; Decision Trees;
6. Ensemble methods
- Bagging and Boosting; Bias vs. Variance trade-off; Over-fitting and Under-fitting; Random Forest.
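As an illustration of topic 2 in the list above (text representation, vector space, similarity measures), the following is a minimal sketch that uses only the Python standard library. The toy corpus, the query, and the function names are assumptions made for this example and are not taken from the course material; stemming and lemmatization are omitted, and plain term frequencies stand in for whatever weighting scheme the lectures adopt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, invented for this sketch.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply today",
]
vectors = [term_frequencies(d) for d in docs]
query = term_frequencies("a cat on a mat")

# Rank document indices by decreasing similarity to the query.
ranking = sorted(
    range(len(docs)),
    key=lambda i: cosine_similarity(query, vectors[i]),
    reverse=True,
)
print([docs[i] for i in ranking])
```

Documents that share more terms with the query rank higher, and a document with no terms in common receives a similarity of zero.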
- Lecture notes. Selected readings provided during the course.
The exercises require applying data analysis methods to a given dataset of limited complexity.
The project requires conducting a comparative analysis of different tools applied to a specific dataset or problem.
The student must choose and justify the most appropriate solution and deliver a report discussing a comparative analysis of the chosen methods.
The following software tools will be used during the course: Jupyter, scikit-learn.
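To give an idea of what a small comparative analysis with scikit-learn might look like, here is a hedged sketch that compares a single decision tree with a random forest using 5-fold cross-validation. The Iris dataset, the hyperparameter values, and the choice of accuracy as the metric are assumptions made for illustration only; the actual project datasets and evaluation criteria are defined during the course.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumed example dataset; any tabular classification dataset would do.
X, y = load_iris(return_X_y=True)

# Two candidate models with illustrative (not tuned) hyperparameters.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

mean_accuracy = {}
for name, model in models.items():
    # For classifiers, cross_val_score uses accuracy by default.
    scores = cross_val_score(model, X, y, cv=5)
    mean_accuracy[name] = scores.mean()

# Report the models from best to worst mean accuracy.
for name, acc in sorted(mean_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```

In a report, such figures would normally be accompanied by a motivation for the chosen metric and by plots or tables that make the comparison easy to read.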