DATA AND WEB MINING

Academic year: 2020/2021

Official course title: DATA AND WEB MINING

Course code: CT0509 (AF:337525 AR:178728)

Teaching language: Italian

Modality: On campus classes

ECTS credits: 6

Degree level: Bachelor's Degree Programme

Academic Discipline: INF/01

Period: 1st Semester

Course year: 3

Contribution of the course to the overall degree programme goals

This course is part of the educational activities of the Bachelor in Informatics.
The goal of this course is to enable students to understand and apply predictive data analysis techniques, including both supervised methods (classification and regression) and unsupervised methods (clustering and recommendation), with applications to web data (e.g., text documents, the web graph). The course also covers the use of data mining software tools through the Python programming language.

Expected learning outcomes

The course discusses fundamental techniques for predictive and descriptive data analysis, with a focus on web data.

Students will achieve the following learning outcomes:

Knowledge and understanding: i) understanding the principles of unsupervised learning; ii) understanding the principles of supervised learning; iii) understanding the principles of web content mining.

Applying knowledge and understanding: i) being able to apply supervised and unsupervised analysis techniques; ii) being able to use data analysis software tools (e.g., scikit-learn).

Making judgements: i) being able to choose the most appropriate method for a given problem and to evaluate its performance.

Communication: i) reporting a comprehensive comparative analysis of different data analysis methods.

Prerequisites

Students should have achieved the learning outcomes of the courses "Programming", "Probability and Statistics", and "Linear Algebra" (even without having passed the corresponding exams).

Contents

- Knowledge Discovery in Databases
- Similarity search in text:
  - Text processing: tokenization, stemming, lemmatization, stopwords
  - Similarity functions: Jaccard, Euclidean, Cosine
  - Advanced similarity approximations: k-shingles, min-hashing
  - Advanced similarity approximations: Locality-Sensitive Hashing, Sim-Hashing
- Web Mining - Recommender systems:
  - Content-based, Collaborative Filtering, user-based and item-based
- Dimensionality Reduction:
  - Distance measures, curse of dimensionality, PCA
- Clustering:
  - k-means, k-medoids, Hierarchical, DBSCAN
  - Intrinsic and extrinsic evaluation
- Classification and Regression:
  - k-NN, Decision Trees
  - Bias and variance, overfitting and underfitting
  - Ensemble methods: Bagging, Boosting, Random Forests
  - Random Forests for feature selection, outlier detection
  - Imbalanced data
  - Evaluation: accuracy measures, cross-validation
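
As an illustration of the similarity functions listed above, the following is a minimal sketch of Jaccard and cosine similarity between two short texts. It is not taken from the course material: the helper names (`tokenize`, `jaccard`, `cosine`) and the whitespace tokenizer are assumptions made for this example, and only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def jaccard(tokens_a, tokens_b):
    # Jaccard similarity: size of the intersection over size of the union of the token sets.
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here for two empty documents
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(tokens_a, tokens_b):
    # Cosine similarity between the term-frequency vectors of the two documents.
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = tokenize("data mining on the web")
doc2 = tokenize("web data mining")

print(f"Jaccard: {jaccard(doc1, doc2):.3f}")
print(f"Cosine:  {cosine(doc1, doc2):.3f}")
```

The two documents share three of five distinct tokens, so the Jaccard similarity is 3/5, while the cosine similarity on raw term counts is 3/sqrt(15).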

Reference texts

Lecture notes. Selected readings provided during the course.

- Data Mining: Concepts and Techniques, 3rd edition. Jiawei Han, Micheline Kamber, Jian Pei. Morgan Kaufmann/Elsevier, 2012.
- Web Data Mining, 2nd edition. Bing Liu. Springer, 2011.

Assessment methods

Learning outcomes are verified by a written exam and a project.

The written exam consists of questions and short exercises on the theory of the subjects discussed during the course. It evaluates the theoretical knowledge gained by the student.

The project requires the student to conduct a comparative analysis of different tools applied to a specific dataset or problem.
The student must choose and justify the most appropriate solution and deliver a report, to be discussed with the teacher. The project work evaluates the student's ability to apply the theoretical knowledge to a real-world case study.

Type of exam

written and oral

Teaching methods

Lessons include both theoretical and practical sessions.
Teaching material is delivered through the Moodle platform.
During the course, the Python programming language is used together with the scikit-learn library. Students are encouraged to bring their own laptops.
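
To give a sense of what working with scikit-learn looks like, here is a minimal sketch, not taken from the course material, that compares two of the classifiers named in the contents (k-NN and a decision tree) using 5-fold cross-validation. The choice of the bundled Iris dataset and of the hyperparameters (`n_neighbors=5`, `max_depth=3`) is an assumption made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small example dataset shipped with scikit-learn (an illustrative choice).
X, y = load_iris(return_X_y=True)

# Candidate models; hyperparameters are arbitrary illustrative values.
models = {
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {scores[name]:.3f}")
```

A comparison of this kind, extended with further models and evaluation measures, is in the spirit of the comparative analysis required by the project.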

Definitive programme.

Last update of the programme: 27/04/2020
