|Academic year||2018/2019 Syllabus of previous years|
|Official course title||WEB INTELLIGENCE|
|Course code||CT0428 (AF:230276 AR:111552)|
|Modality||On campus classes|
|Degree level||Bachelor's Degree Programme|
|Educational sector code||INF/01|
|Moodle space||Link to the course space|
The goal of this course is to enable students to understand and exploit predictive data analysis techniques, including both supervised methods (classification and regression) and unsupervised methods (clustering and recommendation), with a focus on web data (e.g., text documents, the web graph). The course also covers the use of data mining software tools through the Python programming language.
Students will achieve the following learning outcomes:
Knowledge and understanding: i) understanding the principles of unsupervised learning; ii) understanding the principles of supervised learning; iii) understanding the principles of web content mining.
Applying knowledge and understanding: i) being able to apply supervised and unsupervised analysis techniques; ii) being able to use data analysis software tools (e.g., scikit-learn).
Making judgements: i) being able to choose the most appropriate method for a given problem and to evaluate its performance.
Communication: i) reporting a comprehensive comparative analysis of different data analysis methods.
(even without passing the corresponding exams).
- Similarity search in text:
- Text processing: tokenization, stemming, lemmatization, stopwords
- Similarity functions: Jaccard, Euclidean, Cosine
- Advanced similarity approximations: k-shingles, Locality-Sensitive Hashing, SimHash
- Web Mining - Recommender systems:
- Content-based, Collaborative Filtering, user-based and item-based
- Dimensionality Reduction:
- Distance measures, curse of dimensionality, PCA
- Clustering:
- k-means, k-medoids, Hierarchical, DBSCAN
- Intrinsic and extrinsic Evaluation
- Classification and Regression:
- k-NN, Naive Bayes, Decision Trees
- Ensemble methods: Bagging, Boosting, Random Forests
- Bias and Variance, overfitting and underfitting
- Imbalanced data
- Evaluation: accuracy measures, cross-validation
- Web Mining - Document Ranking:
- classification and regression for document ranking
- Graph Analysis with PageRank
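As an illustration of the similarity functions listed under "Similarity search in text", the following is a minimal sketch of Jaccard and cosine similarity on tokenized text. It is not taken from the course material: the helper names (`tokenize`, `jaccard`, `cosine`), the whitespace tokenizer, and the two example sentences are assumptions made for this example only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity: size of the intersection over size of the union of the token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty documents are identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(tokens_a, tokens_b):
    """Cosine similarity between the raw term-frequency vectors of two documents."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example documents, not from the course.
doc1 = tokenize("the web graph links web pages")
doc2 = tokenize("the web pages form a graph")

print(f"Jaccard: {jaccard(doc1, doc2):.3f}")
print(f"Cosine:  {cosine(doc1, doc2):.3f}")
```

Jaccard ignores how often a term occurs, while cosine on term frequencies does not, so the two measures can rank the same pair of documents differently.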
Selected readings provided during the course.
The written exam consists of questions and short exercises on the theory of the subjects discussed during the course. The written exam evaluates the theoretical knowledge gained by the student.
The project requires the student to conduct a comparative analysis of different tools applied to a specific dataset or problem.
The student must choose and justify the most appropriate solution and deliver a report, to be discussed with the teacher. The project work evaluates the student's ability to apply the theoretical knowledge to a real-world case study.
Teaching material is delivered through the Moodle platform.
During the course, the Python programming language is used together with the scikit-learn library. Students are encouraged to bring their own laptops.
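To give a sense of what a comparative analysis with scikit-learn can look like, here is a minimal sketch that compares three of the classifiers named in the course contents using 5-fold cross-validation. The choice of the Iris dataset, the hyperparameters, and the variable names are assumptions made for illustration; they are not prescribed by the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only as a stand-in for a project dataset.
X, y = load_iris(return_X_y=True)

# Candidate models; hyperparameters are illustrative, not tuned.
models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

# Report models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Mean accuracy alone is rarely enough to justify a choice; a fuller comparison would also look at the variance across folds and at metrics suited to imbalanced data.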