Academic year
2017/2018
Official course title
Course code
CT0428 (AF:212560 AR:97060)
Blended (on campus and online classes)
ECTS credits
Degree level
Bachelor's Degree Programme
Educational sector code
1st Semester
Course year
The course provides an introduction to methods for investigating user behavior on websites and social networks. In particular, it addresses techniques for content recommendation, similarity quantification, and search result ranking.
Probability and Statistics
Text representations, information retrieval, similarity functions, dimensionality reduction, content-based recommender systems, collaborative filtering, frequent itemset mining, Hadoop Distributed File System (HDFS), MapReduce programming model, Apache Spark, cloud computing.
Lab: Python programming language, regular expressions, XPath, libraries for text analytics (Gensim), Apache Spark.
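As an illustration of the kind of similarity functions listed among the topics above, the following is a minimal sketch in plain Python, not taken from the course material. It assumes a simple regular-expression tokenizer and compares two short texts with Jaccard similarity (on word sets) and cosine similarity (on term-frequency vectors). The function names `tokenize`, `jaccard`, and `cosine` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphanumeric word tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of the two token sets: |A & B| / |A | B|."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a, b):
    """Cosine similarity of the two term-frequency (bag-of-words) vectors."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = "the cat sat"
    doc2 = "the cat ran"
    print(jaccard(doc1, doc2))
    print(cosine(doc1, doc2))
```

Jaccard similarity ignores how often a word occurs, while cosine similarity weights repeated words more heavily, so the two measures can rank the same pair of documents differently.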
Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman.
Mining of massive datasets.
Cambridge University Press, 2014.
Free download: http://www.mmds.org/
(Chapters 1,2,3,6,9,11)

Additional reading:
Aggarwal, Charu C., and ChengXiang Zhai.
Mining text data.
Springer Science & Business Media, 2012.
(Chapter 5)
written and oral
The achievement of the course objectives is assessed through a written examination, a project, and a discussion of the project. The exam consists of open questions that test theoretical knowledge. The projects and their discussions are used to assess the students' ability to use the methods presented in the course and to implement tools based on them.

The topics for the projects will be assigned on request and will be partially based on practical exercises proposed during lab classes (for students attending the course). Projects can be completed in small groups, provided that individual contributions can be clearly identified.

Projects contribute 0-6 points to the final grade and are graded on correctness, efficiency, code documentation, report quality, and the students' ability to discuss the implementation and the relevant theory. For group projects, the grade is individual and also accounts for each member's contribution, either declared or assessed at discussion time, and for group size.
Theoretical lectures and practical exercises in the lab, using Python and Apache Spark.
The course is blended. Lab classes can be taken individually using the material available online, and solution guides will be published a few days after each practical exercise.
  • Lecture notes, material for reference or for self-assessment available online or as e-book
  • E-learning, Moodle platforms
  • Use of open-source software
Last update of the programme: 03/07/2017