ADVANCED COMPUTER SCIENCE - MOD. 2
- Academic year
- 2025/2026 Syllabus of previous years
- Official course title
- ADVANCED COMPUTER SCIENCE - MOD. 2
- Course code
- CM0604 (AF:577049 AR:323780)
- Teaching language
- English
- Modality
- On campus classes
- ECTS credits
- 6 out of 12 of ADVANCED COMPUTER SCIENCE
- Degree level
- Master's Degree Programme (DM270)
- Academic Discipline
- ING-INF/05
- Period
- 2nd Semester
- Course year
- 1
- Where
- VENEZIA
- Moodle
- Go to Moodle page
Contribution of the course to the overall degree programme goals
Expected learning outcomes
The student:
- knows the methodologies for designing and developing computer systems related to Data Engineering;
- knows the techniques for evaluating the performance, scalability, and reliability of software and algorithms for analyzing massive data;
- develops his/her skills in the field of programming by learning calculation techniques and knowledge of algorithms, at the state of the art;
- knows programming environments and algorithms for analyzing massive data;
Ability to apply knowledge and understanding
The graduate:
- is able to design and develop systems for storing, managing, and analyzing large-scale data;
- is able to design and evaluate highly scalable systems;
- is able to use advanced programming techniques in the fields of high-performance computing and algorithms for analyzing large amounts of data;
- is able to verify the functional and non-functional requirements (performance) of an algorithm;
- is able to access scientific literature to identify potential solutions to problems with innovative state-of-the-art methods.
Judgment skills:
At the end of the course the student will be able to use the knowledge acquired to:
- Identify the algorithm and the data structure best suited to solve a given problem in the context of massive data.
- Rigorously analyze the performance of randomized algorithms (runtime, approximation, probability of success).
- Independently read and understand research articles in the field.
- Implement existing algorithms and design new ones.
Pre-requirements
- Probability theory (topics covered in CT0675)
- Algorithms and data structures (topics covered in CT0667)
- programming, preferably in C/C++ (topics covered in CT0665 and CT0664)
- Discrete mathematics (topics covered in CT0434)
Contents
- Course intro. Basics of information theory (Worst-case Entropy, statistical entropy, data compression)
- compressed data structures for sets and strings (sorted integers, Elias-Fano, zero-order compressed bitvectors, wavelet trees)
- Introduction to indexing, compressed suffix array
- FM-index
(2) lossy randomized algorithms and data structures
2.1 Probability theory recap
- Probability theory, basic definitions, concentration bounds
- Hashing
2.2 Filters
- Bloom filters, counting Bloom filters
- Quotient filters
2.3 Similarity-preserving sketching
- Rabin hashing, Shingling
- MinHash (Jaccard distance), Min-wise permutations.
- locality-sensitive hashing
- nearest neighbor search
2.4 Pattern matching on streams
- Pattern matching & streaming: applications, Karp-Rabin algorithm
- Porat-Porat's algorithm
- extension to approximate pattern matching (under Hamming distance)
2.5 Sketching on streams
- Estimating frequencies on a stream: sampling, count-min sketch, Misra-Gries sketch
- Estimating distinct elements (zeroth frequency moment): Flajolet-Martin sketch
- Estimating number of events (first frequency moment): Morris sketch
- Estimating the second frequency moment: Tug-of-war sketch
- Applications: dimensionality reduction, sketches for relational algebra
Referral texts
- Original research articles provided during the course
Assessment methods
1. Individual assignment. The maximum score that can be obtained with the assignment is 15. The student should identify an interesting problem within the context of massive data processing (ideally using real-world data) and solve it by implementing techniques discussed in class or found in original research articles. The methods and results should be summarized in a pdf report. Topics must align with the course focus: compressed data structures, filters, or sketching/streaming algorithms. Any programming language can be used, although C/C++/rust (and similar) are recommended for efficiency.
2. Individual oral exam. The oral exam must be taken after the delivery of the individual assignment. The maximum score that can be obtained with the oral exam is 20. The aim of the oral exam is to verify the knowledge of the theoretical tools presented during the course, as well as to discuss the solution of the individual assignment.
Type of exam
Grading scale
- accuracy of the solution, use of computational resources, quality of the report (range 80%)
- originality of the solution (range 20%).
Oral exam (maximum 20 points, passing mark: 12/20):
- knowledge (statements and proofs of the theoretical results) and ability to apply knowledge in the solution of exercises (range 40%)
- detail and completeness of the answers (range 40%)
- communication skills, including the use of specific terminology related to algorithms and data structures for massive data (range 20%).
Both parts (assignment and oral exam) must be passed to pass the exam. The final evaluation is given by the sum of the evaluations of the assignment and the oral exam. Honors (laude) will be awarded to students who have obtained a sum of at least 34 points in the two tests.