AUTONOMOUS, DISTRIBUTED AND PERVASIVE SYSTEMS-1

Academic year
2023/2024 Syllabus of previous years
Official course title
AUTONOMOUS, DISTRIBUTED AND PERVASIVE SYSTEMS-1
Course code
PHD156-1 (AF:471254 AR:258248)
Modality
On campus classes
ECTS credits
2
Degree level
Corso di Dottorato (D.M.45)
Educational sector code
INF/01
Period
Annual
Course year
1
Where
VENEZIA
Moodle
Go to Moodle page
The term "big data" indicates data so large that traditional algorithmic techniques cannot cope with neither storing nor analyzing it. Very often, however, the sheer size of such datasets is not proportional to their effective information content. Think about DNA sequencing: one human genome requires approximately 1 GB to be stored, and advanced techniques allow to sequence dozens of genomes in few days. Clearly, classic algorithmic techniques do not scale: for instance, all Italian genomes would require approximately 60 Petabytes just to be stored. Any two human genomes, however, are 99.99% similar: the key is compression. The real problem is: can we develop algorithms that operate directly on compressed data (without first decompressing it)?

The course tackles the problem of representing and manipulating big data through the use of compressed data structures. This research direction merges techniques from algorithms, data structures, and information theory in order to obtain structures able to, simultaneously, accelerate operations typical of information retrieval while occupying a space proportional to the compressed data (often, thousands of times smaller than the original datasets).
At the end of the course, the student will be able to:
1) understand the main lossless compression techniques used to represent unstructured (plain text) and structured (e.g. trees, graphs) data.
2) understand the relation between compression and computation, and how this can be exploited to accelerate operations on compressed data.
3) implement basic compressed data structures
basic knowledge about:
- algorithms and data structures (e.g. sorting, hashing, binary search, big-O notation)
- probability theory
1) Entropy, Shannon's theorem, prefix-free codes, compression
2) Bitvectors with rank/select/access, wavelet trees, geometric data structures
3) Compressed suffix arrays, FM-index, Burrows-Wheeler transform
4) Indexes based on Lempel-Ziv compressors
- Navarro, Gonzalo. Compact data structures: A practical approach. Cambridge University Press, 2016.
- original research articles
One of the following:
- discussion of an existing article in the field
- implementation of a compressed index in C++
- original research (proposal/implementation of a new technique that may lead to an original publication)
Frontal lectures, slides, blackboard.
English
Course contents may still vary.
oral
Definitive programme.
Last update of the programme: 25/02/2023