INFORMATION RETRIEVAL AND WEB SEARCH
- Academic year
- 2025/2026
- Official course title
- INFORMATION RETRIEVAL AND WEB SEARCH
- Course code
- CM0473 (AF:576828 AR:323815)
- Teaching language
- English
- Modality
- On campus classes
- ECTS credits
- 6
- Degree level
- Master's Degree Programme (DM270)
- Academic Discipline
- INF/01
- Period
- 2nd Semester
- Course year
- 1
- Where
- VENEZIA
Contribution of the course to the overall degree programme goals
The field of Information Retrieval (IR) has changed considerably in recent years with the expansion of the World Wide Web, the birth of web search engines, and the advent of data and distributed computing clouds.
During the last decade, relentless optimization of information retrieval efficiency and effectiveness has driven web search engines to new quality levels. The field of IR has thus moved from being a primarily academic discipline to being the basis of most people's preferred means of information access. The course aims to present the scientific underpinnings of this field and some of its practical issues.
In addition, the course presents machine learning techniques and algorithms applied to text mining and to the ranking of search engine results, as well as methods for Web network analysis. Recent developments in generative AI and Large Language Models (LLMs) are also covered, up to their application in modern Neural IR, where LLMs are used for ranking and retrieval.
Expected learning outcomes
- Knowledge and understanding of the retrieval models, and the methods and indexes for processing queries
- Knowledge and understanding of the components of a search engine, and the techniques and algorithms to get the right compromise between efficiency and effectiveness of the retrieval
- Knowledge and understanding of the methods of analysis of networks, including the Web
- Knowledge of environments and libraries for large-scale software development, capable of handling and processing large volumes of data
- Knowledge of programming environments and algorithms for Artificial Intelligence
- Knowledge and understanding of Machine Learning methods to classify and cluster texts, and to rank the retrieval results
- Knowledge of the potential ethical, social and legal implications of secure information processing
Applying knowledge and understanding:
- Ability to implement algorithms to index and compress texts and process queries
- Ability to choose and evaluate machine learning methods to classify and cluster text corpora, and to rank the retrieval results
- Ability to identify tools for network analysis, including the Web
- Ability to use advanced programming techniques in the areas of high-performance computing, and algorithms to handle high data volumes
- Ability to verify functional and non-functional requirements of a computer system based on machine learning
- Ability to study scientific literature to identify potential solutions to problems with innovative state-of-the-art methods
Prerequisites
Machine Learning knowledge and skills
Contents
Vector representation of text
Basic tokenization
Indexing and implementation of vector-space retrieval
Evaluation of IR Systems
Neural IR
Web Search: Crawling, Link-based algorithms
Scalability issues of IR systems
Reference texts
- Nicola Tonellotto. Neural IR. 2022: https://arxiv.org/pdf/2207.13443.pdf
- Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. Pretrained Transformers for Text Ranking: BERT and Beyond. 2021: https://arxiv.org/pdf/2010.06467.pdf
- Lecture notes and scientific papers.
Assessment methods
The second part of the examination, which contributes 40% of the final grade, consists of the critical reading and public presentation of scientific articles on course topics. It aims to assess analytical ability and the degree of understanding of the text (range 60%), as well as synthesis and communication skills (range 40%).
The second part of the examination may alternatively be taken by developing a software project, whose written report will be discussed orally. In this case, the project will be assessed according to the following scheme: the candidate's analytical ability in tackling the project (range 20%), efficiency of the software project (range 50%), and completeness of the report and of the experimental analysis, together with communication skills (range 30%).
Type of exam
Grading scale
28-30L: mastery of topics covered in lecture, excellent command of technical terminology, and excellent skills acquired.
26-27: good knowledge of topics covered in lecture, good skills and familiarity with technical terminology.
24-25: not always thorough knowledge of topics covered in lecture, fair skills and not always correct use of technical terminology.
22-23: often superficial knowledge of topics covered in lecture, sufficient acquired skills, deficiencies in technical terminology.
18-21: sometimes lacking knowledge of topics covered in lecture, barely sufficient skills and deficient technical terminology.