COMPUTATIONAL LINGUISTICS

Academic year
2023/2024
Official course title
COMPUTATIONAL LINGUISTICS
Course code
LM5860 (AF:459672 AR:250844)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
L-LIN/01
Period
2nd Semester
Course year
1
This course is part of the Language Sciences, Language and Cognition and English Linguistics curricula of the Master's Degree in Language Sciences, and of the English and American Literary and Cultural Studies curricula of the Master's Degree in European, American and Postcolonial Languages and Literatures. It aims to provide students with an understanding of the methodological foundations of Computational Linguistics, along with a basic knowledge of the main tools for Natural Language Processing.

The main goals of this course are:

- to provide students with the basic methodological tools to perform linguistic annotation and quantitative analysis of textual data
- to introduce the student to the relevant scientific literature
- to strengthen the student's ability to reflect on the properties of language
- to encourage the student to combine insights and approaches belonging to relatively independent research fields such as theoretical linguistics, computational linguistics and cognitive psychology
- to stimulate critical thinking and the ability to think outside the box
- to practice scientific writing
Expected learning outcomes
1. Knowledge and understanding
- familiarity with the basic methods for text processing
- familiarity with the basic terminology and understanding of the relevant scientific literature
- knowledge of the mathematical foundations of Natural Language Processing
- familiarity with the most commonly used techniques of (morphosyntactic) linguistic annotation
- familiarity with the main distributional semantics approaches

2. Applying knowledge and understanding
- knowledge of the features and limitations of the most common computational linguistics tools and approaches, so as to be able to pick the most appropriate solution for a given linguistic research issue
- ability to propose insightful ideas

3. Making judgements
- ability to retrieve the most relevant literature and to use it critically
- ability to select a suitable theoretical framework to answer a research question of interest
- awareness of the technical and deontological issues connected to the automatic treatment of language
- ability to compare competing hypotheses

4. Communication skills
- ability to write an insightful essay on an innovative research topic
- ability to interact with researchers from different scientific backgrounds (including computational linguists and cognitive scientists)

5. Learning skills
- ability to learn novel technical tools for the automatic treatment of language (e.g. annotation tools, corpora management and query tools)
Prerequisites
Basic notions of general linguistics (morphology and syntax)

Basic mathematical skills
Contents
1. Introduction to Computational Linguistics and Natural Language Processing
2. Corpus linguistics: the basics
3. Distributions in text
4. Language and probability
5. Language and probability II
6. Linguistic annotation
7. The annotation process and its evaluation
8. Classification
9. Regular Expressions
10. Computational Lexical Semantics
11. Distributional Semantics: collocations and association measures
12. Distributional Semantics: semantic similarity and applications
Reading list
- R. Artstein (2018) Inter-annotator Agreement. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation, Springer: 297-313. Available online at: http://artstein.org/publications/inter-annotator-preprint.pdf
- M. Baroni (2009) Distributions in text. In A. Lüdeling and M. Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, Mouton de Gruyter: 803-821. Available online at: http://sslmit.unibo.it/~baroni/publications/hsk_39_dist_rev2.pdf
- M. Davies (2015) Corpora: An introduction. In D. Biber and R. Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics, Cambridge University Press: 11-31.
- S. Evert (2009) Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, Mouton de Gruyter: 1212-1248 (sections 1-4). Available online at: http://www.stefan-evert.de/PUB/Evert2007HSK_extended_manuscript.pdf
- S.T. Gries and A. L. Berez (2017) Linguistic Annotation in/for Corpus Linguistics. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation, Springer: 379-409. Available online at: http://www.stgries.info/research/2017_STG-ALB_LingAnnotCorpLing_HbOfLingAnnot.pdf
- S.T. Gries and J. Newman (2010) Creating And Using Corpora. In R. J. Podesva and D. Sharma (eds.), Research Methods in Linguistics, Cambridge University Press: 257-287. Available online at: http://www.stgries.info/research/2013_STG-JN_CreatingUsingCorpora_ResMethLing.pdf
- D. Jurafsky and J. H. Martin (2008) Speech and Language Processing, 2nd edition, Prentice Hall (ch. 1, 2, 4, 19.1-19.4, 20.1, 20.6)
- D. Jurafsky and J. H. Martin (2020) Speech and Language Processing, 3rd edition draft, Prentice Hall (ch. 4). Available online at: https://web.stanford.edu/~jurafsky/slp3/
- A. Lenci (2018) Distributional Models of Word Meaning, Annual Review of Linguistics, 4: 151-171. Available online at: http://colinglab.humnet.unipi.it/wp-content/uploads/2012/12/annurev-linguistics-030514-125254.pdf
- C. Manning and H. Schütze (1999) Foundations of Statistical Natural Language Processing, MIT Press (ch. 1.1-1.3)
- M. Poesio, J. Chamberlain and U. Kruschwitz (2018) Crowdsourcing. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation, Springer: 277-296
Learning assessment will be based on an oral exam and a final project (either an essay or a group presentation) on a topic of choice.

THE ORAL EXAM

The oral exam consists of a set of questions aimed at verifying students' knowledge of the theoretical issues discussed in class, and of exercises that assess students' mastery of the most important methodological constructs addressed in the course (e.g. association measures, first-order predicate logic (FOPL) formulas).

THE GROUP PRESENTATION

Students will organize into small groups and prepare a 20-minute presentation on a Computational Linguistics or Natural Language Processing topic. Students are encouraged to focus on an application domain or a scientific question in which they have a sincere interest. Note that the specific topic of the project must be agreed upon with the instructor in advance.

Each group presentation will be graded as follows:

- teamwork: 20% of the final grade
- delivery (verbal and non-verbal skills): 20% of the final grade
- visual aids: 20% of the final grade
- content: 40% of the final grade

THE ESSAY

Students are required to write an essay of at least 3000 words on a Computational Linguistics or Natural Language Processing topic. Students are encouraged to focus on an application domain or a scientific question in which they have a sincere interest. Note that the specific topic of the project must be agreed upon with the instructor in advance. The following resources can be used as a reference list of possible domains or topics:

- R. Mitkov (2023, ed.) The Oxford Handbook of Computational Linguistics, 2nd edition, Oxford University Press.
- A. Clark, C. Fox and S. Lappin (2010, eds.) The Handbook of Computational Linguistics and Natural Language Processing, Wiley Blackwell.

The final essay will be graded as follows:

- mastery of the essay topic and critical use of the relevant literature: 50% of the final grade
- depth of thought: 20% of the final grade
- overall readability of the essay: 30% of the final grade

GRADE BREAKDOWN

The final grade will be calculated as follows:

- oral exam: 50% of the final grade
- final essay or group presentation: 50% of the final grade
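
As an illustration only, the weights listed above can be combined as a simple weighted sum. The sketch below is not an official grading tool: the 0-30 score scale, the component names, and the sample scores are all assumptions made for the example, and only the percentage weights come from this syllabus.

```python
# Weights copied from the syllabus. The dictionary keys, the 0-30 scale,
# and the sample scores below are hypothetical, chosen for illustration.
PRESENTATION_WEIGHTS = {
    "teamwork": 0.20,
    "delivery": 0.20,
    "visual_aids": 0.20,
    "content": 0.40,
}
ESSAY_WEIGHTS = {"mastery": 0.50, "depth": 0.20, "readability": 0.30}
FINAL_WEIGHTS = {"oral_exam": 0.50, "project": 0.50}


def weighted_grade(scores, weights):
    """Return the weighted sum of the component scores."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    return sum(scores[name] * weights[name] for name in weights)


# Hypothetical student who chose the essay as the final project.
essay = weighted_grade(
    {"mastery": 28, "depth": 26, "readability": 30}, ESSAY_WEIGHTS
)
final = weighted_grade({"oral_exam": 27, "project": essay}, FINAL_WEIGHTS)
print(round(essay, 2), round(final, 2))
```

For a group presentation, the same function would be called with PRESENTATION_WEIGHTS in place of ESSAY_WEIGHTS. How the result is rounded to a final mark is not specified here and is left to the instructor.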
Teaching methods
Lecture-style presentations
Teaching language
English
Type of exam
Oral
Definitive programme.
Last update of the programme: 14/03/2023