COMPUTATIONAL PHILOLOGY: DATA STRUCTURES AND ALGORITHMS

Academic year
2021/2022
Official course title
COMPUTATIONAL PHILOLOGY: DATA STRUCTURES AND ALGORITHMS
Course code
FM0488 (AF:338877 AR:190760)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
L-LIN/01
Period
4th Term
Course year
2
Where
VENEZIA
Course description
This course presents an approach to philological research positioned at the intersection of the textual sciences and computational methods. It places the textual data themselves at its core, revisiting both how they are produced (*computational ecdotics*) and how they are analysed (*computational analysis*).

The classes will alternate between two main areas.

1. Computational ecdotics

This part will deal with the different aspects of scholarly editing, revisited through artificial intelligence and computational methods, from the digital facsimile to the establishment of a genealogy of the witnesses and to the critical edition itself. It will introduce students to the use of machine learning algorithms and neural networks for handwritten text recognition, linguistic annotation and lemmatisation, as well as to methods for computer-assisted collation and stemmatology.
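
Computer-assisted collation generally begins by aligning the witnesses token by token, so that each disagreement becomes a candidate variant location. As an illustration only, the sketch below uses the classic Needleman-Wunsch global alignment; the syllabus does not say which alignment algorithm the course uses, and the scores (match = 1, mismatch = -1, gap = -1), the function names `align_witnesses` and `variant_locations`, and the two short witnesses are all hypothetical.

```python
from typing import List, Optional, Tuple

# An aligned pair of tokens; None marks a token absent from one witness (a gap).
Pair = Tuple[Optional[str], Optional[str]]


def align_witnesses(a: List[str], b: List[str],
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Global (Needleman-Wunsch) alignment of two token sequences (hypothetical helper)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))  # token only in witness a
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # token only in witness b
            j -= 1
    pairs.reverse()
    return pairs


def variant_locations(pairs: List[Pair]) -> List[Pair]:
    """Keep only the aligned positions where the two witnesses disagree."""
    return [p for p in pairs if p[0] != p[1]]


if __name__ == "__main__":
    # Two made-up witnesses of the same sentence.
    w1 = "li rois est venuz a paris".split()
    w2 = "li rois vint a paris".split()
    for left, right in align_witnesses(w1, w2):
        print(f"{left or '-':10} {right or '-':10}")
```

Real collation tools add normalisation before alignment and handle more than two witnesses; this pairwise version only shows the core idea of aligning first and classifying variants afterwards.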

2. Computational analysis and stylometry

This second part of the class will focus on the computational analysis of textual data, with particular emphasis on stylometry and authorship attribution. It will begin with general considerations on quantifying textual features, and in particular individual stylistic traits, and will then present a full range of analysis methods suited to a variety of contexts and goals (anonymous texts, authorship disputed among a closed or open set of candidates, authorship verification, etc.): descriptive and exploratory statistics (e.g. dimensionality reduction), clustering (e.g. hierarchical clustering) and supervised methods (SVM and neural networks).
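
To make the "quantification of textual features" concrete, the sketch below implements Burrows's Delta, a classic distance for authorship attribution that the syllabus does not name and that is used here only as an example. It takes the n most frequent words of a small corpus, converts their relative frequencies to z-scores, and attributes a disputed text to the candidate with the smallest mean absolute difference of z-scores. The function names, the toy texts and the choice to compute means and standard deviations over all texts (including the disputed one) are simplifying assumptions; texts are assumed to be non-empty token lists.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def most_frequent_words(texts: Dict[str, List[str]], n: int) -> List[str]:
    """The n most frequent tokens over the whole corpus (the feature set)."""
    totals: Counter = Counter()
    for tokens in texts.values():
        totals.update(tokens)
    return [word for word, _ in totals.most_common(n)]


def z_scores(texts: Dict[str, List[str]], vocab: List[str]) -> Dict[str, List[float]]:
    """Standardise each word's relative frequency across all texts."""
    freqs = {}
    for name, tokens in texts.items():
        counts = Counter(tokens)
        freqs[name] = [counts[w] / len(tokens) for w in vocab]
    z: Dict[str, List[float]] = {name: [] for name in texts}
    for k in range(len(vocab)):
        column = [freqs[name][k] for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name in texts:
            # A word with no spread across texts carries no signal: z = 0.
            z[name].append((freqs[name][k] - mu) / sigma if sigma > 0 else 0.0)
    return z


def delta(za: List[float], zb: List[float]) -> float:
    """Burrows's Delta: mean absolute difference of two z-score profiles."""
    return mean(abs(a - b) for a, b in zip(za, zb))


def attribute(texts: Dict[str, List[str]], disputed: str, n_mfw: int = 30) -> str:
    """Closed-set attribution: return the candidate with the lowest Delta."""
    vocab = most_frequent_words(texts, n_mfw)
    z = z_scores(texts, vocab)
    candidates = [name for name in texts if name != disputed]
    return min(candidates, key=lambda name: delta(z[disputed], z[name]))


if __name__ == "__main__":
    # Toy "texts"; real studies use much longer samples and more features.
    texts = {
        "A": "the cat and the dog and the bird".split(),
        "B": "a cat of a dog of a bird".split(),
        "X": "the fish and the frog and the newt".split(),
    }
    print(attribute(texts, "X", n_mfw=5))  # → A
```

Nearest-candidate Delta only answers closed-set questions; the same pairwise distances could instead be passed to a hierarchical clustering routine for exploratory work on an anonymous corpus.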

Expected learning outcomes
At the end of this class, students are expected to be able to perform stylometric analysis, especially for authorship attribution, and to apply essential tools and algorithms for producing textual corpora and digital editions (handwritten text recognition, linguistic annotation, computer-assisted collation, etc.).

Prerequisites
There are no formal prerequisites for this class, but some notions of philology and basic programming skills (in particular in R and/or Python) will be useful.

Contents
- Introduction to computational philology

Computational Ecdotics

- Text acquisition and handwritten text recognition
- Normalization and linguistic annotation
- Alignment and collation
- Stemmatology

Stylometry

- Exploring an anonymous corpus and dealing with controversial attributions
- Recognizing an author's touch
- Profiling
- Opening the box

Bibliography

Stylometry

- CAFIERO (Florian) and CAMPS (Jean-Baptiste), *Affaires de style*, Paris, 2022.
- JUOLA (Patrick), *Authorship Attribution*, Delft, 2008.
- KARSDORP (Folgert), KESTEMONT (Mike) and RIDDELL (Allen), *Humanities Data Analysis: Case Studies with Python*, Princeton, 2021.

Computational Ecdotics

- ANDREWS (Tara), “The third way: philology and critical edition for a digital age”, *Variants: the Journal of the European Society for Textual Scholarship*, 10 (2012), URL: http://boris.unibe.ch/43071/ .
- CAMPS (Jean-Baptiste), “La philologie computationnelle à l’École des chartes : premier bilan et perspectives” (forthcoming).
- CAMPS (Jean-Baptiste), ING (Lucence) and SPADINI (Elena), “Collating Medieval Vernacular Texts. Aligning Witnesses, Classifying Variants”, in *DH2019: Digital Humanities Conference*, 2019.

Assessment methods
The class will be assessed through a real-life case study in authorship attribution: teams of students will carry out various tasks to help identify the author of a disputed text. If the results are conclusive, they could even be presented at a conference.

Teaching methods
The class will combine theory and practice: presentations by the teacher will alternate with hands-on practice on the computer, using the appropriate tools and programming languages. Some classes will leave room for students to present their own results.

Teaching language
English

Type of exam
Written and oral
Definitive programme.
Last update of the programme: 02/02/2022