Open Archives, Repositories, Digital libraries
BASE - Bielefeld Academic Search Engine. It is one of the world's most voluminous search engines, especially for scientific and academic open access web resources. It is operated by Bielefeld University Library. As of June 2012, BASE provides more than 30 million documents from more than 2,000 sources (journal websites, institutional archives, etc.). You can access the full texts of about 75% of the indexed documents. The index is continuously enhanced by integrating further OAI sources as well as local sources. BASE can also be used as a directory of repositories: click here to see the complete list of sources indexed by BASE. You can order them by name, country, or number of items.
DSpace.org. It is an open source software package that preserves and enables easy and open access to all types of digital content, including text, images, moving images, mpegs and data sets. DSpace was developed in 2000 by MIT (Massachusetts Institute of Technology) and HP (Hewlett-Packard) Laboratories in Cambridge, Massachusetts. As of June 2012, 1,313 institutions and universities around the world use this software. The full list of DSpace users is available here.
Openarchives.eu. It is the European guide to OAI-PMH compliant digital repositories around the world, offered as a searchable index of the repositories. The search engine and the other editorial content that complement the original database have been developed by Horizons Unlimited srl (Bologna, Italy). This portal uses repository and collection descriptions harvested from the University of Illinois OAI-PMH Data Provider Registry. You can obtain the full list of indexed repositories (3,946 in June 2012) by clicking the OK button next to the empty search box.
OpenDOAR - Directory of Open Access Repositories. It is an authoritative directory of academic open access repositories. Born as a project of the University of Nottingham, it is now also supported by contributions from the Open Society Institute (OSI), the Joint Information Systems Committee (JISC), the Consortium of University Research Libraries (CURL) and SPARC Europe. As of June 2012 it lists 2,167 repositories and allows breakdown and selection by various criteria (country, language, subject, content type, software used). As well as providing a simple repository list, OpenDOAR lets you search for repositories or repository contents.
PLEIADI - Portale per la Letteratura scientifica Elettronica Italiana su Archivi aperti e Depositi Istituzionali. It is a project that originated from the cooperation between two important Italian university consortia, CASPUR and CILEA. It aims to build a national platform that offers centralized access to the scientific literature held in 56 Italian archives (click here to see the list). As of May 2012 it contains 508,717 items (including 18,369 dissertations and 36,843 theses).
ROAR - Registry of Open Access Repositories. Hosted by the University of Southampton, UK, this database is part of the EPrints.org network. The aim of ROAR is to promote the development of open access by providing timely information about the growth and status of repositories throughout the world. The 2,914 repositories (June 2012) may be browsed by software type, country, archive type (department based, shared by different institutions, e-journals, theses, databases, learning objects, etc.) and year. Every archive has a detailed record, with a link to its home page and graphs that show how deposits have grown over time.
University of Illinois OAI-PMH Data Provider Registry. Since 2004 the University of Illinois has maintained a list of OAI-PMH repositories around the world. As of today more than 2,935 have been collected (see here the full list).
DMoz - Directory Mozilla. Also known as the Open Directory Project (ODP), it is the most famous web directory in the world, a multilingual directory of websites (over 5 million in June 2012) indexed by category and subcategory. Every category is selected and maintained by a virtual community of volunteer editors. DMoz also catalogues full text resources, which makes it a potential basis for a cross-curricular digital library. In the Science category, by clicking on Social Sciences, you can find the Linguistics section (4,322 links), divided into different subcategories (Languages, Applied Linguistics, Bilingualism, Comparative Linguistics and Typology, etc.).
DRIVER - Digital Repository Infrastructure Vision for European Research. It is a portal created by the European Commission and 13 other partners (click here to see the list). As of June 2012 it indexes over 5,170,000 scientific publications (articles, books, theses, reports, etc.) and gives access to the full text when available. Currently there are 335 repositories from 45 countries. To see the list, click on "Repositories" in the search criteria on the left of the homepage.
ERIC - Educational Resources Information Center. It is an online digital library of education research and information. It is sponsored by the Institute of Education Sciences (IES) of the U.S. Department of Education. It provides ready access to education literature to support the use of educational research and information to improve practice in learning, teaching, educational decision-making, and research. As of June 2012 it contains more than 1.4 million bibliographic records and abstracts (or full text when available) of different kinds of material (articles, books, theses, teaching materials, technical reports, etc.), especially US and international publications in English. It contains, for example, the records from RIE (Resources in Education) and CIJE (Current Index to Journals in Education). It also contains links to articles and documents from scientific journals in the fields of LIS (Library and Information Science) and ET (Educational Technology).
NARCIS - National Academic Research and Collaborations Information System. It was created in 2004 by KNAW (Royal Netherlands Academy of Arts and Sciences). NARCIS provides access to scientific information, including open access publications from the repositories of all the Dutch universities and research institutions. Currently (June 2012), the archive stores more than 680,000 documents and about 43,000 digital theses. Click here and then on "overview of repositories" to consult the list of contributors.
OAIster. It is a union catalog founded in 2002 by the University of Michigan. It represents open access digital resources and was built by harvesting open access collections worldwide using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). As of June 2012, OAIster includes more than 25 million records representing digital resources from more than 1,100 contributors (click here to see the list). Additionally, the OAIster records are included in search results for those libraries with WorldCat Local and WorldCat Local "quick start."
OAPEN - Open Access Publishing in European Networks. It is a collaborative initiative to develop and implement a sustainable Open Access publication model for academic books in the Humanities and Social Sciences. The OAPEN Library aims to improve the visibility and usability of high quality academic research by aggregating peer reviewed Open Access publications from across Europe. Click here to see the list of partners.
OpenAIRE - Open Access Infrastructure for Research in Europe. In August 2008 the European Union launched the Open Access pilot in FP7 (Seventh Research Framework Programme), under which open access to articles resulting from research funded in the participating areas (Energy, Environment, Health, Information and Communication Technologies, e-Infrastructure, Socio-economic Sciences and Humanities, Science and Society) should be provided within 36 months. OpenAIRE aims to establish and operate an electronic infrastructure for handling peer-reviewed articles as well as other forms of publications. This is achieved through a portal that is the gateway to all user-level services offered by the e-Infrastructure of 27 countries, including access (search and browse) to scientific publications and other value-added functionality. Currently (June 2012) OpenAIRE stores 27,180 publications, 9,834 of which are open access.
Scientific Commons. It is the name of a project, an organization and a website that provides free access to the contents of scientific research published by universities and institutions around the world. It stores data found in over 1,200 repositories. The project is developed by the Institut für Medien und Kommunikationsmanagement of the University of St. Gallen in Switzerland. Currently (June 2012), the search engine indexes 38,354,162 scientific documents (for some of them it is possible to download the full text directly) from 1,269 repositories (click here to see the list).
DOAJ - Directory of Open Access Journals. It is a search engine that aims to be comprehensive and to cover all open access scientific and scholarly journals with full text articles. It uses a quality control system to guarantee the content and is maintained by the Lund University Library (Sweden). As of August 2011 it lists 6,736 journals (more than 3,100 of them also searchable by article) in 17 disciplinary fields. The "Languages and literature" category (256 titles) includes the journals with "Linguistics" as subject (172 titles).
OJS - Journal List (Public Knowledge Project). It is a non-comprehensive list of journals (more than 7,500 titles) that use the OJS (Open Journal Systems) software, which is adopted by institutions that aim for open access. The Public Knowledge Project continues to be an active player in the open access movement, as it provides the leading open source software for journal and conference management and publishing.
(Click here to see the main national theses repositories)
DiVA Portal - Digitala Vetenskapliga Arkivet. Developed by the ECP at the Uppsala University Library, it is a finding tool for research publications and student theses (more than 88,000) written at 30 Scandinavian universities and colleges of higher education (click here to see the list). The Academic Archive Online (Digitala Vetenskapliga Arkivet in Swedish) offers both publishing services and technical solutions for local repositories. The participating universities publish and archive full text documents through DiVA. The archive contains mainly doctoral and undergraduate theses and research reports. "Advanced search" in the student theses catalogue lets the user search by university, subject/course, educational program, subject category and thesis level. Some DiVA repositories are also members of the Networked Digital Library of Theses and Dissertations (NDLTD).
ETDs Worldwide. It is a global index of freely available electronic theses and dissertations (more than 30,000), provided by major ETD collections around the world (click here to see the complete list). Searches can be run by author, title, abstract, subject, keyword, and school/publisher. It does not search the full text of the ETDs. However, the user can refine a search by selecting the limiters in the right-hand panel of the results page (degrees, levels, universities, etc.).
Linguist List Dissertation Abstracts. It is a database operated by the Institute for Language Information and Technology (ILIT) at Eastern Michigan University. It contains more than 2,100 theses and abstracts of doctoral dissertations, which are directly added by the authors themselves to the Linguist List database. It is possible to search by author, year, title, institution, linguistic field, or language specialty.
Networked Digital Library of Theses and Dissertations (Scirus). It is the world's largest portal for theses and dissertations. It contains more than 400,000 electronic theses and dissertations from over 70 institutions worldwide, currently made available via NDLTD. The Networked Digital Library of Theses and Dissertations is an international organization dedicated to promoting the adoption, creation, use, dissemination, and preservation of electronic theses and dissertations (ETDs). Click here to see the list of members.
PhdData.org. The Universal Index of Doctoral Dissertations in Progress is the product of the combined initiative and efforts of several doctoral students in the U.S.A., Argentina and Israel, who felt the need for one site that would concentrate all existing information on doctoral research around the world. The database offers doctoral students and their instructors information on all research in progress, thus enabling contact between students and researchers for academic purposes (for example, each bibliographic record contains a link for writing to the author) and preventing duplication of work currently being conducted. It also facilitates exposure of doctoral theses to professional journals, conference organizers and various research institutions, and facilitates the interaction between the world of research and application. There are over 3,600 theses registered at the moment (May 2012).
ProQuest Dissertations and Theses (PQDT) Open. It is a database that provides the full text of open access dissertations and theses. The authors of these works have opted to publish them as open access and make their research available for free through ProQuest's UMI Dissertation Publishing. ProQuest Dissertations and Theses - Full Text, described as the world’s most comprehensive collection of dissertations and theses, includes 2.7 million searchable citations to dissertations and theses from around the world (more than 700 leading academic institutions) from 1861 to the present day (simple bibliographic citations are also available for dissertations dating from 1637), together with 1.2 million full text dissertations that are available for download in PDF format. This service is not free of charge. Dissertation Express is another paid service offered by ProQuest's UMI (University Microfilms International). It allows institutions and individuals to order and receive dissertations (in PDF or hard copy formats) selected from over two million titles available from ProQuest.
AILLA - The Archive of the Indigenous Languages of Latin America. It is a digital archive of recordings (narratives, ceremonies, conversations, music, songs, etc.) and texts (grammars, dictionaries, etc.) in and about the indigenous languages of Latin America, from the Rio Bravo (USA-Mexico border) to Chile, including the Caribbean. The user has to register and log in to access any archive resource, but it is possible to browse the catalog information without registering. It is a joint project of the Departments of Anthropology and Linguistics and the Digital Library Services division of the General Libraries of the University of Texas at Austin.
CogPrints - Cognitive Sciences Eprint Archive. Created in 1997 by Stevan Harnad, founder of the journal Behavioral and Brain Sciences, CogPrints is an open access electronic archive for self-archiving papers in any area of Psychology, Neuroscience, and Linguistics, and many areas of Computer Science, Philosophy and Biology, as well as any other portions of the physical, social and mathematical sciences that are pertinent to the study of cognition. The site is powered by EPrints 3, free software developed by the Department of Electronics and Computer Science of the University of Southampton. Currently (June 2012) it indexes 3,965 full text documents (354 in Linguistics).
ELAR - Endangered Languages Archive. It is a digital repository for documentation of endangered languages, promoted by the Hans Rausing Endangered Languages Project and the School of Oriental and African Studies, University of London. It aims to collect, preserve and disseminate materials (text and media files) about endangered languages. Currently (June 2012) it indexes 25,354 resources. To use it you must first apply for a user account; access to each item depends on your status (student, researcher, etc.).
Infothèque francophone: ressources en ligne et actualité scientifiques francophones. It is a website of the Agence universitaire de la Francophonie (AUF). Its main objective is to help francophone students easily find information that may be useful to review their progress, to train themselves or to carry out their research. The Catalogue de ressources scientifiques francophones includes the full text of pedagogical and scientific resources (about research and education). In the "Catalogue de ressources", the "Sciences du langage" section contains 230 full text resources as of June 2012 (Généralités (30) - Langage et communication, sémiologie, sémantique (62) - Ethnolinguistique, acquisition du langage (47) - Sociolinguistique, politiques linguistiques (19), etc.).
LDH - Language Description Heritage. The goal of the LDH Open Access Digital Library is to provide easy access to descriptive material about the world’s languages. This collection is administered by the Max Planck Society in Germany as an open access digital repository of existing scientific contributions (mostly PhD theses) describing worldwide linguistic diversity. The search criteria let the user select a language family (e.g. African languages) or a single language (e.g. Dutch), identified also by ISO code (grammar and syntactical aspects).
LingBuzz. It is an openly accessible repository of scholarly papers, discussions and other documents for Generative Linguistics. It is edited by Michal Starke and hosted by CASTL (Center for Advanced Study in Theoretical Linguistics) at the University of Tromsø. It collects academic papers, articles, discussions, and other documents (3,358 full text documents, June 2012). The contents are organized by subject: Semantics, Syntax, Morphology and Phonology.
LOT Archive - Landelijke Onderzoekschool Taalwetenschap. It is the institutional archive of the Netherlands Graduate School of Linguistics, University of Utrecht. The school unites about 400 faculty members and 150 PhD students. The archive provides open access to the volumes of two publication series: the Dissertation series (since 1998) and the Occasional series (since 2003).
Max Planck Institute for Psycholinguistics. Annual reports. It is an archive of scientific publications available in full text since 1997, maintained by the famous Dutch institute of psycholinguistics.
OLAC - Language Resource Catalog. It is a catalog developed by the Open Language Archives Community, a meta-archive run by an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources. Texts, recordings, vocabularies, learning material, software, archives, metadata and web indexes are the instruments used to enhance language documentation, description and evolution. Currently (June 2012) OLAC brings together 43 archives; click here to see the list.
OTA - The Oxford Text Archive. Maintained by Oxford University Computing Services (OUCS), it collects, develops, preserves and catalogues electronic literary and linguistic resources (in 25 languages) for use in higher education, in research, teaching and learning. It comprises TEI texts (2,721 texts created following the guidelines of the Text Encoding Initiative), Corpora (67 collections of language data) and Legacy Formats (1,273 assorted resources that have been collected since the OTA came into existence in 1976). The materials are mostly open access (to consult protected resources, it is necessary to send a request). Click here to see the projects in which the OTA is involved.
ROA - Rutgers Optimality Archive. It is a website maintained by the School of Arts & Sciences, Rutgers University, that is both a repository (scholars can freely upload their contributions) and a distribution point for information and research in Optimality Theory.
SemanticsArchive.net. It is an open archive edited by Chris Barker and Peter Lasersohn. It collects documents about the semantics of natural language and the philosophy of language, and allows users to upload, search and exchange scientific contributions. The website contains additional resources (bibliographies, links, conference details, etc.).
Sociolinguistic Archive. It is an open archive of sociolinguistics developed by the Department of Language and Linguistic Science at the University of York. You can search by author, title and keyword. Available documents: 32 (June 2012).
SLAAP - Sociolinguistic Archive and Analysis Project. Developed by North Carolina State University, it is an interactive web-based archive of sociolinguistic recordings, with integrated media playing and annotation features, as well as phonetic analysis and corpus analysis tools designed to enable and improve empirical linguistic inquiry. The archive currently (June 2012) contains over 2,600 interviews and over 4,100 audio files (over 2,100 hours of audio) from a variety of languages (predominantly American English dialects in North Carolina and the southeastern United States).
SALA - Southeast Asian Linguistics Archives. It collects, scans, indexes, disseminates, and analyzes the relationships between scholarly publications on Southeast Asian languages and linguistics. Most of the recorded materials (journal articles, conference proceedings, working papers, theses, reports, etc.) are available in full text, in PDF format. SALA also includes a rich bibliography of SEA linguistics.
It is not easy to define the concept of a digital library. The term is commonly used to describe a great variety of projects, services and products. Here the term is limited to collections of digital documents with online access, developed by: institutions (e.g. national libraries); volunteer associations around the world that share a particular cultural interest (e.g. Scandinavian literature in Project Runeberg or Jewish literature in Project Ben-Yehuda); for-profit organizations (e.g. Google, Microsoft, Yahoo). The role and weight that these actors are assuming in the digitization of the world's cultural heritage cannot be ignored (Google Books, Internet Archive). This subdivision, which is only illustrative, is for the most part geographic in nature. It also aims to distinguish the national projects from those beginning to realize a “Universal Digital Library”.
Europeana. Created in November 2008, it is a digital library promoted by the European Union. It collects contributions from over 1,500 cultural institutions (museums, libraries, archives, media collections, e.g. the British Library, the Bibliothèque nationale de France, the Rijksmuseum in Amsterdam, the Louvre in Paris) of the 27 member states. As of July 2011 the archive stores over 15 million digital objects: images (pictures, drawings, maps, photos), texts (books, newspapers, diaries, letters), audio files (music, recordings, broadcasts) and video (movies, news, TV shows).
Gallica. It is the digital library of the Bibliothèque nationale de France. Currently (June 2012) it stores over 1.7 million documents, including 350,000 books, 42,000 maps, 26,000 manuscripts, 489,000 images, 846,000 periodicals and over 2,000 sound recordings. Gallica offers traditional search criteria (author/title/date/subject/type), but the user can also search by theme or by featured collection. Since 2008 Gallica has also offered access to electronic publications available on the publishing market. This service is not free.
Bayerische StaatsBibliothek Digital Collections. The Münchener DigitalisierungsZentrum (MDZ) handles the digitization and online publication of the cultural heritage preserved by the Bavarian State Library and by other institutions. It provides over 800,000 documents (June 2012). The digitization policy reflects the traditional special collection fields of the library: History, Classical Antiquity, Eastern Europe, Musicology. It comprises manuscripts, early prints, modern books, maps and photographic collections as well as journals and newspapers. Recently, the library reached an agreement with Google Books for the digitization of 1 million books that are not protected by copyright.
See the website Michael - Multilingual Inventory of Cultural Heritage in Europe for a detailed list of the digitization projects in progress in Italian museums, archives and libraries (3,818 digital collections, 1,812 institutions).
Internet culturale. It is the portal (8.5 million digital files: books, maps, newspapers, letters, videos, photos, sheet music, etc.) of the cultural heritage stored in Italian libraries, maintained by ICCU (Istituto Centrale per il Catalogo Unico). It also gives access to the catalogue of the National Library Service (11 million bibliographic records from the 4,600 libraries that have joined the SBN network). The user can access various digital collections. The full list is available here.
Progetto Manuzio. It is an open access digital library. It was founded in 1993 by the cultural association Liber Liber. It is run entirely by 1,000 volunteers and collects over 1,000 digitized books (essays, classics of Italian and foreign literature, dissertations). Some of them are protected by copyright and are available because their authors and publishers have allowed it. A music archive (Liber Musica) and the audiobooks of the Libro Parlato project are also available.
Go to the official page of the Biblioteca Nacional de España to see the full list of Spanish digital libraries.
Project Runeberg. It is an effort supported by volunteers from around the world and based at Linköping University (Sweden). It aims to provide free access to the classics of Scandinavian literature in electronic format, free of copyright. Currently (June 2012) 1,423 titles are available.
Aozora Bunko. It is a Japanese digital library founded in 1997. Modeled on Project Gutenberg, it aims to make available online and free of charge (in PDF or HTML) works of Japanese and English literature that are not protected by copyright.
Project Ben-Yehuda. It is a project supported by volunteers that aims to make the classics of Jewish literature (not protected by copyright) available online.
Google Books. It is a Google application that lets users search, preview and access (fully or partially) the contents of millions of volumes supplied by individuals, publishers (over 20,000) and public or academic libraries (for example, in Europe: the Bayerische Staatsbibliothek in Munich, the Biblioteca de Catalunya, the Biblioteca Complutense in Madrid, the Bodleian Library in Oxford, and the National Central Libraries of Florence and Rome). There are three types of documents in Google Books: books protected by copyright (you can see only the bibliographic record and some passages of the text containing the search keywords); books no longer protected by copyright (you can read and download the full text); books protected by copyright but out of print (if the publisher or author and Google reach an agreement, they can be digitized and made available for purchase). Google Books also provides links to libraries and bookstores where you can buy or borrow the book you are looking for. The user can also create a personal library in which to store, rate and review favorite texts.
HathiTrust. Created in 2008, it is a digital preservation repository and highly functional access platform. It provides long-term preservation and access services for public domain and in-copyright content from a variety of sources, including Google, the Internet Archive, Microsoft and in-house initiatives of the partner institutions (click here to see the list). Currently (June 2012) it holds over 10 million digitized volumes in various languages.
Internet Archive. It is a non-profit U.S. organization that aims to build a digital library of internet sites and other cultural artifacts in digital form. It provides free access to researchers, historians, scholars and the general public. Founded in 1996 in San Francisco, it includes texts, audio, moving images, and software as well as archived web pages. The texts section collects more than 3.4 million downloadable items (June 2012) from U.S. libraries (for example the Library of Congress, full list here) and Canadian libraries (full list here), from Project Gutenberg, from the Million Books Project (a project to which various Chinese and Indian libraries contribute, e.g. Peking University Library, Zhejiang University Library, Indian Institute of Information Technology Allahabad), and from other collections (for example, 4,000 texts come from the Japanese digital library Aozora Bunko). The purpose of Open Library (a project of the Internet Archive) is to provide a web page for every book ever published in the world. Currently (June 2012) it lists more than 23 million works (over 1 million free e-book titles and over 250,000 modern works available to the U.S. print-disabled community). The bibliographic metadata coming from libraries and publishers have been merged, with links to the full text provided when available. The user can contribute new information or corrections to the catalog and can also browse by subject, by author or by the lists that members have created.
Project Gutenberg. Project Gutenberg is the first and largest single collection of free electronic books, or eBooks. Created by Michael Hart in 1971, it aims to build a library of electronic versions of published books, available for free. The first goal is to digitize the main literary and historical works (the project slogan is “break down the bars of ignorance and illiteracy”). Volunteers digitize the texts, most of which are not protected by copyright. Currently (June 2012) PG offers over 39,000 free eBooks in ePub, Kindle, HTML or plain text format.
Wikisource. It is an online library of free content publications (it stores texts that are not protected by copyright), collected and maintained by the Wikimedia Foundation. The archived texts cover different periods, subjects and types (novels, essays, letters, historical documents, laws, etc.). Currently (June 2012) it stores 782,608 texts.
World Digital Library. It is a multilingual digital library promoted by UNESCO and developed by the Library of Congress. It aims to promote international and intercultural understanding, expand the volume and variety of cultural content on the Internet, provide resources for educators, scholars, and general audiences, and build capacity in partner institutions to narrow the digital divide within and between countries. Over 100 institutions from different countries (see the full list here) are involved in the project. The site hosts a large collection of manuscripts, rare books, images, videos and recordings that bear witness to the world’s different cultures. Currently (June 2012) the WDL holds 5,990 items that the user can browse by different criteria: place, nation, time, topic, type of item and contributing institution. Navigation tools and content descriptions are provided in 7 languages: English, Chinese, Arabic, French, Russian, Spanish and Portuguese (many more languages are represented in the actual texts, which are provided in their original languages).