Since the introduction of Transkribus, a platform for digitization and transcription of historical documents, several projects working with collections of handwritten personal correspondence from before, during, and after World War II have emerged. In these projects, digitization includes scanning, transcribing, annotating, and curating digital versions of historical ‘egodocuments’. These different transformative processes come, however, with a series of problems, challenges, and questions. In January 2022, Nina Janz (Luxembourg Centre for Contemporary and Digital History, C²DH) and Milan van Lange (NIOD Institute for War, Holocaust, and Genocide Studies) brought together various projects to identify these issues. In this blogpost, Janz and Van Lange reflect on this first step in building a community of interested researchers and archivists around problem-solving approaches to these issues.
With the first online edition of the ‘War Letters & Transkribus’ workshop series, held on the 12th of January 2022, we intended to start building a community around shared issues and challenges. We brought together a group of interested and enthusiastic archivists, historical researchers, and information specialists working on or with digitized 20th-century documents – in particular, personal correspondence written before, during, or just after times of war and conflict ('war letters'). Among those present were representatives of the ongoing DIGIKÄKI project (Tampere University), Warlux (C²DH), the Witnesses and Contemporaries project within the research programme Independence, Decolonisation, Violence and War in Indonesia, 1945-1950 (KITLV, NIMH, and NIOD), and First-Hand Accounts of War (1935-1950) (NIOD). By identifying shared issues, we aspired to stimulate further collaboration and to work towards jointly organizing a follow-up workshop to discuss problem-solving ideas and their potential implications for historical research.
Whether the participants work in Luxembourg, Finland, or the Netherlands, they are all involved in planned or existing projects that digitize historical personal documents using READ-COOP's Transkribus software. In other words, what connects the participants is that they all take part in the transformative processes of turning heterogeneous historical documents into structured and enriched digitized datasets. During the workshop, everyone was asked to present their projects or project plans, collections, and source materials. In addition, the most prominent issues, problems, and challenges in each project were discussed. What did the participants bring to the workshop?
Relying on Transkribus
One of the uniting principles between the different projects is that they all make use of Transkribus. This software platform is becoming increasingly popular, as it offers a variety of possibilities, such as manual tagging and annotating, and facilitates transcription using both crowdsourcing and automated transcription (using Handwritten Text Recognition technology, HTR). The outcomes of HTR not only provide archivists and researchers with full-text searchable archival sources, they also allow for the application of innovative computer-assisted approaches to the study of historical egodocuments. In addition, Transkribus offers opportunities for online presentation. During the workshop, we wanted to investigate how Transkribus is used in the different war letter-related projects. We came to the conclusion that, despite its rather predefined workflows, people use the software in quite different ways and at different stages of their projects.
Extracting meaningful information
Another important issue that came to the surface concerned the so-called metadata (data about the data) of the egodocuments. What kind of information can be derived from these sources? What additional information is available for tagging or annotation? And how can the layers of complexity and versatility of historical material be preserved in a digitized context? As an example, the importance of the ‘notes in the margin’ was highlighted: the additions or second thoughts, often scribbled in the margins by letter writers, recipients, or later readers. How can such information be extracted from digitized documents, and how can it be ‘tagged’ in these documents and included in digital datasets? This was, however, not the only ‘food for thought’ that was brought to the virtual table during the workshop.
Letter-writing in multiple languages
Another interesting ‘family’ of issues that came to the surface is that of multilinguality. War letters from countries with multilingual regions, such as Finland or Luxembourg, in particular often contain several different languages. In addition, particularly in World War II, millions of people were forced into migration or displacement. As a result, sometimes even the language in which a single letter was written varies throughout the text! Correspondence with and about international organizations, such as the League of Nations or the Red Cross, also contributed to the emergence of this issue. The multilinguality of sources and source collections poses many new challenges for generating usable digital transcriptions of scanned war letters. The issue shows how historical context defines technical problems in digitization projects working with historical materials.
‘Silence’ in the letters
During a ‘collaborative writing session’, the participants stressed that these projects have to deal not only with what is actually written in the war letters, but also with what is lacking. How can the digitization of letters help point us to where ‘silences’ might be located? Can ‘silence’ also be grasped from historical letters? It seems straightforward that the contents of the war letters are interesting to archivists and historians, but a significant consideration for the participants was the ‘silence’ in the letters: that which remained unsaid. Letters from the front, for example, rarely include detailed facts about killings and violence; soldiers wrote harmless letters to their mothers (‘all is fine, I am doing well’) so as not to scare loved ones at home. Another essential factor is censorship, not only in the military but also under occupation administrations and authoritarian regimes.
Continuing the conversation
The lively conversations during this workshop showed, in the first place, how rich, versatile, and varied the challenges, issues, and approaches to problem-solving can be in the context of digitizing historical collections of personal documents, such as war letters. Despite the different national and international contexts and the rather diverging collections and research interests, we discovered a strong common ground of shared problems and challenges on which we aim to build a collaborative working group.
NIOD and C²DH are organizing a follow-up workshop on the 31st of May 2022. Please reach out to either institute if you are interested in participating or in staying informed about the ‘War Letters & Transkribus’ community.