Researching the Collecting, Preserving, Analyzing, and Disclosing of Ukrainian Testimonies of the War (U-CORE) is a trilateral research project conducted by the Centre of Contemporary and Digital History in Luxembourg, the Institute of Philosophy and Sociology of the Polish Academy of Sciences, and the Centre of Urban History in Ukraine. Within U-CORE, the Luxembourg team is leading the development of a digital workflow that others collecting personal memories during conflict could also use. The aim is to ensure that testimonies are digitally preserved for the future while respecting how eyewitnesses want their testimonies to be disclosed digitally.
Human Beings as a Valuable Source of Information
Current conflicts around the world and the spread of manipulated or AI-generated visual information bring interviewing to the forefront as a method of documenting war through the eyes of witnesses. Relying on firsthand experiences is a well-established and proven method in historiographical research. However, the situation changes when people are speaking about an ongoing war. With their stories, interviewees could put themselves at risk, either now or in the future, depending on how a conflict evolves. How should interview data be treated under such circumstances? The U-CORE team is designing a workflow to tackle this challenge.
Securing Collected Information in Three Locations
First, the U-CORE team in Luxembourg proposes dividing the data into three categories: the narrator's personal data, a pseudonymized audio testimony file, and a pseudonymization key. Each type of information can then be stored in a different location, so that a potential attacker would need to access three separate servers to connect the dots. For example, knowing what a person said in a pseudonymized interview won't reveal their identity, just as having access to a narrator's personal data won't disclose what that narrator said.
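The three-way split can be illustrated with a minimal Python sketch. It is not the U-CORE implementation: the `Interview` class, the field names, and the `split_interview` function are hypothetical, and the sketch assumes the audio file itself has already been pseudonymized and carries no identifying file name.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Interview:
    narrator_name: str  # personal data (hypothetical field)
    contact: str        # personal data (hypothetical field)
    audio_file: str     # path to the already pseudonymized recording


def split_interview(interview: Interview) -> tuple[dict, dict, dict]:
    """Split one interview into three records meant for three separate stores."""
    # Random identifiers that carry no information about the narrator.
    person_id = secrets.token_hex(8)
    pseudonym = secrets.token_hex(8)

    # Store 1: who the narrator is, without any link to the testimony.
    personal_record = {
        "person_id": person_id,
        "name": interview.narrator_name,
        "contact": interview.contact,
    }
    # Store 2: what was said, labeled only with a pseudonym.
    testimony_record = {"pseudonym": pseudonym, "audio_file": interview.audio_file}
    # Store 3: the key, the only place where the two identifiers meet.
    key_record = {"person_id": person_id, "pseudonym": pseudonym}
    return personal_record, testimony_record, key_record
```

Each of the three returned records would go to a different server. Only someone holding all three can link a name to a recording.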
Additional Protection Measures
To ensure that the data cannot be leaked before it reaches secure locations, the U-CORE team in Luxembourg proposes that interviews be recorded on an encrypted recorder operated by the interviewer and then transferred to an external encrypted server. In Luxembourg, we then managed processing within the CatDV Asset Management Platform catalog for audiovisual products. This way of working makes it possible to disconnect the audio testimony from the narrator's personal data.
Making Testimonies Available to the Public
The current U-CORE collection, compiled by team members in Luxembourg, Poland and Ukraine, contains over 400 interviews. While the three partners remain the owners of their original data, copies of these data are shared in a central digital environment at the University of Luxembourg. That environment includes a unified data model developed with the partners under the lead of Dr. Inna Ganschow. The audiovisual files in the central digital environment are managed by multimedia technician Alexandre Germain of the FHSE-Media Centre.
The U-CORE team in Luxembourg is currently experimenting with digital tools to facilitate the future disclosure of certain testimonies from the U-CORE central digital environment. One example is the option to connect interview transcriptions with audio files. To simplify navigation within the interview collection, developer Pin Zhu and student assistant Vladyslav Siulhin have worked on offering such a connection through locally run tools or secured web services. These tools make the transcription available for full-text search while allowing users to listen to specific segments through synchronized subtitles.
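To make the link between full-text search and audio playback concrete, here is a minimal Python sketch. It is not the tool built by the team: the names `Cue`, `parse_srt` and `search` are hypothetical, and it assumes a plain SRT subtitle file without position or styling information after the timestamps.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 into seconds."""
    h, m, s, ms = (int(part) for part in TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Read subtitle cues (index line, time line, text lines) from SRT text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


def search(cues: list[Cue], term: str) -> list[Cue]:
    """Return every cue whose text contains the term, ignoring case."""
    term = term.lower()
    return [cue for cue in cues if term in cue.text.lower()]
```

A player could then jump to `cue.start` for each hit, so that a text match leads directly to the corresponding audio segment.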
To that end, interview transcriptions need to be provided with timestamps. The Luxembourg team uses the Automatic Speech Recognition AI tool HappyScribe, which can operate offline after a data-sharing agreement was signed with the company. HappyScribe delivers the transcription in subtitle format. As the manually composed interview transcriptions from the U-CORE partners in Poland and Ukraine arrived in Luxembourg without subtitle formatting, the U-CORE team in Luxembourg developed a process to add timestamps to these transcriptions. The team proposed that the U-CORE partners in Ukraine and Poland use the following procedure to generate timestamps and align them with the interview transcriptions:
1. Modify the Interview Transcript: Reformat the document so that each sentence starts on a new line.
2. Generate a TextGrid File: Use the BAS service to create a '.TextGrid' file from the reformatted text and audio.
3. Extract Timecodes: Process the '.TextGrid' file generated in step 2 to produce a '.txt' file with timecodes for each sentence.
4. Convert to SRT Format: Transform the '.txt' file from step 3 into a '.srt' subtitle file.
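The first and last steps of this procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the scripts used by the team: the function names are hypothetical, the sentence splitter is a naive punctuation-based one, and the '.txt' file from the timecode extraction step is assumed to hold one line per sentence in the form start, end and text, separated by tabs, with times in seconds. The calls to the BAS service and the parsing of the '.TextGrid' file are not shown.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """First step: put each sentence on its own line (naive split)."""
    # Collapse whitespace, then split after ., !, ? or an ellipsis character.
    # Abbreviations such as "Dr." will be split incorrectly.
    sentences = re.split(r"(?<=[.!?…])\s+", " ".join(text.split()))
    return "\n".join(s for s in sentences if s)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def timecodes_to_srt(timecoded_text: str) -> str:
    """Last step: turn 'start<TAB>end<TAB>sentence' lines into SRT cues."""
    cues = []
    for line in timecoded_text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        start, end, sentence = line.split("\t", 2)
        index = len(cues) + 1  # SRT cues are numbered from 1
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(float(start))} --> {srt_timestamp(float(end))}\n"
            f"{sentence.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

For manually composed transcripts, the naive splitter is only a starting point. A human check of the line breaks before alignment is likely to give better timestamps.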
We are also planning for the long-term preservation and disclosure of the testimonies collected within the U-CORE project. To that end, the Luxembourg team of U-CORE has started a pilot project in cooperation with the Oral History Digital platform of the Free University of Berlin and the Bayerisches Archiv für Sprachsignale.
Within the project, the U-CORE team in Luxembourg will continue to experiment with digital tools to support the analysis, preservation and disclosure of testimonies. We will also discuss and reflect upon the operability of these experimental solutions for the three different collections of the U-CORE project.