Dreaming of digital tools
For my master's thesis, I researched the connections between the Belgian psychiatric association (Société de Médecine Mentale de Belgique) and its counterparts in other countries in the nineteenth century, analysing how they communicated and discussed psychiatric ideas and practices. During this research I wanted to learn how to apply digital techniques that could measure the scope of these international connections, process my sources (in this case psychiatric journals) more quickly, and incorporate a large amount of data to get a better understanding of the transnational contacts that took place in nineteenth-century Europe. But because I lacked both the time and the digital know-how, I was stuck with the hard copies of the Belgian journal.
When the opportunity came along to do a PhD at the University of Luxembourg, in an environment that could offer the digital knowledge and tools I needed, I had the idealised notion that I ‘only’ needed digital copies of all the journals I wanted to study, and that one of the biggest obstacles in my quest to go truly transnational would then be solved. With the growing importance of all things digital, and more specifically the growth of Digital Humanities, the possibilities of the ‘digital age’ seem endless. In a way they certainly are, as many examples show: there have been initiatives to digitise archives and newspapers; to make cultural heritage viewable and searchable online; to create applications and tools that make text searchable; to create interactive maps that let you approach your data in new ways; and to communicate historical knowledge via new platforms such as online open access publishing or even Twitter. But the limits of ‘the digital’ should also be recognised.
Digital =/= accessibility
Going back to my original goal of using digitised journals, a first step was to find out what was and wasn’t available online. On my quest for sources I came across a digital library full of the journals I needed: of my corpus of approximately 430 issues, 385 were available in a digital format. It looked like I had hit the jackpot and an important problem of my PhD was solved. In our digital world we tend to assume that sources that are ‘digitally available’ are also ‘digitally accessible’, but as it turned out, this isn’t always the case.
The digital sources I found could only be accessed by users from a partner institution and, to complicate things further, the content could not be accessed from Europe, although this material is (or should be) in the public domain. Lara Putnam has made a similar point: digital methods have made it easier to research distant places, people, goods and ideas, but she also stresses that ‘(…) the digitised revolution is not inherently egalitarian, open, or cost-free’, leading to a global disparity in access to sources, especially when it comes to sources for international or transnational history.1
This obstacle reduced my 385 digitised issues to about 193, which were accessible via different online platforms. But having a digital copy of a source doesn’t mean you can readily use it. To be able to process the digitised material in different programs, the text in the sources needs to be machine-readable and searchable. And as it turns out, the 193 accessible issues have not been processed with Optical Character Recognition (OCR).
And what about all the journals that aren’t available in a digital format? The physical copies of these journals are located in different libraries in Belgium, the Netherlands, Germany, France, and the United Kingdom. How do you acquire digital access to these? One way would be to collaborate with the different libraries. As many libraries have digitisation projects running, it would be possible to ask them to scan the material, or to send the material to one location to be digitised. But as these are nineteenth- and twentieth-century journals, transporting them might not be possible. Here we arrive at the issue that scanning and OCR’ing 430 issues is a time-consuming and costly job that requires the right equipment. These are all factors to take into account when assessing whether digitising the material is worth the effort.
MediOCRe
A second obstacle is the usability of OCR. It seemed the way to go: it would make my texts searchable and translatable, and I could export the documents to different tools and methods such as word clouds, relational databases or topic modelling, which have been used successfully by other researchers2. But before considering these digital methods you need to take into account that OCR programs aren’t foolproof: they make mistakes when recognising characters, especially when it comes to historical texts. How do you deal with this? Do you read through every page to check it for mistakes? Do you use other digital techniques to correct errors automatically? Or do you take the risk of a 20% to 40% error rate, hoping for the best? Which way you go has an important impact on the reliability of your sources and on the accuracy and trustworthiness of your research results.
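One way to put a number on such an error rate, before deciding how much correction is needed, is to transcribe a small sample of pages by hand and compare it with the OCR output. The sketch below is only an illustration, not a method I used: it assumes the common definition of a character error rate (edit distance divided by the length of the hand-corrected reference), and the function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance between the two texts, relative to the length of the
    hand-corrected reference (assumed definition of the error rate)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: a hand-typed line and an invented OCR reading of it.
    reference = "Bulletin de la Societe de Medecine Mentale"
    ocr_output = "Bullctin de la Socicte dc Medecine Mentalc"
    rate = character_error_rate(reference, ocr_output)
    print(f"Character error rate: {rate:.1%}")
```

A rate measured on a handful of sample pages only says something about those pages, so it is best read as a rough indication of whether manual checking or automatic correction is worth the effort for the whole corpus.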
As a researcher it is important to be aware of the problems that digital techniques can bring. The digital age hasn’t opened up access to material as much as we sometimes like to think, and researchers often need to take several steps before they can truly process and analyse their material. Access and reliability are key factors in historical studies, but the digitisation of sources and the creation of digital tools have not necessarily made them easier to achieve.