Digitising, georeferencing and modeling administrative historical data

Source: luxatlas.lu

Report of the workshop hosted at the C²DH on 19 October 2023.

On 19 October 2023, the research area ‘Contemporary history of Luxembourg’ of the Centre of Contemporary and Digital History hosted the workshop ‘Digitising, georeferencing and modeling administrative historical data’ at the university campus in Belval. Scholars from the Netherlands, Italy, France, Germany, Luxembourg and Belgium were asked to describe the source corpus of their research, show and explain the digital environment in which they work, present the digital workflow they follow and reflect on the potential reusability of their data in further research.  

***

Sonia Schifano from Bocconi University introduced the audience to her ongoing research comparing urban with rural land distribution in pre-industrial Luxembourg using data from the Maria Theresa cadastre of 1766. The data were collected after a major land reform that aimed to create a more egalitarian society by requiring the nobility to pay tax on land. Comparing the declarations on the distribution of net land revenues filed for the municipalities of Luxembourg and Dudelange revealed that the latter was poorer and less egalitarian. Dr. Schifano looks forward to collecting the complete series of microfilms of the 1766 Luxembourgish Maria Theresa cadastre, as well as to comparing land inequality among different countries for a given year using the same source.

Rombert Stapel and Thomas Vermaut from the International Institute of Social History in Amsterdam presented the HisGIS 1832 Project (www.hisgis.nl). They explained to the audience how they digitised, vectorised and modelled the Napoleonic cadastral maps and tables for the Netherlands, and how they guarantee that the digitised data are accessible to the wider public as well as to researchers. Alongside quantitative measurements of the value of pieces of land, this source also includes qualitative information about maintenance costs and the attractiveness of a municipality based on its access to water. The researchers cooperate with Allmaps.org on the development of an open-standard, IIIF-based georeferencer and encourage citizen scientists to hand-draw vectorisations using OpenStreetMap. The data model includes documentation of the valuation, which the computer checks for coherence; thanks to this, the citizen science component of the project has become an almost entirely self-supporting structure. The goal of the project is to create contiguous coverage for the Netherlands through the creation of 17,000 maps. Some maps have already been used by municipalities, which must provide information about how likely it is that owners will make archaeological finds on the pieces of land they purchase.

Tiago Ferreira and Antoine Paccoud of the Luxembourg Institute of Socio-Economic Research (LISER) have gathered ample expertise in the analysis of 19th and 20th century cadastres detailing land ownership in the Luxembourgish municipality of Dudelange. They are now cross-referencing their data with genealogical sources, census data and information shedding light on past population management in order to research land wealth and its transmission. Their analysis indicates a certain democratisation of property ownership. Whereas in 1872 one third of owners had bought their houses and two thirds had inherited their dwellings, on the eve of the Second World War 80 percent of owners had acquired their property. However, the dominance of town owners and their families is consistent over a span of almost two centuries, with 60 percent of the land today still being owned by 10 families. The researchers plot cadastral maps from the early 19th century on contemporary maps to unravel the evolution of property ownership and the increase in urbanisation. Questions they aim to answer in the near future include how many households own their houses and what the path to house ownership looks like for incoming migrants.

Steve Kass and Martin Uhrmacher of the Historical Institute of the University of Luxembourg provided the audience with an update on their work on the Luxembourg historical atlas, inspired by the European Town Atlas initiative (https://www.luxatlas.lu/). They zoomed in on the challenges they encounter while preparing digital maps, in particular when georeferencing buildings and standardising HTA geodata. The Luxatlas also includes story maps and digital walking tours through Luxembourg city (www.mapping-luxembourg.lu). A next development could be to expand the historical atlas with 3D images of the façade plans of the city fortifications from the early 19th century.

After the lunch break, Marijke Van Faassen and Rik Hoekstra of the Huygens Institute in Amsterdam continued with a joint presentation titled ‘Places in registrations: making sense of variation’, in which they reflected on their research projects MIGRANT and REPUBLIC respectively. The first project concerns more than 50 000 digitised registration cards of Dutch migrants to Australia in the aftermath of the Second World War, which are today scattered across physical archives in the Netherlands and Australia. To link and give meaning to the data on these cards, the scholars performed various digital experiments, such as edge detection, Card Model, Scratching and various visualisations. Whereas they could extract, for example, the information density per province of provenance and reveal that more Dutch emigrants came from North and South Holland, they remain aware of the importance of interpreting their findings against the backdrop of changing administrative practices in reporting migration.

Rik Hoekstra introduced the audience to the opening up of the digitised daily meetings of the States General of the Dutch Republic from the 16th to the 18th century by means of handwritten text recognition, optical character recognition and named entity recognition with a large language model (Flair/GysBERT). After describing the steps and procedures of the workflow, he gave detailed instructions on how hand-tagging and computer-tagging need to go hand in hand to achieve the best possible result. He also offered an insight into the locations that occur most often in the digital data and how their frequency changed over the years, and formulated suggestions on how the variation in named locations can be standardised to optimise searches across the digital data.
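To make the idea of standardising variation in named locations more concrete, the following is a minimal sketch in Python. It is not the REPUBLIC project's method: it uses the standard library's `difflib` for fuzzy string matching rather than a language model, and the gazetteer, the function names and the similarity cutoff are all illustrative assumptions.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical mini-gazetteer: normalised key -> preferred modern spelling.
GAZETTEER = {
    "amsterdam": "Amsterdam",
    "rotterdam": "Rotterdam",
    "utrecht": "Utrecht",
}


def normalise(name: str) -> str:
    """Strip diacritics, surrounding whitespace and case differences."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.strip().casefold()


def standardise(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a spelling variant to its gazetteer entry, or None if no entry is close enough.

    The cutoff is an assumed value; in practice it would need tuning
    against hand-tagged examples.
    """
    key = normalise(name)
    if key in GAZETTEER:
        return GAZETTEER[key]
    matches = difflib.get_close_matches(key, list(GAZETTEER), n=1, cutoff=cutoff)
    return GAZETTEER[matches[0]] if matches else None
```

Under these assumptions, a variant such as "Amsterdamme" would resolve to "Amsterdam", while a name absent from the gazetteer returns `None` and can be passed on for manual tagging, in line with the interplay of hand-tagging and computer-tagging described above.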

Estelle Bunout and Machteld Venken from the Centre of Contemporary and Digital History at the University of Luxembourg reported on the ongoing development of a digital pipeline for researching migration to and in Luxembourg. They also evaluated the tools they are using: Nodegoat and Transkribus. Estelle Bunout presented the digitisation pipeline designed to transcribe, extract and connect data from administrative archives of migration history, developed on the basis of the arrival declarations stored in the city of Dudelange. These files contain information on the civil status of migrants and on their migration paths during the ten years preceding their arrival in Dudelange. The archive has been digitised in the context of a partnership with the city of Dudelange, a crucial centre of the industrial history of Luxembourg, and is being used in the teaching of migration history. Using the field detection and OCR/HTR tools provided by Transkribus (https://transkribus.ai), one can reproduce the structure of administrative forms when transcribing, which is key to scaling up the analysis of such serial archival collections.

Machteld Venken reflected on an exercise she undertook with David Jacquet to unravel remigration, a practice that was not documented as such in the past, within a collection of arrival declaration forms of migrants to the Luxembourgish municipality of Dudelange. Through playful tinkering, the team could distil a definition of remigration that matches the mobility practices of remigrating foreigners arriving in Dudelange. These individuals had arrived in Luxembourg and the Minett as foreigners, declared residence there, left, and then returned after at least 14 days, at least once. Using this definition enabled them to identify the two periods generating most outmigration, namely the First World War and the years 1923 and 1924, and to compare similarities and differences in remigration patterns. During the exercise, the team acquired new empirical knowledge on how to read, evaluate and adapt an already existing Nodegoat database (https://www.frontiersin.org/articles/10.3389/fhumd.2022.931758/full), as well as how to let the data speak in tandem with other digital tools, such as MySQL and Excel. This knowledge will be fruitful when adapting the data model both to include more data from different sources in the database and to answer new research questions.
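The definition of remigration given above can be sketched as a simple rule over a person's recorded stays. The Python below is only an illustration under stated assumptions: the `Stay` record, its field names and the `is_remigrant` function are hypothetical and do not reflect the team's actual Nodegoat data model, and the input is assumed to be already restricted to foreigners and to declared stays in Luxembourg and the Minett.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Minimum absence taken from the definition in the text.
MIN_ABSENCE = timedelta(days=14)


@dataclass(frozen=True)
class Stay:
    """One declared period of residence (hypothetical record layout)."""
    person_id: str
    arrival: date
    departure: Optional[date]  # None when no departure is recorded


def is_remigrant(stays: List[Stay]) -> bool:
    """Return True if the person left and came back after at least 14 days, at least once."""
    ordered = sorted(stays, key=lambda s: s.arrival)
    for previous, current in zip(ordered, ordered[1:]):
        if previous.departure is None:
            continue  # no recorded departure, so the gap cannot be measured
        if current.arrival - previous.departure >= MIN_ABSENCE:
            return True
    return False
```

A person with a single stay, or whose absences are all shorter than 14 days, is not counted as a remigrant under this rule. Records with missing departure dates are skipped rather than guessed, which is one of several possible design choices and would need to be checked against the sources.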

Later, Lorella Viola from the Centre of Contemporary and Digital History at the University of Luxembourg presented DeXTER, a reusable, interoperable workflow for accessing narratives of Italian transatlantic migration, 1898-1936. She shared her experience of creating a collection of digitised newspapers from scratch and described the main enrichments brought to it, namely geographic entity recognition, geo-coding, sentiment analysis, topic modelling and network analysis. All the pre-processing steps have consequences for the end results, and she insisted on the importance of documenting these necessary content reductions as part of the “critical engagement” with digitised historical material.

***

The workshop showcased innovative approaches to digitising and analysing historical data, demonstrating the potential of digital tools to enhance our understanding of the past. Beyond the idiosyncrasies of their source materials, all these projects share common challenges. When researchers face sizeable collections, automated transcription, tagging and annotation can open up new ways of interacting with historical source material, but they need to develop special strategies to mitigate ambiguities and noisy results. The emphasis on collaborative efforts, the involvement of citizen scientists and the critical engagement with digital processes underscore the evolving nature of historical research in the digital age, especially with the development of large language models fine-tuned to these specific issues.