Digital history & historiography

Small-Scale Testing on Generative AI and Post-OCR Correction in Historical Datasets

written by

Florentina Armaselu

published on

31 May 2024

This article proposes a small-scale investigation into the use of generative AI agents for post-OCR correction in historical datasets. Three chatbots (ChatGPT-4, Google Bard and YouChat) were tested on excerpts from 18th-century French texts. The evaluation combined qualitative and quantitative methods. Character and word error rates (CER, WER) were computed both by the agents themselves and independently with a specialised Python library, using gold standard excerpts from the ICDAR 2017 competition on post-OCR text correction.
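
For readers unfamiliar with the two metrics, the following is a minimal sketch of how CER and WER are commonly defined: the Levenshtein edit distance between a gold standard text and an OCR (or corrected) text, divided by the length of the gold standard in characters or in words. It is not the library or the procedure used in the article. The function names and the sample strings are hypothetical and serve only as illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length in characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / reference length in words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: a typical OCR confusion of "m" with "rn".
gold = "la maison du roi"
ocr = "la rnaison du roi"
print(cer(gold, ocr))  # 0.125 (2 character edits over 16 reference characters)
print(wer(gold, ocr))  # 0.25  (1 wrong word out of 4)
```

Under these definitions both rates can exceed 1.0 when the compared text is much longer than the gold standard, which is worth keeping in mind when reading reported scores.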

Show this publication on our institutional repository (orbi.lu).

Author(s)

Florentina Armaselu

Florentina is a Research Scientist
