Digital history & historiography

Small-Scale Testing on Generative AI and Post-OCR Correction in Historical Datasets

This article proposes a small-scale investigation into the use of generative AI agents for post-OCR correction in historical datasets. Three chatbots (ChatGPT-4, Google Bard and YouChat) were tested on excerpts from 18th-century French texts. The evaluation combined qualitative and quantitative methods. Character and word error rates (CER and WER) were computed both by the agents themselves and independently with a specialised Python library, using gold-standard excerpts from the ICDAR 2017 competition on post-OCR text correction as the reference.
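For readers unfamiliar with the two metrics, the sketch below shows how CER and WER are commonly defined: the Levenshtein (edit) distance between a gold-standard reference and a corrected hypothesis, divided by the reference length. That length is counted in characters for CER and in whitespace-separated words for WER, so CER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the reference length. This is a minimal, self-contained illustration. It is not the Python library used in the study, which the abstract does not name. The function names and the sample strings are hypothetical, and real evaluations (including the ICDAR 2017 setup) may apply their own normalisation rules.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp.

    Works on any sequence: a string (character level) or a list of words (word level).
    """
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hyp prefix (i deletions)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example: gold-standard line vs. an uncorrected OCR line.
    gold = "la fortune du roi"
    ocr = "la fortnne du roy"
    print(f"CER = {cer(gold, ocr):.4f}")  # 2 character errors / 17 characters
    print(f"WER = {wer(gold, ocr):.4f}")  # 2 word errors / 4 words
```

Because insertions count as errors, both rates can exceed 1 when a correction adds a lot of spurious text. A lower score after correction than before it indicates that the agent moved the text closer to the gold standard.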

Show this publication on our institutional repository (orbi.lu).