Small-Scale Testing on Generative AI and Post-OCR Correction in Historical Datasets

Written by

Florentina Armaselu

Published on

31 May 2024


This article proposes a small-scale investigation into the use of generative AI agents for post-OCR correction in historical datasets. Three chatbots, ChatGPT-4, Google Bard and YouChat, were applied to excerpts from 18th-century French texts. The evaluation combined qualitative and quantitative methods. Character and word error rates (CER, WER) were computed both by the agents themselves and independently with a specialised Python library, using gold-standard excerpts from the ICDAR 2017 competition on post-OCR text correction as the reference.
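Both CER and WER are edit-distance ratios. Each counts the minimum number of character (or word) substitutions, insertions and deletions that separate a corrected text from its gold standard, then divides that count by the length of the gold standard. The following is a minimal plain-Python sketch of that computation. It is not the specialised library used in the study, which is not named here. The function names are illustrative, and the sample strings are hypothetical: they imitate a long s misread as f, a typical OCR confusion in early modern print.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions turning ref into hyp.

    Works on strings (character level) or on lists of words (word level).
    """
    # prev[j] holds the edit distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by the number of reference words.

    Words are whitespace-separated tokens; no punctuation or case normalisation.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical gold standard vs. OCR output (long s read as f)
gold = "la maison est belle"
ocr = "la maifon eft belle"
print(round(cer(gold, ocr), 3))  # 2 edits / 19 characters
print(wer(gold, ocr))            # 2 edits / 4 words
```

Splitting on whitespace alone is a simplifying assumption. Tools differ in how they normalise punctuation, casing and spacing before scoring, so figures reported by different tools, or by the agents themselves, need not agree exactly.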

View this publication in our institutional repository (orbi.lu).