Archivists Turn to LLMs to Decipher Handwriting at Scale

Overview

Archivists are increasingly using general-purpose AI models to transcribe handwritten documents at scale, a task that once required paleography training, custom software or weeks of work. The author describes feeding bell hooks' journal pages to ChatGPT at a Berea College archive, then profiles researchers who have tested LLMs on archival handwriting. In one study, large language models outperformed the specialized software Transkribus on accuracy, speed and cost, turning previously hidden collections into searchable records.

Key Takeaways

General-purpose AI models are now good enough to transcribe archival handwriting at scale.
The author used ChatGPT to read bell hooks' dense cursive journals at Berea College in Kentucky.
Historian Mark Humphries digitized 10 million pages of World War I pension records in Canada.
In a 2025 study, LLMs outperformed the specialized software Transkribus on accuracy, speed and cost.
Transkribus has announced it is integrating LLMs into its own platform.

Stats & Key Facts

#Humphries digitized 10 million pages of World War I pension records
#The study tested a corpus of 50 English-language letters, legal records and diary entries from the 18th and 19th centuries
#Transkribus had character error rates of around 8 percent on documents it had not been trained on
#Humphries' best LLM approach pushed error below 2 percent
#The LLM approach completed the work 50 times as fast and at roughly 1/50th the cost
#Transkribus is used by more than 150 major universities and archives

The Problem of Reading Handwriting

Machine handwriting recognition is a long-standing challenge.

›Researchers in the 1960s predicted machines would soon read handwritten text.
›The problem instead spawned decades of specialized research and commercial industries.
›Yann LeCun published landmark work on handwritten digit recognition in the 1980s.

LeCun, who later won the Turing Award for his contributions to deep learning, showed what was possible in narrow, controlled settings. Real archives, with their variation across many writers, were another matter.

A Personal Starting Point

The author hit the problem firsthand.

›She sat down with bell hooks' personal journals at an archive at Berea College in Kentucky.
›The handwriting was dense cursive in which the letters looked identical to her eye.
›She photographed the pages and had ChatGPT read them.

The author expected an intimate peek into hooks' private thoughts but got frustration instead. ChatGPT, her tool of choice, worked well, and she notes she was not the first person in an archive to figure this out.

Scaling Up With Mark Humphries

A historian tested LLMs against a large corpus.

›Humphries digitized 10 million pages of World War I pension records in Canada.
›The records had no index and no standardization.
›They were written by hundreds of different clerks, officers and administrators.

Humphries is a professor of history and coordinator of the applied generative AI program at Wilfrid Laurier University in Waterloo, Ontario. The mix of many writers ruled out the standard workaround of training a specialized model to recognize one person's handwriting. When GPT-4 came out in 2023, he started feeding it handwriting and found the results rough but better than any general tool he had tried.

The Study Results

LLMs beat specialized software on key measures.

›The team tested 50 English-language letters, legal records and diary entries from the 18th and 19th centuries.
›LLMs outperformed Transkribus on accuracy, speed and cost.
›Transkribus is used by more than 150 major universities and archives.

The results were published in May 2025 in Historical Methods. On documents it had not been trained on, Transkribus had character error rates of around 8 percent, while Humphries' best LLM-based approach pushed that below 2 percent, completing the work 50 times as fast and at roughly 1/50th the cost. Transkribus has announced it is integrating LLMs directly into its own platform.
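
For readers unfamiliar with the metric, character error rate is commonly defined as the edit distance between a transcription and a trusted reference, divided by the length of the reference. The sketch below illustrates that common definition only; the source does not describe how the study normalized text or counted errors, and the function names and sample strings here are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: a 25-character line with two misread letters.
reference = "Dear Sir, I write to you."
hypothesis = "Dear Sir, I wrote to yon."
print(f"{character_error_rate(reference, hypothesis):.2%}")  # prints 8.00%
```

Under this definition, an 8 percent rate means roughly two wrong characters in a 25-character line, while a rate below 2 percent means fewer than one wrong character in every 50.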

Why General Models Win

Humphries offers a theory.

›AI researcher Richard Sutton argued in 2019 that general methods using computation eventually outperform specialized ones.
›Humphries thinks that is what is happening with handwriting.
›General models absorbed the link between documents and transcriptions from vast training data.

The general models were trained on such a wide range of data that, somewhere in that pile, they absorbed the relationship between handwritten documents and their transcriptions without anyone explicitly teaching them.

What It Means for Archives

The practical effects are already showing.

›Pages that once required paleography training can now produce usable transcriptions in seconds.
›Collections that were preserved but functionally hidden are becoming searchable.
›Scholars and families can ask questions they rarely had the time or money to ask before.

The article cites Lianne Leddy, an associate professor of history and the Canada Research Chair in Indigenous studies, in discussing these practical consequences for archival work.

Frequently Asked Questions

Why are archivists turning to LLMs?

General-purpose AI models are now good enough to transcribe handwritten archival documents at scale, work that once required paleography training, custom software or weeks of effort.

How well did LLMs perform versus Transkribus?

On untrained documents, Transkribus had character error rates of around 8 percent, while the best LLM approach pushed error below 2 percent, ran 50 times as fast and cost roughly 1/50th as much.

Who is Mark Humphries?

He is a professor of history and coordinator of the applied generative AI program at Wilfrid Laurier University who digitized 10 million pages of World War I pension records in Canada.

When was the study published?

The results were published in May 2025 in the journal Historical Methods, based on a corpus of 50 documents from the 18th and 19th centuries.

Is Transkribus adopting LLMs?

Yes. Transkribus has announced it is integrating LLMs directly into its own platform.

General-purpose AI is making long-hidden handwritten archives searchable faster and more cheaply than specialized tools.