Transcription


The Government Gazette. Source: https://trove.nla.gov.au/gazette

Overview

The transcription tools used in digital humanities projects are almost as varied as the projects themselves. Most offer more than a simple free text field for transcribers to type into. Some, such as Digivol, are well suited to fielded data. Others give volunteers formatting tools to mark up their transcribed text: Transcribe Bentham uses TEI XML tags, while WikiSource uses a special syntax called Wikitext or Wiki markup.

Before a computer can analyse documents, the text in those sources must first be converted to machine-readable text. For some recent printed documents this conversion can be done via optical character recognition (OCR). For historical manuscripts or handwritten documents, however, OCR does not adequately detect the text on the page. For this reason, older documents and most handwritten manuscripts have to be transcribed manually.

Tools

Text transcription tools used in Digital Humanities projects are extremely varied. Some tools offer a simple free text field for transcribers to type into, while others are more complex and involve multiple fields and steps.

Some text transcription tools, such as Digivol, are well suited to fielded data. Others help volunteers by offering formatting tools to mark up their transcribed text. Transcribe Bentham, for example, uses TEI XML tags, while WikiSource uses a special syntax called Wikitext or Wiki markup.
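As a rough illustration of why marked-up transcriptions are useful, the sketch below reads a short fragment tagged in a TEI-like style and produces the final reading text. The sample sentence, the function name reading_text and the choice of tags (del for deleted text, add for added text, unclear for doubtful readings) are assumptions made for this example; it is not the workflow of Transcribe Bentham or any other project.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcribed line using a few TEI-style tags:
# <del> marks deleted text, <add> marks added text,
# <unclear> marks a doubtful reading.
SAMPLE = (
    "<p>The <del>quick</del> <add>slow</add> fox "
    "<unclear>jumps</unclear> over the dog.</p>"
)


def reading_text(xml_fragment: str) -> str:
    """Return the final reading text: drop deletions, keep everything else."""
    root = ET.fromstring(xml_fragment)

    def collect(element):
        parts = []
        if element.tag != "del":
            # Keep the element's own text and the text of its children.
            parts.append(element.text or "")
            for child in element:
                parts.extend(collect(child))
        # Text after the closing tag belongs to the parent, so it is always kept.
        parts.append(element.tail or "")
        return parts

    # Collapse the double spaces left behind by removed deletions.
    return " ".join("".join(collect(root)).split())


print(reading_text(SAMPLE))
```

Because the deletion is recorded as markup rather than simply left out, the same transcription could also be used to show what the writer originally wrote.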

Some transcription tools support annotation, allowing transcribers to use wiki-style syntax to tag keywords within the text, such as names of people, places or species. Examples include the Public Record Office Victoria (PROV) semantic wiki, WikiSource and FromThePage.
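To give a sense of how such tags can be used once a text has been transcribed, here is a minimal sketch that pulls tagged keywords out of a transcription. It assumes a double-square-bracket convention, [[Subject]] or [[Subject|text as written on the page]], similar to wiki links; the exact syntax differs between tools, and the sample sentence and the function name extract_tags are invented for this example.

```python
import re
from collections import Counter

# Matches [[Subject]] or [[Subject|text as written on the page]].
# This bracket convention is an assumption for the example.
TAG = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_tags(text):
    """Return (subject counts, plain text with the markup removed)."""
    # Count how often each tagged subject appears.
    subjects = Counter(m.group(1).strip() for m in TAG.finditer(text))
    # Replace each tag with the text as written, or the subject if none is given.
    plain = TAG.sub(lambda m: (m.group(2) or m.group(1)).strip(), text)
    return subjects, plain


sample = (
    "[[John Smith|Mr Smith]] sailed from [[Melbourne]] to [[Geelong]]; "
    "[[John Smith]] returned."
)
subjects, plain = extract_tags(sample)
print(subjects)
print(plain)
```

Here "Mr Smith" and "John Smith" are counted as the same subject, which is the kind of link between mentions that makes tagged transcriptions easier to search and index.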

Many organisations around the world are successfully using crowd-sourced tools to transcribe handwritten material online. Some examples of transcription projects using online volunteers include:

The Tinkers team has conducted a useful evaluation of three different text transcription tools for use within the Tinker environment.

How do I get started?

Before beginning a Digital Humanities project of your own, it can be useful to follow a recipe.

In the same way that people learn to cook by using recipes (ingredients, utensils, steps etc.), researchers can learn how to use new digital tools and methods by following a series of steps.

Have a go at one of these Text Transcription recipes:

Suggested readings