DigiVol


The Atlas of Living Australia, in collaboration with the Australian Museum, developed DigiVol to harness the power of online volunteers to digitise biodiversity data that is locked up in biodiversity collections, field notebooks and survey sheets.

Getting Started

  1. Create a DigiVol project by emailing the ALA team.
  2. Ask the support team for the user guide information for administrators.
  3. Transcribe the material yourself, or invite the community to transcribe it.
  4. Export the data by going to the Admin page for the expedition and selecting an appropriate export format.

DigiVol as a Transcription Tool

Organisations that work with natural history collections around the world, including iDigBio (https://www.idigbio.org/), Notes from Nature (http://www.notesfromnature.org/#/) and the Atlas of Living Australia (https://digivol.ala.org.au/), have developed online portals to facilitate the transcription of structured data from specimen labels into pre-determined fields. This kind of structured online volunteer transcription has also been explored through Zooniverse, to help astronomers, historians and others who have large collections of digital images that they want to turn into data. As well as specimen labels, this approach has been used to transcribe field notes, ancient manuscripts, historically significant records and other useful material.

The Atlas of Living Australia (ALA), in conjunction with the Australian Museum, launched the DigiVol volunteer transcription portal in 2011. It was initially built specifically for transcribing specimen labels and is used for this purpose by the Smithsonian, the US National Herbarium, the South African National Biodiversity Institute, the Vermont Center for Ecostudies, the Australian National Insect Collection and the University of Melbourne Herbarium. Digitising and organising data about specimens and sightings, including date and location of collection, identification and other details, greatly adds to their scientific value and has the potential to make biological research more efficient and effective. DigiVol templates are designed to collect this data.

DigiVol for Unstructured Text Transcription

Although initially developed for specimen labels only, DigiVol is a flexible platform that can be used to facilitate transcription of many types of material, including collection registers, survey sheets and field diaries. The platform provides flexible transcription templates that the project designer can adapt to their purpose.

DigiVol Structure

The DigiVol platform requires the project administrator to create a transcription template, load images or scans of the material to be transcribed, write any guidance or instructions that they think will be useful, and frame the project with an appealing image and description to attract volunteers to the task.

The transcription template is very flexible, but as it has its roots in the transcription of biological specimen labels, fields are largely organised around the Darwin Core metadata schema (the accepted standard for sharing biological collections data). Additional fields are available for transcribers to share comments with the project manager and the like. However, the Darwin Core metadata structure is only visible at the ‘back end’, and the project manager can label those fields in any way they like. This means that you can create a structured data template for almost any kind of data, and map your preferred field labels to the Darwin Core back end. In designing the project you can add or remove most fields, making the template as complex or as simple as you like. In the case of structured data, if a field has the same value in all records (e.g. institution name if all records are from one institution), you can pre-fill that field at the time of loading the expedition. You can also create pick lists if they are relevant to your project. These two functions save time in transcription and validation, and make the transcription more engaging for volunteers, as they aren’t unnecessarily transcribing exactly the same thing into every task.
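The relationship between display labels, Darwin Core terms, pre-filled values and pick lists can be pictured with a short Python sketch. This is only an illustration and not DigiVol's actual template format: the dictionaries, the function name `to_darwin_core` and the sample values are all assumptions. The Darwin Core terms used (institutionCode, catalogNumber, scientificName, eventDate, stateProvince) are real terms from the standard.

```python
# Hypothetical template description; DigiVol's real template format may differ.
# Keys are the labels a volunteer sees; values are Darwin Core terms.
LABEL_TO_DWC = {
    "Museum": "institutionCode",
    "Register number": "catalogNumber",
    "Species name": "scientificName",
    "Date collected": "eventDate",
    "State": "stateProvince",
}

# Values that are identical for every record, filled in once at load time.
PREFILLED = {"Museum": "Example Museum"}

# Pick lists restrict a field to a fixed set of choices.
PICK_LISTS = {"State": ["NSW", "Qld", "SA", "Tas", "Vic", "WA", "ACT", "NT"]}


def to_darwin_core(transcribed):
    """Merge pre-filled and transcribed values, then rename labels to Darwin Core terms."""
    record = {**PREFILLED, **transcribed}
    for label, value in record.items():
        if label not in LABEL_TO_DWC:
            raise KeyError(f"no Darwin Core mapping for field {label!r}")
        choices = PICK_LISTS.get(label)
        if choices is not None and value not in choices:
            raise ValueError(f"{value!r} is not an allowed choice for {label!r}")
    return {LABEL_TO_DWC[label]: value for label, value in record.items()}


row = to_darwin_core(
    {"Register number": "K.12345", "Species name": "Example species", "State": "NSW"}
)
print(row)
```

The volunteer only ever sees "Museum" or "State"; the renaming to Darwin Core terms happens behind the scenes, and the pre-filled institution never has to be typed.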

DigiVol also makes templates available for unstructured text transcription, such as field notes or journals. These templates consist largely of a simple text box and a comments box for comments from the transcriber to the project manager. After transcription, the data can be exported as a comma-separated values (.csv) file. This works well for structured data in fields, but cannot collect or retain any formatting for text data (unless you can train your transcribers to use Markdown markup).
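As a rough illustration of working with such an export, the sketch below reads a small CSV sample with Python's standard csv module. The column names (taskId, transcription, transcriberNotes) and the sample rows are invented for the example; a real DigiVol export will have its own columns. It also shows why Markdown is a workable convention: the emphasis markers survive the export simply because they are ordinary characters.

```python
import csv
import io

# Hypothetical export: the column names and contents are made up for illustration.
EXPORT = '''taskId,transcription,transcriberNotes
101,"12 March. Collected near the **creek**, rain in the afternoon.",page torn at corner
102,"13 March. No collecting today.",
'''


def read_export(text):
    """Return the exported rows as a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_export(EXPORT)
for r in rows:
    print(r["taskId"], repr(r["transcription"]))
```

For a file on disk, the same `csv.DictReader` call can be applied to a file opened with `open(path, newline="", encoding="utf-8")`.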

Establishing Transcription Standards

There is no single correct method of transcribing; how a document is transcribed depends on the intended audience and purpose of the transcription. For instance, the aim of the New York Public Library’s ‘What’s on the menu?’ (http://menus.nypl.org/) is to create a “database of dishes”. Volunteers are instructed to transcribe only things you can eat, drink or smoke, and to ignore all other text.

For online transcription projects, where face-to-face training is not an option, instructions usually take the form of written guidelines or tutorials. As transcription projects vary enormously in their content, format and style, a specific transcription tutorial is usually required for every project in DigiVol. These tutorials must be provided by the project administrator when creating a new project (called an expedition in DigiVol terminology). The project administrator can also create ‘help’ buttons as they design their template.

Building an Online Community

DigiVol has a very strong community; there are currently 3179 active volunteers. When content is first put up on a Friday, 30% of it is transcribed by Monday. Volunteers can contact each other directly or start online forums about particular topics. Administrators receive notifications of forum posts and are able to view the transcribed pages online.

Some volunteers specialise in certain kinds of transcription, based on their interests. For example, a marine life enthusiast might spend their volunteer time transcribing only specimen records of marine invertebrates or other undersea life. They benefit by learning more about their area of interest and getting the satisfaction of contributing to their preferred field of scientific research. Your transcription expedition will be completed more quickly and more effectively if you can promote it to a team of volunteers who are interested in the project. This might be an enthusiast group, fellow researchers and students from the relevant discipline, or your friends, family or other supporters. Promoting your expedition on social media can be helpful on this front.

DigiVol volunteers must create an account and enter an email address when they sign up. This might deter some potential volunteers from contributing, but for others the communication between volunteers and project administrators and the thriving online community are attractions in themselves.

Validation

DigiVol gives you the option to validate your data. Most expeditions on DigiVol use this two-step process, in which each transcription is checked by a validator who ensures that the data has been correctly copied into the template fields. In many cases the validator is an experienced volunteer transcriber or, in the case of institutions, an in-house volunteer who reviews the transcriptions. Validation is done one transcription at a time.

Depending on the nature of your data or the required degree of accuracy, you might decide to forgo the validation process. If your structured data is very consistent, and anomalies would stand out once all the digital data is available, you may choose to review your data using a data cleaning tool such as OpenRefine. For example, if the only information being transcribed were dates within a certain range and states of Australia, you could export all the data upon completion and use a data cleaning tool to find outliers.
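The dates-and-states example can also be checked with a few lines of Python instead of a dedicated data cleaning tool. The sketch below is a minimal illustration under stated assumptions: the expected date range, the column names (eventDate, stateProvince), the ISO date format and the sample rows are all hypothetical, and the list covers the six states only, so territories would need to be added if they occur in your data.

```python
from datetime import date

# Assumed expectations for this hypothetical expedition.
EARLIEST, LATEST = date(1900, 1, 1), date(1950, 12, 31)
STATES = {"New South Wales", "Victoria", "Queensland", "South Australia",
          "Western Australia", "Tasmania"}


def find_outliers(rows):
    """Return (row number, problem) pairs for values outside the expected sets."""
    problems = []
    for number, row in enumerate(rows, start=1):
        try:
            collected = date.fromisoformat(row["eventDate"])
        except ValueError:
            problems.append((number, f"unreadable date {row['eventDate']!r}"))
        else:
            if not EARLIEST <= collected <= LATEST:
                problems.append((number, f"date {collected} outside expected range"))
        if row["stateProvince"].strip() not in STATES:
            problems.append((number, f"unexpected state {row['stateProvince']!r}"))
    return problems


rows = [
    {"eventDate": "1923-04-17", "stateProvince": "Victoria"},
    {"eventDate": "2923-04-17", "stateProvince": "Victoria"},   # typo in the year
    {"eventDate": "1931-11-02", "stateProvince": "Victora"},    # misspelt state
    {"eventDate": "17/04/1923", "stateProvince": "Tasmania"},   # not an ISO date
]
for number, problem in find_outliers(rows):
    print(f"row {number}: {problem}")
```

Only the flagged rows then need to be compared against the original images, rather than validating every task one by one.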

Alternatively, you might find that a mostly accurate transcription is sufficient in the first instance. This might be the case for long personal journals and the like. The State Library of NSW’s Amplify program automatically transcribes audio recordings. They find that a first transcription is 80% or more accurate, which is sufficient for discoverability, their goal in transcribing. This is where it helps to be clear about your transcription objectives.

Recognising and Rewarding Volunteers

The DigiVol administration, based at the Australian Museum, has found it helpful to recognise and reward the hard work of its volunteers. This encourages longer involvement and a greater number of transcriptions from each volunteer. The platform includes some built-in methods of recognition, including site-wide and individual project leaderboards for the volunteers who have transcribed the most tasks. In some cases project administrators have publicly acknowledged their volunteers, or used group email to share reports on how their effort is contributing to research. Project administrators are able to email contributing volunteers individually or as a group at any stage during and after the project.