Getting started with Text (and Voyant)


(Note: this recipe is a draft being finalised)

So you’ve got a slab of text and you want to play around with it.  This recipe will help you get started using Voyant.  You could also start by playing with any text from Project Gutenberg or the HathiTrust Digital Library.  It’s a good idea to work with something that you are familiar with, say Charles Dickens, so that you can better understand what Voyant is showing you.

Preparation

  • Adopt an attitude of playful exploration – don’t be scared by the technical components of the set of tools
  • Franco Moretti’s book “Distant Reading” is worth reading, along with the critiques, e.g. by Katherine Bode from ANU
  • Read Stéfan Sinclair’s Voyant README on GitHub

Collecting data

  • Work in datasets – if you have a mass of literary data, compile it together
  • Cleaning can be important – (advice pending, tools can sometimes help)
  • Use flexible formats – ones that can be converted easily into other formats.
  • Establish good record keeping around your data – a way to keep track of what you’re doing, including any changes from cleaning, format changes etc. You could look to GitHub for record keeping, e.g. http://swcarpentry.github.io/git-novice/
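
As one illustration of the cleaning step above, here is a minimal Python sketch that trims the licence text that Project Gutenberg plain-text files wrap around the book itself.  It assumes the file contains the usual “*** START OF …” and “*** END OF …” marker lines; the wording of those lines varies between files, so treat the patterns as a starting point, and note the function name is just a label chosen for this example.

```python
import re

# Marker lines that Project Gutenberg plain-text files place around the body.
# Assumption: the wording is "THE" or "THIS PROJECT GUTENBERG EBOOK"; adjust if
# your file differs.
START_RE = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, that end of the text is left untouched.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()
```

Whatever cleaning you do, record it (what was removed and why) alongside the cleaned file so the step can be repeated later.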

Getting data into the tool

https://voyant.tinker.edu.au/

For small data or single texts copy and paste straight into the text box.  For larger datasets use the Upload function.

You can segment the text into different files in order to represent the data in different ways or to compare different documents.  Different documents might represent different time periods, for example.
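
If you want to segment a single text yourself before uploading, a short script can do it.  The Python sketch below is one possible approach, not part of Voyant: it assumes that each chapter starts with a line such as “CHAPTER I” or “Chapter 12”, and the function names and the output file naming are choices made for this example.

```python
import re
from pathlib import Path

# Assumption: a chapter begins with a line like "CHAPTER I" or "Chapter 12".
CHAPTER_RE = re.compile(r"^chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)


def split_into_chapters(text: str) -> list[str]:
    """Split text at chapter headings, keeping each heading with its chapter.

    Anything before the first heading (title page, preface) is dropped.
    If no heading is found, the whole text is returned as one segment.
    """
    starts = [m.start() for m in CHAPTER_RE.finditer(text)]
    if not starts:
        return [text.strip()]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def write_segments(segments: list[str], out_dir: str, stem: str = "chapter") -> list[Path]:
    """Write each segment to a numbered .txt file and return the paths."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, segment in enumerate(segments, start=1):
        path = folder / f"{stem}_{i:03d}.txt"  # zero-padded so files sort in order
        path.write_text(segment, encoding="utf-8")
        paths.append(path)
    return paths
```

The zero-padded file names keep the segments in reading order when sorted by name, which helps when you compare them as separate documents.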

Check for any error messages: some large texts can cause problems, and you may end up working on only part of the corpus.  Check that the results and size look right using the Summary tool (bottom left) and the Reader (top centre).

What to do first?

Voyant presents a dashboard that offers a window on a number of tools at once, with the text (usually) right at the centre.  You can be working in one tool while the other tools remain visible.  You can also bring different tools together into the same dashboard, and this can lead to more sophisticated techniques.

Inspect the Cirrus word cloud generated on the top left and look for prominent words.  It gives fast results and can illuminate something unexpected, which might point the direction for further investigation.  It takes very little time to explore and is a good example of a simple Voyant tool.  However, be aware of the criticisms of using word clouds (reference pending).

Each tool shows its own buttons when you move the mouse over the top right of the tool window, including Help, Options, Choose another tool, and Export.  The dashboard has its own buttons at the right of the top panel, which allow you to add in other tools.  Try learning about Stop Words using the Help and Options buttons on the Cirrus word cloud tool.
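
To get a feel for what Stop Words do, it can help to count words yourself.  The Python sketch below is only an illustration of the idea and is not how Voyant is implemented: the stop-word list is a tiny made-up sample, and the tokenising rule (lower-case ASCII letters and apostrophes) is an assumption that suits plain English text.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption for this example);
# real stop-word lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "is"}


def word_frequencies(text: str, stop_words: set[str] = STOP_WORDS) -> Counter:
    """Count words in text, ignoring case and skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


counts = word_frequencies("It was the best of times, it was the worst of times.")
print(counts.most_common(3))
```

Without the stop-word filter, “it”, “was”, “the” and “of” would tie with “times” at the top of the list, which is why word clouds usually hide them.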

Next investigate the “Trends” tool (top right), the “Summary” tool (bottom left), and the “Contexts” tool (bottom right).  Try clicking on the most frequent words in the Summary tool, then click on a word in the Reader, and then on one in the word cloud.  Watch how the other tools update to follow the word you selected.
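
The Contexts tool shows each occurrence of a word with the text around it, often called a keyword-in-context view.  The Python sketch below shows the general idea under simple assumptions (words are runs of letters or digits, and matching ignores case); it is not Voyant’s own code, and the function name and the `width` parameter are invented for this example.

```python
import re


def keyword_in_context(text: str, keyword: str, width: int = 5) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence.

    width is the number of words kept on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


sample = "It was the best of times, it was the worst of times."
for left, word, right in keyword_in_context(sample, "times", width=2):
    print(f"{left:>10} | {word} | {right}")
```

Reading the left and right columns side by side is a quick way to see which words tend to surround the term you are interested in.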

Exploring further

First familiarise yourself with the basics and add other tools into the dashboard.  Tool help and the Voyant Documentation provide information on all the available tools.  Go to the literature and do a scholarly database search for “Voyant” to find articles on research that used the Voyant tools.  The developers have also contributed chapters to some books.

Saving results

Tools can allow you to Export views (interactive links), visualisations and data, so explore this feature.  The “Documents” tool (bottom left) allows you to Download the data, but it’s better to establish good record keeping around your data, for example using GitHub to keep track of the corpus.

You can also capture and save your findings by taking screenshots and journaling notes on what you did.  Outline the tools you used, the decisions you made and your overall approach, so that your research method is repeatable.

Publishing

When you use a digital tool you need to reference it, so make sure you reference Voyant, and talk about it when you describe your method.

      Sinclair, Stéfan and Geoffrey Rockwell, 2016. Voyant Tools. Web. http://voyant-tools.org/

Your corpus might also be stored on GitHub or an equivalent, or you could work with your university library to investigate how to make the data available.

Later

It’s crucial to know that Voyant is a starting point: there are more advanced tools and methods beyond it, and you may eventually need to dedicate time to learning them.  There are some good communities, literature and primers that you should look out for.