Handwriting to Visualisation – From the Page

Recipe – Diary to Map.

This recipe goes from transcribing handwritten text to visually representing the data on a map, using the diaries of a 19th-century Victorian prison governor, John Buckley Castieau (1855 – 1884). This work package will expand on the draft recipe below, answering the questions listed under Next Steps so that future users of the systems can follow a similar pathway.

In this example we use From the Page to transcribe the images, then pass the exported files through a Python Jupyter Notebook. The notebook uses the spaCy module to extract named entities and the Python Geocoder module to convert those entities into latitude and longitude, which can then be visualised.

Pre-Reading:

Jupyter Notebooks

Ingredients:

  1. Images of handwritten artifacts.
  2. Transcription Package – From The Page
  3. Tinker workbench + Python Jupyter Notebook
    • Code from: …
  4. Data visualiser

Steps

1. Transcription of images.

  1. Create a From the Page project by emailing the Uni Melbourne SCIP team
  2. Upload images (see From The Page FAQ for more info on how to get started)
  3. Transcribe / get Community to transcribe (See FAQ for more info)
  4. Export XHTML or TEI transcription
  5. Choose appropriate format (XHTML or TEI)
  6. Download file

2. Named Entity Recognition

  1. Login to the Tinker Workbench
  2. Clone the “test git repo” for this recipe
  3. Launch a Python Notebook
  4. Step through the “recipe” for NER.
    1. Install required Python modules
      1. spacy – for NER
      2. geocoder – for translating entities into latitude / longitude
      3. pandas – data analysis tools
      4. numpy – scientific computing tools
      5. matplotlib – graphing / plotting tools
    2. Convert exported file from XHTML to text
    3. Read in text file
    4. Extract place names and context to assist with manual confirmations.
    5. Manually confirm place names
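
The conversion and extraction steps above can be sketched roughly as follows. This is a minimal illustration rather than the code from the recipe's repository: the function names (xhtml_to_text, context_window, extract_places), the 40-character context width and the choice of the en_core_web_sm model are all assumptions. The spaCy model has to be downloaded separately before extract_places will run.

```python
from html.parser import HTMLParser

# Tags that should leave a space behind so adjacent paragraphs do not fuse.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4"}


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an (X)HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def xhtml_to_text(xhtml):
    """Strip markup from an exported XHTML string and tidy the whitespace."""
    parser = _TextExtractor()
    parser.feed(xhtml)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def context_window(text, start, end, width=40):
    """Return the span start..end plus up to `width` characters either side."""
    return text[max(0, start - width):min(len(text), end + width)]


def extract_places(text, model="en_core_web_sm", width=40):
    """Run spaCy NER and keep place-like entities with surrounding context.

    Assumption: GPE, LOC and FAC are the labels of interest for mapping.
    """
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    return [
        {
            "place": ent.text,
            "label": ent.label_,
            "context": context_window(text, ent.start_char, ent.end_char, width),
        }
        for ent in doc.ents
        if ent.label_ in {"GPE", "LOC", "FAC"}
    ]
```

The context string is there to help with the manual confirmation step: a reviewer can see each candidate place name in its surrounding sentence before accepting or rejecting it. Modern language models may mislabel 19th-century spellings and local names, which is one reason the manual check is kept.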

3. Geo-Coding

In this recipe we use the Python Geocoder module to map the extracted named entities to latitude / longitude. (Continue running through the code from step 2.)

  1. Loop through the manually confirmed place names with the geocoder module and assign latitude / longitude
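
A loop of this kind might look like the sketch below. The function names (osm_lookup, geocode_places) and the one-second pause are illustrative assumptions, as is the choice of geocoder's OpenStreetMap provider; the recipe itself does not say which provider is used. The lookup is passed in as a parameter so the loop can be tried out without network access.

```python
import time


def osm_lookup(place):
    """Look up one place name with geocoder's OpenStreetMap provider.

    Returns a (latitude, longitude) tuple, or None if nothing was found.
    """
    import geocoder  # lazy import: only needed for live lookups

    result = geocoder.osm(place)
    return tuple(result.latlng) if result.ok and result.latlng else None


def geocode_places(places, lookup=osm_lookup, pause=1.0):
    """Map each distinct place name to (lat, lng), or None when not found.

    `pause` is the delay in seconds between lookups, to stay polite to
    public geocoding services.
    """
    coords = {}
    for place in places:
        if place in coords:
            continue  # already looked up; diaries repeat place names a lot
        coords[place] = lookup(place)
        time.sleep(pause)
    return coords
```

For example, geocode_places(["Melbourne", "Kew"]) would query the live service once per name. Place names that come back as None, or that resolve to an unexpected part of the world, are worth a second manual look, since many place names in a Victorian diary also exist elsewhere.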

4. Visualisation
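
The recipe does not yet name a data visualiser, so the following is only one possible hand-off, offered as an assumption: writing the geocoded place names out as GeoJSON, a format that many mapping tools can load. The function names (to_geojson, write_geojson) and the input shape, a dictionary of place name to (latitude, longitude) or None, are hypothetical.

```python
import json


def to_geojson(coords):
    """Build a GeoJSON FeatureCollection from {place: (lat, lng) or None}."""
    features = []
    for place, latlng in coords.items():
        if latlng is None:
            continue  # skip place names that could not be geocoded
        lat, lng = latlng
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": {"name": place},
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(coords, path):
    """Write the FeatureCollection to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(to_geojson(coords), handle, indent=2)
```

Note the coordinate order: geocoders usually report latitude first, while GeoJSON expects longitude first, which is a common source of points landing in the wrong hemisphere.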

Next Steps

  1. Where does the data live?
  2. Is it published anywhere?
  3. Repository?
  4. Update / Create a new Recipe?