API via OAI-PMH

This recipe steps through some Python code that creates an OAI-PMH harvester, queries an OAI-PMH API endpoint and manipulates the results. The example takes around 10 minutes to complete.

The Prosecution Project API is an open API powered by OAI-PMH.

There are 2 metadata schemas provided by this API:

  • Dublin Core metadata schema (oai_dc)
  • Local custom metadata schema (pp)

Below are example requests that use verbs defined by the OAI-PMH protocol:

  • List all records in OAI_DC format (paginated):

verb=ListRecords&metadataPrefix=oai_dc

  • List all QLDSC records in OAI_DC format (paginated):

verb=ListRecords&metadataPrefix=oai_dc&set=QLDSC

  • List all OAI sets:

verb=ListSets

  • Request specific record in OAI_DC format:

verb=GetRecord&identifier=oai:prosecutionproject.griffith.edu.au:QLDSC/3615&metadataPrefix=oai_dc
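The query strings above are appended to the API endpoint after a "?". As a minimal sketch, assuming the endpoint given in the Techniques section of this recipe, the requests can be assembled with the Python standard library. The helper names build_request_url and next_page_url are hypothetical and are not part of the API. The second helper illustrates how "paginated" responses work in OAI-PMH: an incomplete list carries a resumptionToken element, and the next page is requested with only the verb and that token.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint taken from the Techniques section of this recipe.
BASE_URL = "https://oai.prosecutionproject-test.griffith.edu.au/oai"

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request_url(verb, **params):
    """Return a full request URL for an OAI-PMH verb and its arguments."""
    query = {"verb": verb, **params}
    # urlencode percent-encodes characters such as ':' and '/' in identifiers.
    return BASE_URL + "?" + urlencode(query)


def next_page_url(verb, response_xml):
    """Return the URL of the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    # A follow-up request carries only the verb and the resumption token.
    return build_request_url(verb, resumptionToken=token.text.strip())


# The four example requests listed above.
print(build_request_url("ListRecords", metadataPrefix="oai_dc"))
print(build_request_url("ListRecords", metadataPrefix="oai_dc", set="QLDSC"))
print(build_request_url("ListSets"))
print(build_request_url(
    "GetRecord",
    identifier="oai:prosecutionproject.griffith.edu.au:QLDSC/3615",
    metadataPrefix="oai_dc",
))
```

The printed GetRecord URL shows the identifier in percent-encoded form, which is how the OAI-PMH specification expects special characters in request arguments to be sent. A client library such as pyoai normally follows resumption tokens for you, so next_page_url is only needed when calling the endpoint by hand.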

See the Tinker Workbench page on reference datasets for a full description of the API. The recipe below uses the oai_dc schema as an example.

Ingredients

Datasets

Prosecution Project API reference dataset.

Sample code from GitHub

Tools

Python Jupyter Notebook

Techniques

  1. Create a Jupyter Notebook on the Tinker Workbench
  2. pyoai provides a convenient Python API for OAI-PMH (http://infrae.com/download/OAI/pyoai)
    1. Import Client from oaipmh.client, and MetadataRegistry and oai_dc_reader from oaipmh.metadata
  3. Set the API endpoint variable
    1. URL = 'https://oai.prosecutionproject-test.griffith.edu.au/oai'
  4. Define a list to hold the resulting data
    1. records = []
  5. Configure the “harvester”
    1. registry = MetadataRegistry()
      registry.registerReader('oai_dc', oai_dc_reader)
      client = Client(URL, registry, force_http_get=True)
  6. Fetch the records from the API
    1. for record in client.listRecords(metadataPrefix='oai_dc'):
           records.append(record[1].getMap())
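The steps above can be collected into a single notebook cell. The following is a minimal sketch, assuming the pyoai package is installed in the notebook environment. The function names harvest and flatten_record and the set_spec parameter are hypothetical helpers introduced here for illustration. Two points are worth noting: each item returned by listRecords is a (header, metadata, about) tuple, and the metadata part is None for records flagged as deleted, so those are skipped. The oai_dc reader also returns every field as a list of strings, which flatten_record joins into one string per field.

```python
def flatten_record(metadata_map):
    """Join each field's list of values into one string, dropping empty fields."""
    return {
        field: "; ".join(values)
        for field, values in metadata_map.items()
        if values
    }


def harvest(url, set_spec=None):
    """Fetch all oai_dc records from an OAI-PMH endpoint as flat dictionaries.

    set_spec is an optional OAI set name such as 'QLDSC' (hypothetical
    parameter name, passed to the API as the 'set' argument).
    """
    # pyoai is imported inside the function (an assumption made for this
    # sketch) so that flatten_record can be used without pyoai installed.
    from oaipmh.client import Client
    from oaipmh.metadata import MetadataRegistry, oai_dc_reader

    # Configure the harvester exactly as in step 5.
    registry = MetadataRegistry()
    registry.registerReader("oai_dc", oai_dc_reader)
    client = Client(url, registry, force_http_get=True)

    # Only send the 'set' argument when a set was actually requested.
    kwargs = {"metadataPrefix": "oai_dc"}
    if set_spec:
        kwargs["set"] = set_spec

    records = []
    for header, metadata, _about in client.listRecords(**kwargs):
        if metadata is None:
            # Deleted records carry a header but no metadata.
            continue
        records.append(flatten_record(metadata.getMap()))
    return records
```

In a notebook, calling records = harvest(URL) with the endpoint from step 3 should give a list of dictionaries keyed by Dublin Core field names. Passing set_spec='QLDSC' would restrict the harvest to that set.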

A worked example, provided as a Jupyter notebook, can be found here:

https://drive.google.com/drive/folders/12DYICRlc512TR5d71GKp_iTTKbaTgYvx?usp=sharing

Next Steps

Geocoding data

Visualisation of data