Georeferencing – Joining a spreadsheet with ABS boundaries

Learning curve: Intermediate

Introduction: This recipe describes the process to join a spreadsheet with its corresponding Australian Bureau of Statistics (ABS) boundary. This is a common spatial database procedure that relates text to geometry using what’s called a primary key. The primary key is one that is common to both datasets and may be numeric (e.g. area codes) or a string (e.g. gazetted place names).

ABS boundaries are enumeration (or aggregation) units describe geographic areas within which data has been captured such as counting. Each unit is assigned a standard area code and place name to facilitate relations. The extent of a unit is described using a geometric polygonal boundary, which is an important component in both spatial analysis and cartography.

Pre-reading is strongly recommended.

Pre-reading:
An introduction to statistical geography

Ingredients:

Datasets:
ABS Population Estimates by Local Government Area (ASGS 2017), 2007 to 2017

Local Government Areas ASGS Ed 2017 Digital Boundaries in ESRI Shapefile Format 

Tools:
Microsoft Excel

QGIS

Method:

  1. Download (and if necessary unzip) the two datasets to a data folder
  2. Start Excel and open ‘ABS Population Estimates by Local Government Area (ASGS 2017)’
  3. Carefully inspect the data, noting that it contains various worksheets, each describing a different state’s population estimates by local government area (LGA)
  4. Edit and flatten the file in the following way:
    1. Sometimes column headers may occupy more than one cell, making it difficult to use the data in various applications. To work around this, flatten each column header:
      1. Rename column headers e.g., ‘ERP_June30_2016pr’ so that each only occupies a single cell
    2. Copy each state’s worksheet into a single worksheet so contains all state’s data
      1. Ensure column names match
  • Finally save the edited file as a CSV, e.g., ‘ABS_PopEstimates_2017_edit.CSV’
  1. Open QGIS, click ‘Project’ on the menu bar and then ‘New…’ create a new project
  2. First add the boundary data. Follow the menu bar and click ‘Layer’, -> ‘Add Layer’ -> ‘Add Vector Layer’
    1. Clicking the button with the three dots, navigate to your data folder and select ‘LGA_2017_AUST.shp’, then click ‘Open’, then ‘Add’
    2. The new layer is added to your project and you should see the following: (1) a geometric representation of Australian LGAs in the main window, and (2) a layer called ‘LGA_2017_AUST’ within the ‘Layers’ window (in the lower left-hand corner of QGIS)
  3. Within the Layer window, right click ‘LGA_2017_AUST’, then choose ‘Open Attribute Table’
    1. A table will then be shown that describes data relating to Australian LGAs, note the values for ‘LGA_CODE17’ and ‘LGA_NAME17’ which uniquely describe the LGAs
  4. Second add the population estimate data. On the menu click ‘Layer’ -> ‘Add Layer’ -> ‘Add Delimited Text Layer’
    1. Navigate to your data folder and select your edited population estimates CSV file
    2. Under ‘File format’ select ‘CSV’
  • Under ‘Record and field options’, leave unchanged for now
  1. Under ‘Geometry definition’ select ‘No geometry (attribute table only)’
  2. Click ‘Add’ to add the layer
  3. The new layer is added to your project and you should see a new table added to the ‘Layers’ window
  4. Joining text to geometry. Within the Layers window, right click on ‘LGA_2017_AUST’ and select ‘..’
    1. From the left-hand menu select ‘Joins’
    2. Click the ‘+’ icon at the base of the Joins window
  • Select ‘ABS_PopEstimates_2017’ as the Join Layer
  1. To join these datasets there are two fields that are common to both: the LGA code (numeric data type) and the LGA name (string data type), and either may be used as a primary key:
    1. Numeric primary key
      1. For numeric data select ‘123’ ‘LGA Code’ (123 denotes a numeric data type) as the ‘Join field’
      2. Select ‘LGA_CODE17’ as the ‘Target field’
      3. The remaining fields offer additional ways to customise the join, leave these unchanged for now
      4. Click ‘OK’
    2. String primary key
      1. For string data, select ‘abc’ ‘Local Government Area’ (abc denotes a string data type)
      2. Select ‘LGA_NAME17’ as the ‘Target field’
      3. The remaining fields offer additional ways to customise the join, leave these unchanged for now
      4. Click ‘OK’ then apply the join
    3. Now confirm the results of joining by viewing the ‘LGA_2017_AUST’ attribute table. Right click on ‘LGA_2017_AUST. You should see the table now contains all fields from the population estimates spreadsheet.

Notes:

*When joining two datasets make sure that both relate to the same version of enumeration units. The ABS frequently update and release new versions of their units over time.  For example ABS 2016 boundaries differ to 2017 boundaries.  

Next steps:

Analyse

  • Results may be analysed using analytical software like R, MATLAB and Excel

Visit the spatial analysis recipe book for more information.

Map

  • Results may be mapped and presented using geographic information systems software like QGIS, ArcGIS and R. Visit the cartography recipe book for more information.