Better use of data - Connecting addresses in datasets

Jurisdiction: Victoria

Addresses in datasets are complicated. For example, they may contain typos and abbreviations, they might rely on the context of the dataset (is this Melbourne, Australia or Melbourne, Florida?), and they may be more or less specific. Given an address, or a dataset containing addresses, how can we discover useful connections with other datasets in a large corpus of potential information?

During COVID there was a need to coordinate the deployment of inspections across businesses throughout Victoria. There weren’t enough resources to inspect everything, but we wanted to know what fraction of businesses we were covering. We received datasets from business registrars, work safety authorities, and enforcement agencies and had to piece together their inconsistent schemas to form a full picture.
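As a rough illustration of the coverage question: once every source has been reduced to a shared address key, the fraction covered is simple set arithmetic. The sketch below is an assumption-laden toy, not the approach actually used. The function names `normalise_key` and `coverage` are hypothetical, the sample addresses are made up, and the key is deliberately crude (it would, for instance, turn "1/43" into "1 43").

```python
def normalise_key(address):
    """Crude join key: uppercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.upper())
    return " ".join(cleaned.split())


def coverage(registered, inspected):
    """Fraction of distinct registered addresses that also appear in the inspected list."""
    registered_keys = {normalise_key(a) for a in registered}
    inspected_keys = {normalise_key(a) for a in inspected}
    if not registered_keys:
        return 0.0
    return len(registered_keys & inspected_keys) / len(registered_keys)


# Made-up sample records, standing in for a registrar extract and an inspection log.
registered = [
    "1 Main St, Castlemaine",
    "2 High St, Kyneton",
    "3 Mostyn St, Castlemaine",
    "4 Barker St, Castlemaine",
]
inspected = ["1 MAIN ST CASTLEMAINE", "3 mostyn st.  castlemaine"]

print(coverage(registered, inspected))
```

In practice the hard part is the key itself: exact matching on a crude key undercounts coverage whenever two sources spell the same address differently.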

Attempts were made to use ML to parse the addresses, but the final solution ended up mostly being a collection of regular expressions.

We had difficulty with the following:

• Unit and non-detached residence naming conventions, e.g. Unit 1, 1/43
• Road names and their various abbreviations and suffixes
• Hyphenation and forward slashes
• Shopping centre, university campus and defence barracks addressing
• Suburb names that include a street abbreviation, like St Kilda
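To make a few of these difficulties concrete, here is a minimal regular-expression sketch, assuming Python and a deliberately tiny lookup of street-type abbreviations. It is not the solution that was built. The names `STREET_TYPES`, `ADDRESS_RE` and `parse_address` are hypothetical, and the rule "the first street-type token after the first word ends the street name" is a heuristic chosen only for illustration.

```python
import re

# Hypothetical, deliberately small lookup of street-type abbreviations.
STREET_TYPES = {
    "st": "STREET", "street": "STREET",
    "rd": "ROAD", "road": "ROAD",
    "ave": "AVENUE", "av": "AVENUE", "avenue": "AVENUE",
    "hwy": "HIGHWAY", "highway": "HIGHWAY",
}

ADDRESS_RE = re.compile(
    r"""^\s*
    (?:(?:unit|u|flat|apt)\s*(?P<unit_a>\d+[a-z]?)\s*[,/]?\s*  # "Unit 1, " or "U1/"
       |(?P<unit_b>\d+[a-z]?)\s*/\s*)?                          # bare "1/" prefix
    (?P<number>\d+[a-z]?(?:\s*-\s*\d+[a-z]?)?)\s+               # "43", "43A", "43-45"
    (?P<rest>.+?)\s*$                                           # street and locality
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_address(raw):
    """Split a numbered street address into unit, number, street and locality."""
    m = ADDRESS_RE.match(raw)
    if m is None:
        return None
    tokens = m.group("rest").replace(",", " ").split()
    # Start at index 1 so a leading "St" (as in "St Kilda Rd") is treated as
    # part of the street name rather than as a street type.
    for i in range(1, len(tokens)):
        key = tokens[i].lower().rstrip(".")
        if key in STREET_TYPES:
            return {
                "unit": m.group("unit_a") or m.group("unit_b"),
                "number": m.group("number").replace(" ", ""),
                "street": " ".join(tokens[:i]).upper() + " " + STREET_TYPES[key],
                "locality": " ".join(tokens[i + 1:]).upper(),
            }
    return None


print(parse_address("1/43 Acland St, St Kilda"))
print(parse_address("Unit 1, 43 Acland Street St Kilda"))
print(parse_address("2 St Kilda Rd Melbourne"))
```

The first two calls produce the same parsed record, which is the point of normalising before joining. The heuristic still fails on names that contain a street type mid-name (for example a road called "High Street Road"), and it returns `None` for anything that does not start with a number, such as shopping-centre or "corner of" addresses.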

To give a particularly curly example, the location of Castlemaine Primary School is not a numbered street address, but a “corner of” address:

Castlemaine Primary School
Cnr Mostyn & Urquhart Street
Castlemaine VIC 3450
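A "corner of" line like the one above can be handled as its own case. The sketch below is one possible treatment under stated assumptions, not a complete parser: `CORNER_RE`, `STREET_TYPE_RE` and `parse_corner` are hypothetical names, only a handful of street types are recognised, and the result is an order-independent pair of upper-cased street names.

```python
import re

# Matches "Cnr A & B", "Cnr. A and B", "Corner of A & B", and similar.
CORNER_RE = re.compile(
    r"^\s*(?:cnr|corner(?:\s+of)?)\.?\s+(?P<first>.+?)\s*(?:&|\band\b)\s*(?P<second>.+?)\s*$",
    re.IGNORECASE,
)
# A trailing street type on a street name; an illustrative subset only.
STREET_TYPE_RE = re.compile(r"\s+(street|st|road|rd|avenue|ave)\.?$", re.IGNORECASE)


def parse_corner(line):
    """Return the two intersecting streets as a sorted tuple, or None if not a corner."""
    m = CORNER_RE.match(line)
    if m is None:
        return None
    first, second = m.group("first"), m.group("second")
    # In "Mostyn & Urquhart Street" the first name borrows the second's street type.
    if not STREET_TYPE_RE.search(first):
        borrowed = STREET_TYPE_RE.search(second)
        if borrowed:
            first = first + " " + borrowed.group(1)
    return tuple(sorted(s.upper() for s in (first, second)))


print(parse_corner("Cnr Mostyn & Urquhart Street"))
# ('MOSTYN STREET', 'URQUHART STREET')
```

Sorting the pair means "Cnr Urquhart Street & Mostyn Street" yields the same key, so two datasets that list the streets in a different order can still be joined. Abbreviated and spelled-out street types ("St" versus "Street") would still need normalising before the keys compare equal.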

This problem was the genesis of this challenge: addresses were originally created for human-to-human communication within a given context, not for data science. How can we resolve addresses reliably and connect our datasets more easily?

Additional Information:

Relevant links:

• Geoscape Geocoded National Address File (G-NAF)
• Plus Codes
• Australia Post: address standards

Eligibility: Participants must use one or more datasets from Data.Vic to be eligible.

Entry: Challenge entry is available to all teams in Australia.

Dataset Highlight

• Australia Post – Address standards
• Google Plus Codes
• Geoscape Geocoded National Address File (G-NAF)
• Vicmap Features of Interest
• Vicmap Reference - Address Source Table
• Vicmap Address dataset
• Victorian Government School Zones 2023
• Traffic Lights data - Victoria
• Victorian fatal and injury crash data
• Victorian liquor licences by location
• Traffic Count Locations - Victoria Roads
• School Locations 2022 - Victoria
• Building Permit Activity Data 2022 - The Victorian Building Authority
