Better use of data - Connecting addresses in datasets
Addresses in datasets are complicated. They may contain typos and abbreviations, they may rely on the context of the dataset (is this Melbourne, Australia, or Melbourne, Florida?), and they may be more or less specific. Given an address, or a dataset containing addresses, how can we discover useful connections with other datasets drawn from a large corpus of potential information?
During COVID there was a need to coordinate the deployment of inspections across businesses throughout Victoria. There weren’t enough resources to inspect everything, but we wanted to know roughly what fraction of businesses we were covering. We received datasets from business registrars, work safety authorities and enforcement agencies, and had to piece together their inconsistent schemas to form a full picture.
Attempts were made to use ML to parse the addresses, but the final solution ended up mostly being a collection of regular expressions.
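To illustrate what one entry in such a collection of regular expressions might look like, here is a hypothetical sketch (not the pattern actually used) that splits the trailing suburb, state and postcode of an Australian address. It assumes a standard state abbreviation and a four-digit postcode; `parse_locality` and `LOCALITY_RE` are illustrative names.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: suburb words, then a state abbreviation, then a
# four-digit postcode, e.g. "Castlemaine VIC 3450".
LOCALITY_RE = re.compile(
    r"^(?P<suburb>[A-Za-z' -]+?)\s+"
    r"(?P<state>VIC|NSW|QLD|SA|WA|TAS|NT|ACT)\s+"
    r"(?P<postcode>\d{4})$",
    re.IGNORECASE,
)


def parse_locality(text: str) -> Optional[Dict[str, str]]:
    """Split a locality line into suburb, state and postcode, or return None."""
    match = LOCALITY_RE.match(text.strip())
    if match is None:
        return None
    return {
        "suburb": match["suburb"].strip().title(),  # "st kilda" -> "St Kilda"
        "state": match["state"].upper(),
        "postcode": match["postcode"],
    }


print(parse_locality("Castlemaine VIC 3450"))
print(parse_locality("st kilda vic 3182"))
```

Every new variation in the data tends to need another pattern like this one, which is how such a collection grows.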
We had difficulty with the following:
• Unit and other non-detached residence naming conventions, e.g. Unit 1, 1/43
• Road names and their various abbreviations and suffixes
• Hyphenation and forward slashes
• Shopping centre, University Campus and Defence Barracks addressing
• Suburb names that contain what looks like a street abbreviation, like St Kilda
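A minimal sketch of how regular expressions can address the unit, abbreviation and hyphenation items, assuming the street line has already been separated from the suburb, state and postcode. The abbreviation table is a small illustrative subset (a real one would follow the Australia Post standards), and `split_number` and `normalise_street_line` are hypothetical helpers, not the solution used during COVID. Expanding only the final word as a street type is one way to avoid mangling names like St Kilda Rd, where the leading "St" means "Saint".

```python
import re
from typing import Optional, Tuple

# Illustrative subset only; not a complete list of street-type abbreviations.
STREET_TYPES = {
    "st": "Street", "rd": "Road", "ave": "Avenue", "av": "Avenue",
    "hwy": "Highway", "cres": "Crescent", "ct": "Court", "dr": "Drive",
    "pde": "Parade", "pl": "Place", "tce": "Terrace",
}

NUMBER = r"(\d+[a-z]?(?:-\d+[a-z]?)?)"  # 43, 43A, 41-45
# "Unit 1, 43 Smith St", "Flat 2 10 High St"
UNIT_WORD_RE = re.compile(r"^(?:unit|apt|flat)\s*(\w+)\s*,?\s*" + NUMBER + r"\s+(.*)$", re.I)
# "1/43 Smith St"
UNIT_SLASH_RE = re.compile(r"^(\w+)\s*/\s*" + NUMBER + r"\s+(.*)$", re.I)
# "43 Smith St", "41-45 Smith St"
PLAIN_RE = re.compile(r"^" + NUMBER + r"\s+(.*)$", re.I)


def split_number(text: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (unit, street number, remainder of the line)."""
    for pattern in (UNIT_WORD_RE, UNIT_SLASH_RE):
        match = pattern.match(text)
        if match:
            unit, number, rest = match.groups()
            return unit, number, rest
    match = PLAIN_RE.match(text)
    if match:
        number, rest = match.groups()
        return None, number, rest
    return None, None, text


def normalise_street_line(line: str) -> dict:
    """Normalise a street line (no suburb/state/postcode) into its parts."""
    text = " ".join(line.replace(".", " ").split())  # drop dots, squash spaces
    unit, number, street = split_number(text)
    words = street.split()
    # Only the last word is treated as a street type, so the leading
    # "St" (Saint) in "St Kilda Rd" is left untouched.
    if words and words[-1].lower() in STREET_TYPES:
        words[-1] = STREET_TYPES[words[-1].lower()]
    return {"unit": unit, "number": number, "street": " ".join(words)}


print(normalise_street_line("1/43 Smith St"))
print(normalise_street_line("Unit 1, 43 Smith St."))
print(normalise_street_line("100 St Kilda Rd"))
```

Both unit forms collapse to the same parts, which is what makes them joinable. Shopping centre, campus and barracks addresses would each need further patterns.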
To give a particularly curly example, the location of Castlemaine Primary School is not a numbered street address, but a “corner of” address:
Castlemaine Primary School
Cnr Mostyn & Urquhart Street
Castlemaine VIC 3450
This problem was the genesis of this challenge. Addresses were originally created for human-to-human communication within a given context, not for data science. How can we make addresses easier to work with, and more easily connect our datasets?
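Once addresses are normalised, connecting two datasets becomes a join on a derived key. A minimal standard-library sketch: `address_key` and `link_records` are hypothetical helpers, exact key matches are tried first, and `difflib.get_close_matches` serves as a fuzzy fallback for typos. The 0.9 cutoff is an arbitrary assumption, and in practice the normalisation steps above (unit forms, street types) would be applied before building the key; matching both sides against a reference such as G-NAF or Vicmap Address is another option.

```python
import difflib
import re
from typing import Dict, List, Optional


def address_key(address: str) -> str:
    """Crude join key: lower-case, drop punctuation (keep '/'), squash spaces."""
    text = re.sub(r"[^a-z0-9/ ]", " ", address.lower())
    return " ".join(text.split())


def link_records(left: List[str], right: List[str],
                 cutoff: float = 0.9) -> Dict[str, Optional[str]]:
    """Map each left address to its best right address, or None."""
    right_by_key = {address_key(r): r for r in right}
    keys = list(right_by_key)
    links: Dict[str, Optional[str]] = {}
    for address in left:
        key = address_key(address)
        if key in right_by_key:  # exact match on the normalised key
            links[address] = right_by_key[key]
            continue
        # Fuzzy fallback for typos; the cutoff is an assumed threshold.
        close = difflib.get_close_matches(key, keys, n=1, cutoff=cutoff)
        links[address] = right_by_key[close[0]] if close else None
    return links


# Hypothetical example records.
register = ["12 Example St, Castlemaine VIC 3450", "5 Sample Rd, Castlemaine VIC 3450"]
inspections = ["12 EXAMPLE ST Castlemaine VIC 3450", "12 Exmaple St, Castlemaine VIC 3450"]
print(link_records(inspections, register))
```

Fuzzy matching trades false negatives for false positives, so any threshold would need checking against a hand-labelled sample.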
• Geoscape Geocoded National Address File (G-NAF) - https://data.gov.au/dataset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/details
• Plus Codes - https://maps.google.com/pluscodes/
• Australia Post: address standards - https://auspost.com.au/content/dam/auspost_corp/media/documents/Appendix-01.pdf and https://auspost.com.au/content/dam/auspost_corp/media/documents/australia-post-addressing-standards-1999.pdf
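Plus Codes sidestep address text entirely by encoding a latitude and longitude as a short grid code, which could serve as a join key once records are geocoded (for example against G-NAF). The sketch below reimplements only the basic 10-digit Open Location Code encoding for illustration; it omits code shortening and validation, and because it uses floating-point arithmetic it may disagree with the official Open Location Code library for points lying exactly on a cell boundary. `encode_plus_code` is a hypothetical name; the official library is the safer choice in practice.

```python
import math

# Open Location Code digit alphabet (20 symbols).
ALPHABET = "23456789CFGHJMPQRVWX"
UNITS_PER_DEGREE = 8000  # a 10-digit code resolves to 1/8000 of a degree


def encode_plus_code(lat: float, lng: float) -> str:
    """Encode a WGS84 point as a 10-digit Plus Code (no shortening)."""
    # Clip latitude and wrap longitude into [-180, 180).
    lat = min(max(lat, -90.0), 90.0)
    lng = (lng + 180.0) % 360.0 - 180.0
    # Shift to non-negative integer grid units.
    lat_units = math.floor((lat + 90.0) * UNITS_PER_DEGREE)
    lng_units = math.floor((lng + 180.0) * UNITS_PER_DEGREE)
    # The north pole is placed in the topmost cell.
    lat_units = min(lat_units, 180 * UNITS_PER_DEGREE - 1)
    lat_digits, lng_digits = [], []
    for _ in range(5):  # five base-20 digits per axis, least significant first
        lat_units, lat_d = divmod(lat_units, 20)
        lng_units, lng_d = divmod(lng_units, 20)
        lat_digits.append(lat_d)
        lng_digits.append(lng_d)
    # Interleave latitude and longitude digits, most significant first.
    code = "".join(
        ALPHABET[a] + ALPHABET[b]
        for a, b in zip(reversed(lat_digits), reversed(lng_digits))
    )
    return code[:8] + "+" + code[8:]


print(encode_plus_code(-37.8136, 144.9631))
```

Points in the same cell (roughly 14 m across in latitude) share a code, but two nearby points on either side of a cell edge do not, so a join on Plus Codes may need to consider neighbouring cells too.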
Eligibility: Participants must use one or more datasets from Data.Vic to be eligible.
Entry: Challenge entry is available to all teams in Australia.
• Australia Post – Address standards
• Google Plus Codes
• Geoscape Geocoded National Address File (G-NAF)
• Vicmap Features of Interest
• Vicmap Reference - Address Source Table
• Vicmap Address dataset
• Victorian Government School Zones 2023
• Traffic Lights data - Victoria
• Victorian fatal and injury crash data
• Victorian liquor licences by location
• Traffic Count Locations - Victoria Roads
• School Locations 2022 - Victoria
• Building Permit Activity Data 2022 - The Victorian Building Authority