Preparing for the Census 2021 data release
It’s no small task to process the huge amount of Census data into local insights. Continuing the story of our Census 2021 preparation, Glenn shares some behind-the-scenes tales of how .id approaches this complex job.
The 2021 Census is expected to show some significant changes in the way we live in Australia, as well as things we’ve been talking about for a while, like a slowdown in population growth. The results should paint a unique picture of Australia at August 2021, living in a pandemic, with lockdowns across large parts of the nation. Before we can get that, though, the ABS has to process the data, and we have to do a lot of background work for our clients.
Getting Census data from the ABS
One important thing to remember is that .id doesn’t get the Census data earlier than anyone else. Although we work closely with the ABS, we are an external organisation and we can only get the data when it’s released to the public. The exact release date has not yet been set, but we’ve been told the first release is in June 2022. It will likely be available from 11:30am on the day of release, when the ABS will lift the embargo on the raw dataset. We have our data order in (more on that in another blog) and should receive it on release day.
I’ve personally collected our order from the ABS office in Melbourne for the last two Censuses. In 2016 the Dropbox we were supposed to use wasn’t working, so I ended up cycling over to the ABS office to pick up the entire Census results on CD-ROM!
Census 2021 data release timeline
The June release date represents about a 10-month processing time from Census day. This has been roughly the same for the past 4 Censuses, back to 2001. Prior to that the data could take 18 months to 2 years to be released. We haven’t seen any recent reductions in processing time, which you might have expected given that around 80% of households complete the Census online.
About 75% of Census data items are released in June. All your basic demographic info is in there: age/sex, birthplace, language, religion, education, the new health topic, incomes, citizenship etc. The second release is a bit later – expected around October this year. It includes topics such as employment, occupation and industry datasets, detailed field of qualification data, and population movement over 1 and 5 years. These topics can’t be automatically read by computer, as many rely on written responses (e.g. describing main duties at work), so they need some manual interpretation to code.
A third release will happen in early to mid-2023, covering topics that require additional processing, such as distance to work and the Socio-Economic Indexes for Areas (SEIFA).
Adding new Census data to .id information tools
We can’t get the data into our tools on the day of release, but we’re working on our processes to have it in the community profile as soon as possible after the release date.
Every five years the release of Census data sends our technical and data processing teams into something of a flurry. The task can be simply described: copy the data from spreadsheets provided by the ABS into our internal database tables so they can be processed into insights on our online tools. Actually doing it – and making sure it’s right – is far less simple.
One of our newer software engineers recently asked, “Won’t we just use the same process you used last time?” The thing is, we’ve done it quite differently every Census. 5 years is a long time, and there are always new processes and technologies available. Our all-consuming goal is to process the data as quickly as possible without causing data issues, so our clients can start acting on the insights provided.
In the lead-up to the Christmas break, our software and database team spent time trialling different methods for processing the data. Last Census we built our own data-consumption software. This Census we’ll be focusing on using Python code to speed up our data processing. We’re now writing the required code and running “fire drills” to make sure our technical processes are set up and ready for release day. With each recent Census we’ve significantly shrunk the amount of time between data being released and it appearing on our tools; we’re hoping to see a similar reduction this Census as well.
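To give a flavour of what “copy the data into database tables and make sure it’s right” can look like in Python, here is a minimal sketch. It is not our production code: the file layout, the table name `age_structure_2021`, the column names and the numbers are all made up for illustration, and it uses an in-memory SQLite database purely so the example runs anywhere.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; this is NOT the real ABS file layout.
SAMPLE_CSV = """area_code,age_group,persons
A001,0-4,120
A001,5-9,135
A002,0-4,80
"""


def load_census_table(conn, csv_text, table="age_structure_2021"):
    """Copy rows from CSV text into a database table; return the row count."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(area_code TEXT, age_group TEXT, persons INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["area_code"], r["age_group"], int(r["persons"])) for r in reader]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def check_total(conn, table, expected_total):
    """Simple 'fire drill' check: the loaded total must match the source total."""
    (total,) = conn.execute(f"SELECT SUM(persons) FROM {table}").fetchone()
    return total == expected_total


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
loaded = load_census_table(conn, SAMPLE_CSV)
ok = check_total(conn, "age_structure_2021", 120 + 135 + 80)
print(f"Loaded {loaded} rows; totals match: {ok}")
```

The point of the second function is the “making sure it’s right” part: a load only counts as done once an independent total from the source agrees with what landed in the table.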
In profile.id (the community profile tool) there will be a topic-by-topic update, likely starting with the basic age structure and total population data. So there will be a few months in which profile.id will be a hybrid of the 2021 and 2016 Censuses (of course we retain past Census data too), until after the second release. We’ll clearly display which topics have been updated to the new data. Topics such as SEIFA will have to wait until the final release in 2023.
Our social atlas (atlas.id) and housing monitor (housing.id) tools will be updated in parallel with the community profile over the same timeframe. The profile.id Communities of Interest optional modules are the most complex dataset and will follow from second release. Census data in the economic profile (economy.id) is primarily second release, so won’t be updated until after October. However, economy.id’s core data source is the annual modelled data on jobs, industry and economic value, independent of the Census. The next update to that dataset is due soon, well ahead of the first Census release.
In the meantime we’re working in the background to make sure this update is as smooth as possible for all our clients.
A few of the things we’re working on at the moment:
- A consistent set of geographic areas we can order from the ABS that minimise the level of random adjustment the ABS has to put in place to protect confidentiality. We’re finalising the last of the Census geography updates from our clients at the moment. Some new areas can’t be implemented until we get the 2021 Census data, but we’re getting them sorted now so the new data can swing straight in. Further changes to geographic areas are still possible any time after this but won’t benefit from the immediate Census update.
- Reviewing the classifications for birthplace, language, ancestry and religion to ensure they are consistent over time but also reflect emerging population groups and contemporary understanding of culture. Dan Corbett wrote an interesting blog on this process recently.
- Combing through the Census Dictionary for changes from the previous Census. This contains every category and classification available for the 2021 Census. An example we’ve found is in the Housing Tenure classification: the old category of “Rented” no longer includes people who occupy a property rent-free. These used to be counted as paying $0 rent but are now part of “Other tenure type”. We have to re-extract and change the old Census classifications back to 2001 to ensure the tenure and rent definitions are consistent across years. Without doing this work, users could be making decisions based on data that shows a change due to the way things are coded as opposed to changes in the real world.
- For similar reasons, we are removing 1991 and 1996 Census data from the community profile for many topics. The older data will still be available to clients who have subscribed to it for topics such as age structure and household type, allowing a longer-term view of the suburb lifecycle. But there are too many changes in many topics to keep these comparable. The 2021 Census release of profile.id will include data from five Censuses (20 years of change) for most topics, with up to seven Censuses only for those basic demographics like age structure.
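The Housing Tenure change mentioned in the list above is a good example of why old data needs to be recoded. Here is a small Python sketch of the idea, assuming we have re-extracted a separate count of rent-free households for an older Census. The category labels and all the counts are hypothetical, chosen only to show the arithmetic.

```python
def harmonise_tenure(old_counts, rent_free):
    """Recode an older Census tenure table to the 2021 definition.

    Rent-free households move out of "Rented" and into "Other tenure type".
    `rent_free` is the separately re-extracted count of those households.
    """
    if rent_free < 0 or rent_free > old_counts["Rented"]:
        raise ValueError("rent_free must be between 0 and the Rented count")
    new_counts = dict(old_counts)  # leave the original table untouched
    new_counts["Rented"] -= rent_free
    new_counts["Other tenure type"] = (
        new_counts.get("Other tenure type", 0) + rent_free
    )
    return new_counts


# Hypothetical counts for one area under the old definition.
tenure_2016 = {
    "Fully owned": 400,
    "Mortgage": 350,
    "Rented": 300,
    "Other tenure type": 10,
}

harmonised = harmonise_tenure(tenure_2016, rent_free=12)
print(harmonised)
```

The total number of households is unchanged by the recode; only the split between categories moves. Without this step, a comparison with 2021 would show a drop in “Rented” that comes purely from the definition change.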
We’ll keep you updated on our processes over the next few months, ensuring our clients have an up-to-date demographic profile of their area as soon as the Census data are available.
Not sure if profile.id is available for your area? Check out our list here, and if your LGA is not on the list, let us know if you’d like to add it!