Managing changes to 2016 Census data: interpreting perturbation and additivity
With recent changes to the way the ABS processed some of the 2016 Census data, we thought we’d give you an overview of the changes and things to consider so you can best analyse Census information. In short, changes to the data management method used to protect people’s privacy can impact the use of a few datasets which may have small results in a lot of categories, especially when used at a small geographic level or over time between 2016 and 2006/2011.
In the spirit of sharing, here’s a brief rundown of one of the key changes to the 2016 Australian Census data. It’s a reasonably complicated issue, so I’ll do my best to translate it for you without losing any of its integrity.
Just in case you are not as wrapped up in demographics as we are, the results from Australia’s recent Census were released a few weeks back. After getting our hot little hands on the new data, we noticed a few changes to the way the data has been processed which may impact the way you perform different types of demographic analysis.
What changed in the 2016 Census?
Every time the ABS conduct the Census, they make changes and improvements to elements such as the geography, categorisations, questions, response options, and their sequence order to reflect changes in society. For example, we have previously talked about changes to the religion question in the 2016 Census.
One key change made by the ABS was to the methodology for processing 2016 Census data, namely the way in which data is randomised to protect the privacy of individuals. The ABS applies a method called ‘perturbation’ to protect an individual’s identity within a community so they cannot be directly identified. The way perturbation was applied to 2016 Census data is different from previous methods, which impacts the use of certain datasets, geography, and change over time analysis.
What is perturbation?
Aside from its usual definition of “anxiety or mental uneasiness”, perturbation is a data management method used to preserve privacy so that individuals cannot be directly identified and ‘differencing’ can’t be employed to derive an individual’s characteristics. Small adjustments are made to counts and totals, which result in small introduced random errors.
Perturbation can be applied with or without ‘additivity’ – whether the categories within the dataset add up to the total count.
The 2006 and 2011 Censuses used additivity, so the random adjustments were balanced out across a table and the totals were consistent across categories, i.e. a random decrease in one category was matched by a random increase in another. This meant aggregate totals for larger areas matched state totals.
In the 2016 Census, perturbation is used without additivity which means that categories may not always add up to the total. Where tables have a large number of categories containing small numbers, these small results are now suppressed to zero.
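To see why categories can stop adding up, here is a toy sketch in Python. It is not the ABS’s perturbation method: it leaves out the random adjustments entirely and only mimics small cells being set to zero while the total is produced separately. The threshold of 3, the function name and all of the counts are invented for illustration.

```python
# Toy illustration only: NOT the ABS algorithm.
# The threshold and every count below are made up.
HYPOTHETICAL_THRESHOLD = 3


def publish_without_additivity(true_counts, threshold=HYPOTHETICAL_THRESHOLD):
    """Zero out small cells, but produce the total separately from the cells."""
    cells = {
        name: (count if count >= threshold else 0)
        for name, count in true_counts.items()
    }
    total = sum(true_counts.values())  # total is not rebuilt from the cells
    return cells, total


# Invented counts for language spoken at home in one small area.
true_counts = {"English": 412, "Mandarin": 17, "Greek": 4, "Tamil": 2, "Khmer": 1}

cells, total = publish_without_additivity(true_counts)
gap = total - sum(cells.values())

print(cells)
print("Published total:", total)
print("Sum of categories:", sum(cells.values()))
print("Gap:", gap)
```

In this made-up table, the two smallest languages drop to zero, so the categories sum to 433 while the total stays at 436.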
How do the changes impact your analysis of Census data?
As experts in analysing and interpreting Census data, the people at .id have invested significant time and effort to understand how these changes impact the way you analyse Census data. To mitigate the resulting differences in data, we have built in certain ways to manage and safeguard the use of Census information so that it remains simple, user friendly and as accurate as possible in our online tools.
Here’s a brief rundown to help you understand how the changes can influence the way you analyse Census data.
In short, without additivity you may notice that things don’t always add up – in tables where there are a large number of categories containing small numbers (which can be suppressed to zero), categories within the dataset may not add up to the total number anymore.
Tables created using large geographic areas will be close to the ‘true’ population. However, when looking at derived totals made up from smaller areas, the results can be significantly less than the aggregate totals of larger areas.
As results can be suppressed to zero in small geographic areas, when you aggregate small geographies into a larger area, the result may be less than the total count for the larger area.
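The aggregation effect can be sketched the same way. The snippet below is a simplified assumption rather than the ABS procedure: it ignores random adjustments, uses a hypothetical threshold of 3, and the SA1 names and counts are invented.

```python
# Toy illustration only: NOT the ABS algorithm.
HYPOTHETICAL_THRESHOLD = 3


def suppress_small(count, threshold=HYPOTHETICAL_THRESHOLD):
    """Return 0 for counts below the (made-up) threshold, else the count."""
    return count if count >= threshold else 0


# Invented number of speakers of one less common language in four SA1s.
true_speakers_by_sa1 = {"SA1 A": 2, "SA1 B": 1, "SA1 C": 2, "SA1 D": 5}

# Adding up SA1 figures after small counts have been zeroed.
summed_from_sa1s = sum(suppress_small(c) for c in true_speakers_by_sa1.values())

# The same people counted once for the larger area that contains the SA1s.
larger_area_count = suppress_small(sum(true_speakers_by_sa1.values()))

shortfall = larger_area_count - summed_from_sa1s
print(summed_from_sa1s, larger_area_count, shortfall)
```

Here the larger area shows 10 people, but the four SA1s only add up to 5, because three of them fell below the threshold.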
The effect can be significant when using datasets that have a large number of categories that have low numbers of responses, especially when the data is used at a small geography such as SA1. Specifically, the datasets most susceptible include:
- Language spoken at home – With over 500 languages, the likelihood of a small number of individuals identifying with a less common language is high.
- Country of birth – Approximately 300 different countries of birth are captured by the ABS.
- Indigenous status – Due to the low number of Indigenous persons in certain areas across Australia, data may be lost at SA1 level.
- Religion – There is a high chance of minority groups being identified without perturbation.
Things to consider when analysing Australian Census data
Using small areas
For the limited datasets with lots of categories listed above, we recommend not aggregating data from SA1s and instead using the largest possible geography for the best results, as larger areas are less likely to have suppressed figures for small counts. For datasets prone to small figures, SA2 should be the smallest geography used for aggregation. If using small areas to aggregate up into a larger geography, be aware the totals may not add up and may be less than the total count for the larger area.
Comparing change over time
As the method of perturbation has changed between the 2006/2011 and 2016 Censuses, the change also affects how accurately you can assess change over time between Census periods. When comparing data between the 2011 and 2016 Censuses for datasets likely to have suppressed results, the data may display a false decrease due to the change in methodology.
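A rough sketch of how a false decrease can appear, using invented numbers: the true counts are identical in both years, and the only difference is how small cells are treated. Treating the 2011 figures as keeping their small counts is a simplifying assumption made for this example, as is the threshold of 3.

```python
# Toy illustration only: NOT the ABS algorithm. All numbers are invented.
HYPOTHETICAL_THRESHOLD = 3


def percent_change(earlier, later):
    """Percentage change from the earlier figure to the later one."""
    return (later - earlier) / earlier * 100


# The same true counts in four small areas in both Census years.
true_counts_by_small_area = [2, 1, 2, 5]

# Simplifying assumption: 2011 small counts survive aggregation.
total_2011 = sum(true_counts_by_small_area)

# 2016-style toy treatment: counts below the threshold become zero.
total_2016 = sum(
    c if c >= HYPOTHETICAL_THRESHOLD else 0 for c in true_counts_by_small_area
)

apparent_change = percent_change(total_2011, total_2016)
print(total_2011, total_2016, apparent_change)
```

Nothing changed in the underlying population, yet the aggregated figures fall from 10 to 5, an apparent 50% drop that comes purely from the method.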
Advice from ABS: In general, users should construct a table with the information they require so perturbation is applied to a count only once; it is not recommended to sum across perturbed counts to derive the information you require. No reliance should be placed on small cells as they are impacted by perturbation, respondent and processing errors.
Within .id’s information tools, we have implemented ways to manage these irregularities to make your analysis of Census information as pain-free and straightforward as possible. You can read detailed data notes in profile.id, atlas.id and .id Placemaker with the specific details. If you would like to talk in any more detail about these changes, please get in contact.
.id is a team of demographers, urban economists, spatial planners, population forecasters, and Census data experts who use a unique combination of online tools and consulting to help governments and organisations understand their local community.