Using Census data to understand changing community diversity
Daniel walks through our work reviewing categorisations for birthplace, ancestry, language and religion topics for our demographic information tools. This blog is part of our ongoing series sharing our work preparing for the Census 2021 data release.
Understanding how linguistic and cultural diversity is changing within a community is essential for local government and others who service local areas. The Census collects information such as the language people speak at home and their proficiency in English, their birthplace and ancestry, and their religion. This provides essential data on diversity across Australia, right down to small local areas.
Balancing completeness against comparability
In a previous post I touched on the methods for making comparisons over time with changing geographies. A similar challenge exists for the data underlying our understanding of a community’s diversity, among other Census topics. For data such as birthplace, the source categories themselves can change (geopolitical entities shift, and new countries are created every now and then), creating a challenge we think about as comparability vs completeness.
Case study: South Sudan
South Sudan formally became independent from Sudan in 2011. For the 2011, 2016 and 2021 Census, respondents who would previously have had their birthplace as “Sudan” have been able to nominate “South Sudan”. If someone were to compare the raw Census data for 2006 and 2011 for people arriving in Australia from Sudan, it would show a potentially misleading drop-off in numbers between those years. To understand the real change in the number of people coming from that region of the world, you would need to compare Sudan in 2006 with Sudan and South Sudan combined in 2011.
When we rolled out the 2011 Census data in the profile.id demographic information tool, we made the decision to combine Sudan and South Sudan as “Sudan/South Sudan” in our birthplace, ancestry and overseas arrival data. In doing so, we erred on the side of comparability (understanding change over time) rather than completeness (reflecting South Sudan as independent data).
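To make the trade-off concrete, here is a minimal sketch of combining categories before comparing two Census years. It is an illustration only, not the profile.id implementation: the function names, the mapping table and the counts are all assumptions made up for the example, and the numbers are not Census figures.

```python
from collections import Counter

# Assumption: raw birthplace labels mapped to one combined, comparable label.
COMBINED_LABEL = {
    "Sudan": "Sudan/South Sudan",
    "South Sudan": "Sudan/South Sudan",
}


def to_comparable(counts):
    """Sum raw category counts under their combined label, if they have one."""
    combined = Counter()
    for label, people in counts.items():
        combined[COMBINED_LABEL.get(label, label)] += people
    return dict(combined)


def change_between(earlier, later):
    """Change per comparable category between two Census years."""
    a, b = to_comparable(earlier), to_comparable(later)
    return {label: b.get(label, 0) - a.get(label, 0) for label in sorted(set(a) | set(b))}


# Made-up counts for illustration only; these are not Census figures.
raw_2006 = {"Sudan": 100, "Kenya": 40}
raw_2011 = {"Sudan": 30, "South Sudan": 80, "Kenya": 50}

# Comparing the raw "Sudan" category alone shows a misleading drop.
naive_change = raw_2011["Sudan"] - raw_2006["Sudan"]

# Comparing the combined category shows the change for the whole region.
comparable_change = change_between(raw_2006, raw_2011)
```

With these invented numbers, the raw “Sudan” category appears to fall while the combined “Sudan/South Sudan” category rises, which is the effect described above.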
When the 2021 Census data is released in June 2022, there will be three Census periods (spanning ten years) where South Sudan has been a valid response. As such, we will be separating Sudan and South Sudan on profile.id, allowing for more complete data while enabling meaningful time comparisons. (Data for Sudan will still be non-comparable to 2001 and 2006 Census data, with an apparent drop in numbers between 2006 and 2011, and we will include a data note to this effect. Our assumption is that the need for datasets on these quite different populations now outweighs the need for comparable older data.)
Categories need to meet minimum thresholds to be meaningful
Another change we monitor is the number of people in Australia who identify as being from a particular country, speaking a particular language, etc.
If there is not a minimum threshold of people in Australia who identify with a language, birthplace, ancestry or religion, we roll these up into larger categories. Atomising each group down to the smallest level can create “noise” that makes it harder to draw insights from the data, particularly for the sort of services provided by our partners in local government. Additionally, numbers can get so small that they are rendered meaningless, particularly as ABS data randomisation kicks in to prevent the release of confidential information.
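A minimal sketch of this kind of roll-up is shown below, under some loud assumptions: the threshold value, the category names and the counts are all hypothetical, and small groups are folded into a single “Other” bucket for simplicity, whereas a real roll-up may use broader regional or language-family groupings instead.

```python
def distinct_categories(national_counts, threshold):
    """Labels whose national count meets the (hypothetical) minimum threshold."""
    return {label for label, people in national_counts.items() if people >= threshold}


def roll_up(local_counts, keep, other_label="Other"):
    """Keep qualifying categories as-is and fold the rest into one bucket."""
    rolled = {}
    other_total = 0
    for label, people in local_counts.items():
        if label in keep:
            rolled[label] = people
        else:
            other_total += people
    if other_total:
        rolled[other_label] = other_total
    return rolled


# Made-up counts for illustration only; these are not Census figures.
national = {"Language A": 50_000, "Language B": 12_000, "Language C": 300}
local = {"Language A": 120, "Language B": 45, "Language C": 3}

# The threshold is decided on national counts, then applied to a local area.
keep = distinct_categories(national, threshold=10_000)
rolled = roll_up(local, keep)
```

Deciding which categories to keep at the national level, rather than area by area, means every local area reports the same set of categories, and the area's total is unchanged by the roll-up.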
In our latest review, we have identified several new countries and languages that have hit our thresholds to be included as distinct entries in Ancestry, Birthplace and Language topics. Along with South Sudan, new Ancestry options will include Kenyan, Congolese, Nigerian, Hazara and Sikh. In terms of languages, we are adding some more distinct entries, including Mongolian, and splitting out Persian, Dari and Hazaraghi.
In some cases, we are able to backfill the data to previous Census years, maintaining comparability. In other cases, data are not available or not comparable for previous years. We will include a note that they are available from the 2021 dataset for the first time. This will make the “Other languages” (or “Other birthplaces” etc.) category also non-comparable for the previous years, as people who nominated these categories were previously lumped in here. We’ll include a note to this effect as well, but few people use the “other” category.
Reviewing birthplace, ancestry and language data will improve the Census data rollout
Most work we’re doing now is in some way related to preparing for the Census data release in June. Reviewing this data now means we can include our updated definitions in our ABS data order, the most reliable way of getting access to Census data immediately upon release. It’s also simply good practice to ensure profile.id reflects our latest thinking around how we can make Census data meaningful before the next set of that data comes through. There is always a trade-off between comparability over time and currency of meaningful categories. At .id we put a bit more emphasis on change over time than the ABS does, because it is such a vital component of telling the story of Census. But sometimes the need for more recent meaningful data outweighs that.
These changes will be factored into profile.id over the next few months, ahead of the June release of 2021 Census data.
If you have any questions about informing your decisions with Census data, reach out to us at firstname.lastname@example.org.