Data quality in the 2018 Census: comparability problems
The 2018 Census has been unique in many ways. Problems with undercounts led Stats NZ to deploy new methods to source and validate data. One of the core approaches was using administrative records to supplement the collected data. In many cases, this has produced a more complete picture than previous censuses provided. The better coverage, in turn, creates its own issues when results are compared with previous censuses. Penny, our NZ expert, dives in.
As the 2018 Census data is released in juicy bite-sized chunks, the extraordinary work Stats NZ have undertaken to overcome the problems of poor census response rates becomes apparent. Each dataset has its own quality rating, and for many there is a full breakdown of the sources the data was drawn from.
An example is the Ethnicity dataset. It has received a HIGH grade, which means the data is regarded as being 99% accurate.
Good news! The data ratings are rock solid
The data ratings given for each dataset are rock solid: they have been vetted through processes much more exacting than in prior years. For transparency, Stats NZ describe every data source in great detail. The following table, which lists the sources for the age data, shows this approach in practice.
Table 1. 2018 Age dataset – sources
|Source|2018 age – Census night population|
|---|---|
|Response from 2018 Census|84.7 percent|
|Response from 2018 partial forms|4.1 percent|
|2013 Census data|0.0 percent|
|Administrative data|10.9 percent|
|Statistical imputation|0.3 percent|
|No information|0.0 percent|
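As a quick check on Table 1, the sketch below tallies the published shares into answers people gave on a 2018 form versus values filled in from other sources. The two-bucket grouping (and the names `AGE_SOURCES`, `FORM_RESPONSES` and `tally`) is our own illustration, not a Stats NZ classification; only the percentages come from the table.

```python
# Shares (percent of the 2018 Census night population) copied from Table 1.
AGE_SOURCES = {
    "Response from 2018 Census": 84.7,
    "Response from 2018 partial forms": 4.1,
    "2013 Census data": 0.0,
    "Administrative data": 10.9,
    "Statistical imputation": 0.3,
    "No information": 0.0,
}

# Illustrative grouping (our assumption, not an official Stats NZ category):
# answers given on a 2018 form versus everything filled in some other way.
FORM_RESPONSES = {"Response from 2018 Census", "Response from 2018 partial forms"}


def tally(sources: dict[str, float]) -> dict[str, float]:
    """Split source shares into form responses and other sources, rounded to 0.1."""
    from_forms = sum(v for k, v in sources.items() if k in FORM_RESPONSES)
    other = sum(v for k, v in sources.items() if k not in FORM_RESPONSES)
    return {
        "form responses": round(from_forms, 1),
        "other sources": round(other, 1),
        "total": round(from_forms + other, 1),
    }


print(tally(AGE_SOURCES))
```

Adding up the published figures this way, about 88.8 percent of age values came from 2018 forms and about 11.2 percent from other sources, almost all of it administrative data.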
What are the effects of the increased use of administrative data?
The new administrative data source delivers “real data about real people” instead of relying on imputation modelling or on a ‘Not Stated’ category, as previous censuses did (see our Census handbook, Worth the wait: a simple guide to navigating the 2018 Census, for a full explanation of administrative and imputed sources). Using administrative data, however, immediately raises a surprising problem: data comparability between censuses suffers, but only in some datasets.
In the Age dataset, administrative data make up 10.9% of the 2018 figures. These data have been triple-checked and carry a high rating. In prior censuses, if a respondent did not answer the age question, Statistics New Zealand imputed their age. Because there was no ‘Not Stated’ category for this variable, both the earlier and the 2018 figures cover the whole population, so there is no comparability issue.
However, in other datasets, such as Ethnicity, past censuses used a ‘Not Stated’ category that covered ‘Don’t know’ responses as well as the non-responses ‘Not Stated’ and ‘Unidentifiable’.
The 2018 Census has no ‘Not Stated’ category, because a variety of sources were used to account for missing data (see Table 2). This is where comparisons over time, or any use of time series data, need to be treated with great caution: an apparent increase in a category may simply be due to data being available for the full population in 2018, rather than to a real change.
Table 2. 2018 Ethnicity dataset – sources
|Source|Ethnicity – 2018 Census night population|
|---|---|
|Response from 2018 Census|84.4 percent|
|2013 Census data|8.2 percent|
|Administrative data|6.2 percent|
|Statistical imputation|1.2 percent|
|No information|<0.1 percent|
Due to rounding, individual figures may not sum exactly to 100 percent.
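To see how losing the ‘Not Stated’ category can create an apparent increase, the sketch below uses invented counts for a simplified single-response variable (real ethnicity data allow multiple responses per person). It compares a category's share of everyone with its share of stated answers only. The rebasing step, and the names `COUNTS_2013`, `COUNTS_2018` and `category_share`, are our illustrative assumptions rather than a Stats NZ method: rebasing treats people in ‘Not Stated’ as if they were distributed like everyone else, which may not hold.

```python
# Invented counts for illustration only; these are NOT Stats NZ figures.
COUNTS_2013 = {"Group A": 150, "Other stated": 750, "Not Stated": 100}
COUNTS_2018 = {"Group A": 170, "Other stated": 830}  # no 'Not Stated' category


def category_share(
    counts: dict[str, int], category: str, exclude: tuple[str, ...] = ()
) -> float:
    """Percentage of `category`, leaving any `exclude` categories out of the denominator."""
    denominator = sum(n for name, n in counts.items() if name not in exclude)
    return 100 * counts[category] / denominator


# Share of the whole 2013 population, 'Not Stated' included in the denominator.
naive_2013 = category_share(COUNTS_2013, "Group A")
# Share of stated 2013 answers only (assumes non-respondents resemble respondents).
rebased_2013 = category_share(COUNTS_2013, "Group A", exclude=("Not Stated",))
# 2018 has no 'Not Stated', so the denominator is the full population.
share_2018 = category_share(COUNTS_2018, "Group A")

print(f"2013 naive: {naive_2013:.1f}%  2013 rebased: {rebased_2013:.1f}%  2018: {share_2018:.1f}%")
```

With these made-up numbers, the naive comparison suggests a rise of 2.0 percentage points, while the rebased one suggests about 0.3. Neither figure is definitive; the point is that part of the naive increase comes from the denominator changing, not from the population changing.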
Which topics are affected?
Of the datasets released to date, this problem with time series comparisons is particularly apparent in ethnicity, birthplace, Māori descent, dwelling and dwelling occupancy information.
To help you work with this data, our Community Profile tool includes data ratings and usage guidance for all 2018 Census data.