Pitfalls in presenting Census data
With the array of data visualisation tools available these days, it may seem pretty easy to present Census data for small areas and draw conclusions from it. Since the release of data on June 21st, quite a few websites have popped up offering data for your area and analysis. Unfortunately it’s never quite as simple as it seems, and there are many pitfalls in presenting this data.
On July 12th, this article appeared on news.com.au and, for some reason, was promoted by the ABS media team on Facebook and Twitter. On the surface it looks great – there is a little on-page app that compares the characteristics of your suburb in the 2006 and 2011 Censuses to show how it has changed. Unfortunately it falls foul of several of the pitfalls of presenting Census data, so it’s worth looking at in a bit more detail.
Postcodes change over time
It seems everyone wants to view data on a postcode basis. Unfortunately postcodes are quite unsuitable for displaying statistics, because they often don’t have well-defined boundaries and may relate to PO Boxes or delivery routes. The ABS puts out a set of boundaries which are a best match to the postcodes that exist at the time, and calls these “Postal Areas”. Because these Postal Areas are aggregates of SA1 districts in 2011 and Collection Districts in previous Censuses, their boundaries can vary quite a bit from the actual postcode boundary (where the postcode has a boundary).
This isn’t too bad if you’re just looking at the data for one Census year, but the problems mount if you try to look at change over time. Not only can the postcode boundary itself have changed, but the small units which form the “best fit” can change as well. Take the example of postcode 3778 in rural Victoria.
The Postal Area boundaries used for it in the two Censuses are completely different, leading to different populations and characteristics.
Unfortunately in the little app, there is no mention of any of this, and you can happily put in 3778 and get a comparison over time, completely unaware that you’re looking at different boundaries.
At .id, our profile.id boundaries are worked out in conjunction with every council client to form the best match to logical communities of interest, which are consistent over time, and we rarely if ever use postcodes.
Suburbs are not postcodes
Although many people use them interchangeably, suburbs are not the same thing as postcodes.
The ABS actually have Census data available both for Postal Areas, described above, and for “State Suburbs”, which are a best fit of SA1s to official suburb/locality boundaries. Unfortunately again, for rural areas in particular, State Suburbs are quite different to the actual locality boundaries, and can change substantially between Censuses, so they should not be used for time series.
Oddly enough, the news.com.au app doesn’t even use suburbs, though the ABS have them available. You enter a postcode and then get to choose a suburb within it, BUT NO MATTER WHICH SUBURB YOU CHOOSE YOU GET THE SAME POSTAL AREA DATA!
Try it! Postcode 3199 contains 3 suburbs (Frankston, Karingal and Frankston South) – although these areas are very different they are contained within the same postcode and the app will give you the same data for them all.
At .id, we do use suburb and locality boundaries (sometimes splitting up large suburbs further at the request of a council). We split Census collection districts and SA1s to exactly match the boundary, rather than using a best fit, and most importantly, we do this consistently over time. When the new version of profile.id is online it will contain data from 1991, 1996, 2001, 2006 and 2011 Censuses for each suburb in an LGA based on boundaries that do not change, so you can be sure you’re always comparing the same area.
Incomes and rents are subject to inflation
The Census measures incomes, rents and mortgages, and calculates the median (middle point) for each area. This is great for comparing between areas (though as a single measure of average, the median is subject to some issues too), but not so good for comparing over time. Dollar values tend to rise with inflation, so comparing raw dollar values between Censuses isn’t that useful. To be fair, the news.com.au app does give a national average, so if you do a bit of a calculation you can work out if your area has increased more or less than the national average, but it’s still not easy and you can’t compare the dollar values directly.
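The “bit of a calculation” mentioned above can be sketched in a few lines of Python. This is only an illustration: the dollar figures are made up, not taken from the Census or from the app, and the function names are our own.

```python
def percent_change(old, new):
    """Percentage change from an earlier value to a later one."""
    return (new - old) / old * 100


def change_relative_to_benchmark(area_old, area_new, bench_old, bench_new):
    """Percentage-point gap between an area's growth and the benchmark's growth.

    A positive result means the area's median rose faster than the benchmark;
    a negative result means it rose more slowly.
    """
    return percent_change(area_old, area_new) - percent_change(bench_old, bench_new)


# Hypothetical median weekly rents (illustrative only, not real Census values).
area_2006, area_2011 = 200, 250
national_2006, national_2011 = 200, 300

gap = change_relative_to_benchmark(area_2006, area_2011, national_2006, national_2011)
print(f"Area growth minus national growth: {gap:+.1f} percentage points")
```

In this made-up case the area’s median rent rose by 25% while the national median rose by 50%, so although the dollar value went up, the area actually became cheaper relative to the rest of the country.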
At .id, we use a technique called quartiles to benchmark the change in dollars to change in a benchmark area. For more information see our blog post about quartiles.
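As a rough illustration of the general idea (not .id’s actual implementation), one way to benchmark with quartiles is to work out the three dollar cut-offs that split the benchmark area’s households into four equal groups in a given Census year, then see what share of the local area’s households falls into each group. Repeating this for each Census gives shares that can be compared over time without worrying about inflation. The sketch below assumes individual household incomes are available, which is a simplification, since Census income is published in ranges; all names and figures are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_shares(local_incomes, benchmark_incomes):
    """Share of local households falling in each quartile of the benchmark area.

    Returns four fractions (lowest quartile first) that sum to 1.
    """
    # Three cut points dividing the benchmark incomes into four equal groups.
    cutoffs = quantiles(benchmark_incomes, n=4)
    counts = [0, 0, 0, 0]
    for income in local_incomes:
        # bisect_right gives 0..3: the index of the quartile band the income falls in.
        counts[bisect_right(cutoffs, income)] += 1
    total = len(local_incomes)
    return [count / total for count in counts]


# Hypothetical weekly household incomes (illustrative only).
benchmark = list(range(1, 101))   # stand-in for the benchmark area
local = [10, 20, 30, 90]          # stand-in for one small area

print(quartile_shares(local, benchmark))
```

If the share of local households in the top quartile grows from one Census to the next, the area has become better off relative to the benchmark, regardless of how much dollar incomes rose everywhere.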
Small populations can change very easily
Possibly the worst pitfall the article falls into is in the analysis. The headline and picture in the article are of a place called Tankerton, which is claimed to be the most “Godless” place in Australia, because it has the highest proportion of people with no religion. This is incredibly misleading for a few reasons:
- The data relates to the whole of French Island, not just Tankerton
- The whole of French Island is tiny anyway.
- The entire population of French Island in the 2011 Census was 116 people, and the 55% figure was the result of 64 of them marking “No Religion”. The actual population is not shown anywhere in the app.
It’s very easy for very small populations to be “outliers” in statistical data. The larger the area, the more smoothed out differences become, and the more confidence you can have in saying this is an important trend. Small populations can change rapidly over time. It would only take a few non-religious families to move from French Island to bring the percentages back close to the national average. And it’s not hard to find small populations within larger areas which have higher percentages than that. In fact there are over 50 SA1 areas in Australia with larger populations than that which have a higher percentage of “No Religion” – but they don’t equate to arbitrary whole postal areas so they get missed out from this analysis.
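To see how sensitive a percentage is when the population is this small, here is a minimal Python sketch using the French Island figures quoted above (64 of 116 people). The three departing families of four, and the comparison area of 17,000 people with the same starting share, are hypothetical and used only for illustration.

```python
def share_after_departures(matching, total, leavers):
    """Percentage with a characteristic after `leavers` people who have it move away."""
    return (matching - leavers) / (total - leavers) * 100


# French Island, 2011 Census: 64 of 116 people marked "No Religion".
before = share_after_departures(64, 116, 0)

# Hypothetical: three non-religious families of four (12 people) leave.
after = share_after_departures(64, 116, 12)

# Hypothetical large area of 17,000 people starting at the same share.
large_matching = round(17000 * 64 / 116)
large_before = share_after_departures(large_matching, 17000, 0)
large_after = share_after_departures(large_matching, 17000, 12)

print(f"Small area: {before:.1f}% -> {after:.1f}%")
print(f"Large area: {large_before:.2f}% -> {large_after:.2f}%")
```

The same twelve people moving out shift the small area’s figure by several percentage points, while the large area’s figure barely moves.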
Not only Tankerton, but 3 of the 4 other areas listed as having the most “godless” people have minuscule populations. Salisbury South as a suburb doesn’t actually have any population (it’s industrial), but its postcode 5106 is shared with Parafield Airport, which had 77 young men (the most likely group to be non-religious) staying in a pilot training school. Gascoyne River had 251 people and Sherbrooke 286. Fitzroy North/Clifton Hill (3068) is the only area listed with a reasonably sized population, of 17,000.
So there are significant pitfalls associated with analysing very small populations – they can change rapidly over time in percentage terms, and are more likely to have percentages at the extreme ends of a range, but have little impact on the wider data. They are also much more prone to people making mistakes on the Census form or putting in deliberately false information. And of course if the geographic boundary has changed over time (as detailed above), it can completely throw out any time series analysis.
At .id, our small areas on profile.id which allow comparison across 5 Census time periods need to meet a minimum size criterion to produce meaningful data. This is usually around 1,000 people depending on the area’s characteristics, and ensures that users are getting meaningful information about reasonable size areas on which to base decisions.
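A minimum size rule like this is simple to apply before ranking areas. The Python sketch below runs a flat 1,000-person cut-off over the populations quoted earlier in this post; the fixed threshold and the function name are assumptions for illustration, since the real criterion varies with the area’s characteristics.

```python
MIN_POPULATION = 1000  # assumed flat threshold; the real criterion varies by area

# Populations as quoted in this post (2011 Census).
areas = {
    "French Island (Tankerton)": 116,
    "Postcode 5106 (pilot school residents)": 77,
    "Gascoyne River": 251,
    "Sherbrooke": 286,
    "Fitzroy North/Clifton Hill (3068)": 17000,
}


def reportable(populations, minimum=MIN_POPULATION):
    """Keep only areas large enough to produce meaningful percentages."""
    return {name: pop for name, pop in populations.items() if pop >= minimum}


for name, pop in reportable(areas).items():
    print(f"{name}: {pop:,} people")
```

Of the five areas in the article’s list, only Fitzroy North/Clifton Hill would survive this filter.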
All these represent some of the reasons that it takes us a while to put the ABS data into our sites, after the data comes out from the ABS. It’s easy enough to throw some numbers into an app and purport to show change over time, but with our sites you can be sure you are comparing useful and logical geographic areas which have been consistently defined over a long time period, and our analysis in blog articles and at our training generally excludes very small areas or combines them into useful output units. It’s all about making this data accessible and useful in making evidence-based decisions.
Access more information about the Australian Census 2011.