
Hotspots – Why statistical significance is important when identifying clusters


Nenad Petrović 04 Aug, 2020

In the time of Covid-19, the idea of ‘clusters’ or ‘hotspots’ has become part of our daily conversation, but identifying high concentrations of people who share a given characteristic has always been part of our work. As a demographic consultant, Nenad helps local government and community organisations understand the makeup of their communities, so they can target services where they are needed most.

In this piece, he explains why it’s important to go beyond this initial step of identifying a cluster of people on a map, to check that there is actually a statistically significant over-representation of people who share a demographic trait, so you can act to serve that group with confidence.


When we first started building community profiles for local government nearly 20 years ago, the first profile we presented was quickly followed by a question: ‘Great, now can you also show us that on a map?’ Thus our social atlas tool (atlas.id) was born.

The ability to identify trends and patterns helps us understand a population and a place: what it was, what it is and what it will be. Many analysis techniques can help with this, and it is important to understand what their results are and are not telling you. Ultimately, you want to draw conclusions with confidence.

What is hotspot analysis?

Take clustering and hotspots for example. When we look at a thematic map of a particular demographic or socioeconomic indicator, patterns of high and low concentrations quickly emerge – that’s the whole point and power of a thematic map.

But do those areas of higher occurrence really hold a higher concentration of events than we would expect if the events were randomly distributed? Hotspot analysis answers this question. It accompanies thematic mapping: a thematic map is a low-effort exercise with meaningful results, but if you want to draw any firm conclusions about there being a “cluster” or a “hotspot” of a certain demographic variable, you need to make sure that the hotspot is statistically significant.

Why is hotspot analysis useful?

Hotspot analysis is most commonly used in crime analysis, epidemiology, voting pattern analysis, retail analysis, economic geography and demographics. We use it alongside other more traditional spatial analysis methods to discover where statistically significant occurrences of a demographic measure exist and whether they relate to what the thematic map is showing.

The ability to recognise statistically significant hotspots using GIS and spatial statistics toolsets gives us a way to understand where higher concentrations of a particular phenomenon occur. We choose a variable from our Social Atlas tool (disengaged youth in this example), export the data and use third-party GIS software to run a spatial statistics analysis. The result tells us whether the numbers we are seeing represent a statistically significant hotspot, and therefore whether we can focus our efforts on a particular area with confidence.

If you’d like to learn more about selecting data themes and exporting data for use in GIS from Social Atlas, please check out this short video tutorial.

Disengaged youth are defined as residents aged 15-24 years who are not participating in education or employment. A neighbourhood with a high percentage of disengaged youth is interesting, but it may not be a statistically significant hotspot. To be a statistically significant hotspot, the neighbourhood must have a high proportion of disengaged youth and also be surrounded by other neighbourhoods with a high proportion of disengaged youth.
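The difference between a high value and a hotspot can be illustrated with a small sketch. It assumes binary contiguity weights and uses invented rates for eight neighbourhoods arranged in a row; the function name, the data and the adjacency list are all hypothetical and are not output from Social Atlas or any GIS package.

```python
import math


def getis_ord_gi_star(values, neighbours):
    """Return a Gi* z-score for each area.

    values     -- one rate per area (e.g. % disengaged youth)
    neighbours -- neighbours[i] lists the indices of areas adjacent to area i

    Assumption: binary weights, with each area counted as its own
    neighbour (this self-inclusion is what the star in Gi* refers to).
    """
    n = len(values)
    mean = sum(values) / n
    # Standard deviation of the rates across the whole study area.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)

    z_scores = []
    for i in range(n):
        window = set(neighbours[i]) | {i}   # the area plus its neighbours
        w_sum = len(window)                 # sum of the binary weights
        local_sum = sum(values[j] for j in window)
        numerator = local_sum - mean * w_sum
        # With binary weights the sum of squared weights equals w_sum.
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical rates (%): area 2 is an isolated high value,
# areas 5 to 7 form a run of high values.
rates = [4.0, 5.0, 18.0, 5.0, 4.0, 16.0, 19.0, 17.0]
adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < len(rates)]
             for i in range(len(rates))]

for i, z in enumerate(getis_ord_gi_star(rates, adjacency)):
    print(f"area {i}: rate={rates[i]:>4}  Gi* z={z:+.2f}")
```

In this toy data, area 2 has a high rate but low-rate neighbours, so its score falls below zero, while area 6 sits inside a run of high rates and scores highest. With so few areas the z-scores are only indicative; a real analysis would use many more areas and the weights chosen in the GIS software.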

Below is an example of what a thematic map of disengaged youth looks like for Campbelltown City in New South Wales. The neighbourhoods with high proportions of disengaged youth are Airds, parts of Ambarvale, Claymore and Macquarie Fields.

 

A thematic map of disengaged youth in Campbelltown City using natural breaks: a quick, low-cost way of understanding spatial patterns

Thematic_analysis_map_Campbelltown

 

Understanding the results

A hotspot analysis of disengaged youth in Campbelltown City using the Getis-Ord Gi* statistic (usually written as a Gi* statistic and pronounced “Gi-Star”) results in a map of hotspots and coldspots.

These descriptive labels are a presentable way of showing the outputs of the Gi* analysis: the significance level (p-value, a probability) and the z-score (measured in standard deviations). A hotspot indicates a cluster of high values, i.e. a higher proportion of disengaged youth here than if the data were randomly distributed. A coldspot indicates a cluster of low values, i.e. a much lower concentration of disengaged youth here than if the data were randomly distributed.
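For reference, the Gi* statistic is commonly written in the form below. This is the general textbook form rather than anything specific to this analysis; the spatial weights $w_{ij}$ (for example, 1 for the area itself and its neighbours and 0 otherwise) are a choice the analyst makes, and $x_j$ is the value of the indicator in area $j$ out of $n$ areas.

```latex
G_i^* = \frac{\displaystyle\sum_{j=1}^{n} w_{ij}\, x_j \;-\; \bar{X} \sum_{j=1}^{n} w_{ij}}
             {S \sqrt{\dfrac{n \displaystyle\sum_{j=1}^{n} w_{ij}^{2} - \Bigl(\displaystyle\sum_{j=1}^{n} w_{ij}\Bigr)^{2}}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^{2} - \bar{X}^{2}}
```

The numerator compares the weighted local sum around area $i$ with what the study-area mean would predict, so the result is itself a z-score: large positive values point to a hotspot and large negative values to a coldspot.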

If we only choose the areas with 95% or 99% hotspot confidence (those with a p-value of 0.05 or 0.01 or less, which corresponds to a z-score of +1.96 or +2.58 or more), we focus on areas where we can be 95% or 99% confident that the hotspot we’ve identified is not the product of a random spatial pattern. In this example, the hotspots appear in Airds, Rosemeadow and Ambarvale, and in parts of Claymore and Macquarie Fields.

Hotspot analysis output illustrating areas where there are statistically significant hotspots and coldspots of disengaged youth

Hotspot_analysis_map_Campbelltown
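The link between z-scores, p-values and the 95% and 99% confidence bands can be sketched in a few lines. This is an illustration only, assuming a two-sided test against the standard normal distribution; the function names and the sample z-scores are hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def classify(z):
    """Label an area using the 95% and 99% confidence thresholds."""
    p = two_sided_p_value(z)
    if p <= 0.01:
        level = "99%"
    elif p <= 0.05:
        level = "95%"
    else:
        return "not significant"
    kind = "hotspot" if z > 0 else "coldspot"
    return f"{kind} ({level} confidence)"


# Hypothetical z-scores for five areas.
for z in (2.81, 2.10, 0.90, -2.20, -3.05):
    print(f"z={z:+.2f}  p={two_sided_p_value(z):.4f}  {classify(z)}")
```

Positive z-scores beyond the threshold are labelled hotspots and negative ones coldspots; anything in between is treated as indistinguishable from a random pattern.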

We can also identify coldspots: areas where the occurrence of a phenomenon is significantly lower than it would be if the data were randomly distributed. Coldspots, in this case, are areas such as Campbelltown, especially the triangle-shaped SA1 around Western Sydney University (an area where many young people are studying is bound to be a coldspot when assessing disengaged youth).

Other neighbourhoods which appear as coldspots are in Bow Bowing and Macquarie Links. These happen to be some of the least disadvantaged parts of the City of Campbelltown, but even the temptation to draw conclusions from that relationship needs to be tested statistically if we are to confidently say something like “the SEIFA index of disadvantage is correlated with disengaged youth in Campbelltown City”. We do this using “Bivariate Moran’s I” analysis, a measure of spatial correlation between two variables.
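As a rough sketch of how such a test works, the example below computes one common formulation of Bivariate Moran’s I (the average product of one standardised variable and the neighbourhood average of the other) and a permutation-based pseudo p-value. Everything here is an assumption for illustration: the SEIFA-like scores and youth rates are invented, the eight areas are arranged in a row, and GIS packages may scale the statistic slightly differently.

```python
import math
import random


def standardise(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def bivariate_morans_i(x, y, neighbours):
    """Average of standardised x at each area times the neighbour mean of standardised y."""
    zx, zy = standardise(x), standardise(y)
    total = 0.0
    for i, nbrs in enumerate(neighbours):
        lag_y = sum(zy[j] for j in nbrs) / len(nbrs)  # row-standardised spatial lag
        total += zx[i] * lag_y
    return total / len(x)


def permutation_p_value(x, y, neighbours, permutations=999, seed=0):
    """Shuffle y across areas to see how unusual the observed statistic is."""
    rng = random.Random(seed)
    observed = bivariate_morans_i(x, y, neighbours)
    shuffled = list(y)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(bivariate_morans_i(x, shuffled, neighbours)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)


# Hypothetical data: lower SEIFA-like scores mean more disadvantage.
seifa_like = [1050, 1040, 1020, 1000, 960, 900, 880, 890]
youth_rate = [4.0, 5.0, 6.0, 8.0, 11.0, 16.0, 19.0, 17.0]
adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < len(seifa_like)]
             for i in range(len(seifa_like))]

i_obs, p = permutation_p_value(seifa_like, youth_rate, adjacency)
print(f"Bivariate Moran's I = {i_obs:.2f}, pseudo p-value = {p:.3f}")
```

A clearly negative value with a small pseudo p-value would support the statement that areas with lower scores (more disadvantage) tend to sit next to areas with more disengaged youth; a value near zero would not.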

Using hotspot analysis in your work

With this information, we can more confidently use the results in our youth engagement policies, in our engagement with communities, and in allocating resources to the suburb and neighbourhood community groups that encourage young residents to become more involved in employment or education.

Not everyone is familiar with this kind of analysis, but it is very effective and statistically powerful. If you are planning on making decisions based on the spatial distribution of a certain part of your population, whether it be disengaged youth, low-income households, residents with a need for assistance due to disability, recent arrivals, or anything else, contact us and we will help you get the most out of your data.

Nenad Petrović

Nenad’s background is in geosciences and geographic information systems. At .id, Nenad has experience as both a demographer and a population forecaster. His areas of expertise are place-based analysis, identifying spatial patterns in demographic trends, community profiling, catchment analysis and understanding the role and function of different communities.
