New in CAPE: Five different ways of finding similar councils

One of the things we want CAPE, the Climate Action Plan Explorer, to do is make useful comparisons between different councils, and help surface where councils are similar and might be able to learn from each other.

The first go at this was a physical proximity tool, which highlighted neighbouring councils, but this can miss that adjacent councils may have quite different circumstances. As announced in December’s month notes, we’ve now expanded this tool so that it offers five possible ways of seeing which councils are similar to each other.

New CAPE page showing similar councils

One of the approaches we’ve been exploring is the use of BEIS carbon emissions data to provide an alternate lens, where councils can be shown to be ‘similar’ on the basis of the overall profile of their different kinds of emissions. 

As part of this process we created a prototype using Binder and wrote a public blog post to gather feedback (and had a few good Twitter conversations about problems with specific comparisons). We showed the tool to climate officers directly, and also asked a larger group of climate officers and other participants at a session in NetZeroLocal which aspects they would find useful in comparisons. We also talked to Connected Places Catapult, which is exploring a very similar approach in its Net Zero Navigator.

Generally people were supportive of the idea of making comparisons based on emissions, but raised the point that it might be less or more useful depending on the kind of policy that was being compared.  

In the NetZeroLocal session there was broad support for urban/rural splits, physical proximity and population size, and lower support for a range of other options. This included low interest in the abstract idea of the carbon comparison, although in practice this effectively works as an urban/rural split classification.

People also suggested additional datasets for specific kinds of problems. For instance, the rural and urban divide is useful across a whole range of factors, but housing stock would be useful for understanding a comparison of specific policy areas. 

The lesson from this is that one single ‘similar’ measure was not going to be good enough. Different kinds of problems require different kinds of comparisons, so we need a framework that can let people choose the comparison they want to make, and datasets that help them make good comparisons.  As such, CAPE now works with an improved version of the emissions comparison, but also three other measures, and a composite measure that uses all of these to give an impression of general similarity. This can be used to explore councils, or to limit a text search of plans just to similar councils. 

Improving emissions comparison

When we first started looking at emissions data, it quickly became clear there was a set of questions around understanding what the data meant in the first place, whether there was a “correct” way to manipulate it, and then how to describe what it meant at the end. 

The original uncertainty about whether it is correct to use ‘per person’ emissions is especially clear for industrial emissions, which have no clear relationship to the number of people in an area. Adjusting by the number of people leads to mid-industry but low-density areas being seen as comparable to very high-industry, high-density areas, which did not seem correct for comparisons. In general, very high- and low-density areas become outliers and make for odd clustering: small authorities in Scotland ended up paired with the centre of London. This affects a small number of councils, but probably reflects patterns that are less obvious (and probably unhelpful) throughout the approach.
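To make the per person problem concrete, here is a small worked example. The two areas and their figures are made up purely for illustration and are not taken from the BEIS data.

```python
# Hypothetical figures, chosen only to illustrate the distortion.
areas = {
    "low_density_mid_industry": {"industrial_kt": 100.0, "population": 20_000},
    "high_density_high_industry": {"industrial_kt": 5_000.0, "population": 1_000_000},
}


def per_person_tonnes(industrial_kt, population):
    """Convert kilotonnes of emissions into tonnes per resident."""
    return industrial_kt * 1_000 / population


per_person = {name: per_person_tonnes(**row) for name, row in areas.items()}

for name, value in per_person.items():
    print(f"{name}: {value} tonnes per person")
```

Both areas come out at the same per person figure even though one has fifty times the industrial emissions of the other, which is the kind of pairing that did not seem correct.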

There are several approaches to this problem. Connected Places Catapult uses local GDP rather than population as an alternative way of comparing industrial emissions between areas. Another approach would be to explicitly include population density as a dimension of the clustering. This should generally do little for most councils (as it is indirectly reflected in emissions), but should drive a wedge between incorrect comparisons in per person measures. Another option is to cap (winsorize) the population used to calculate per person figures. This should stop extreme outliers presenting bizarrely in comparisons without excluding them completely.

For V2, we tested a few approaches and in the end used versions of all of these. The raw emissions data is adjusted in the following ways:

  • Domestic emissions are adjusted to be per person.
  • Commercial and industrial emissions are adjusted to be per unit of GDP.
  • Transport and public sector emissions are per person, but winsorized.
  • A down-weighted version of population density is used as an extra factor to push dissimilar councils a little further apart.
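The adjustments above can be sketched roughly as follows. This is a minimal illustration rather than the code CAPE uses: the field names, the 5th/95th percentile cut-offs and the `density_weight` value are all assumptions, and it assumes the winsorizing is applied to the population before dividing.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clamp each value to the values found at the given percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_pct * (n - 1))]
    high = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, low), high) for v in values]


def build_features(councils, density_weight=0.5):
    """Turn raw emissions rows into the adjusted columns used for comparison.

    Each council is a dict with hypothetical keys: name, population, gdp,
    density, domestic, commercial, industrial, transport, public_sector.
    """
    # Assumption: the cap is applied to population, not to the final ratio.
    capped_population = winsorize([c["population"] for c in councils])
    features = []
    for council, capped in zip(councils, capped_population):
        features.append(
            {
                "name": council["name"],
                "domestic": council["domestic"] / council["population"],
                "commercial": council["commercial"] / council["gdp"],
                "industrial": council["industrial"] / council["gdp"],
                "transport": council["transport"] / capped,
                "public_sector": council["public_sector"] / capped,
                # Assumed weighting; a real pipeline would likely standardise first.
                "density": density_weight * council["density"],
            }
        )
    return features
```

Capping the population rather than dropping councils keeps the extreme authorities in the comparison while limiting how far they can pull the per person figures.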

This produces clusters that are broadly similar to V1, but passes the test of not grouping a set of councils that seemed incorrectly grouped in the original. 

We also took a different approach to labelling these groups. Feedback was positive for the urban/rural distinction in V1, but this is now being taken care of more directly through a different approach. 

Given this, the labels for the emissions groupings focus on which kinds of emissions a grouping is above average for. A grouping can also be below average for a particular type of emissions, but this was hard to condense into a quick label (below average is not ‘low’), so it is expanded on in the description instead.

Label | Description
Industry/domestic/transport | Above average for industry/domestic/transport, below average public sector emissions.
Public sector | Well above average public sector (government, education, health), below average in other areas.
Urban mainstream | Below average for most emissions scores.
Domestic | Slightly above average in domestic emissions, below average public sector emissions.
Industry/commercial/public sector | Above average industry/commercial and public sector.
City of London | The City of London does not have a comparable emissions profile.
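One way to arrive at labels like these is to compare each grouping's average with the average across all councils and name the sectors that stand out. The sketch below assumes that approach; the `threshold` of 1.1 and the fallback wording are hypothetical choices, not the rule used to produce the labels above.

```python
from statistics import mean


def label_group(group_rows, all_rows, sectors, threshold=1.1):
    """Name the sectors where the group's mean exceeds the overall mean.

    A sector counts as above average when the group mean is more than
    `threshold` times the mean across all councils (an assumed cut-off).
    """
    above = []
    for sector in sectors:
        overall_mean = mean(row[sector] for row in all_rows)
        group_mean = mean(row[sector] for row in group_rows)
        if group_mean > threshold * overall_mean:
            above.append(sector)
    # Hypothetical fallback wording for groups with nothing above average.
    return "/".join(above) if above else "no above-average sector"
```

Because the label only records what is above average, anything below average still has to be spelled out in the description.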

 

Map of UK local authorities showing different emissions groups

For the moment we are not displaying these labels in the interface, but may use them as the basis for other forms of comparison in future.  In general, this process is inherently throwing away data to make comparisons easier, and so will always break down at some level of analysis. The solution to this is not to make the approach perfect, but to present multiple options that meet different use cases. 

New options

From feedback we learned that we couldn’t solve everyone’s problems with the same measure of similarity, so we’ve gone away and created a few new measures, and a framework where we could add more in future. 

This includes the previous measure of which councils geographically border or overlap with the selected council, and introduces three new measures: deprivation profile, rural/urban profile and a composite overall comparison.

Deprivation profile

The similarity between authorities is calculated from the proportion of the population living in high deprivation (1st quintile), medium deprivation (2nd and 3rd quintile) and low deprivation (4th and 5th quintile) neighbourhoods. The population density is also used to help distinguish between authorities with very similar profiles of deprivation.

This UK-wide comparison is based on a Composite Index of Multiple Deprivation system.
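As a rough sketch of how such a profile could be built, the function below takes a list of neighbourhoods, each with a deprivation quintile and a population, and returns the share of people in each band. The input shape and function name are assumptions for illustration, and the population density component is left out.

```python
def deprivation_profile(neighbourhoods):
    """Return the share of population in high, medium and low deprivation bands.

    `neighbourhoods` is a list of (quintile, population) pairs, where
    quintile 1 is the most deprived and quintile 5 the least.
    """
    bands = {"high": 0, "medium": 0, "low": 0}
    for quintile, population in neighbourhoods:
        if quintile == 1:
            bands["high"] += population
        elif quintile in (2, 3):
            bands["medium"] += population
        elif quintile in (4, 5):
            bands["low"] += population
        else:
            raise ValueError(f"quintile must be 1-5, got {quintile}")
    total = sum(bands.values())
    return {band: population / total for band, population in bands.items()}


profile = deprivation_profile([(1, 500), (2, 300), (3, 200), (5, 1000)])
```

Two authorities can then be compared on these three shares rather than on a single average score, which keeps an authority with a mix of very deprived and very affluent areas distinct from one that is uniformly in the middle.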

Rural/Urban profile

The similarity between authorities is calculated by the proportion of the population living in urban, rural, and highly rural neighbourhoods in an authority. The population density is also used to help distinguish between authorities in entirely urban areas. 

This UK-wide comparison is based on a Composite Rural Urban Classification system.

Overall comparison

There is also an overall comparison, which is the default view. This takes all of the above, and calculates which councils are nearest to each other across all these measurements. Councils may be shown as highly similar overall because they are very similar on one measure, or because they are slightly similar across several.
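One simple way to combine several measures is to add up the distance between two councils on each measure and rank by the total. The sketch below assumes that approach, with each measure stored as a tuple of numbers per council; the actual combination used in CAPE may differ, and the data layout and function name are hypothetical.

```python
from math import dist


def overall_nearest(target, profiles, top_n=3):
    """Rank other councils by summed Euclidean distance across every measure.

    `profiles` maps council name -> {measure name -> tuple of numbers}.
    """
    target_measures = profiles[target]
    totals = {}
    for council, measures in profiles.items():
        if council == target:
            continue
        totals[council] = sum(
            dist(measures[name], target_measures[name]) for name in target_measures
        )
    # Smallest combined distance first.
    return sorted(totals, key=totals.get)[:top_n]
```

Under this scheme a council can rank highly either by being almost identical on one measure or by being moderately close on all of them.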

Our thinking in this was to create a single measure that was likely to be slightly useful in most cases, while giving more advanced users additional tools to dig into specific comparisons. 

The underlying datasets and code are available on GitHub.