Over the last few years we have stopped publishing several statistics on some of our services, but haven’t really talked publicly about why. This blog post is about the problems we’ve been trying to address and why, for the moment, we think less is better.
TheyWorkForYou numerology
TheyWorkForYou launched with explicit rankings of MPs, but these were quickly replaced with more “fuzzy” rankings, acknowledging the limits of the available data sources in providing a concrete evaluation of an MP. Explicit rankings were removed from the ‘numerology’ section of an MP’s profile in 2006. In July 2020, we removed the section altogether.
This section covered the number of speeches made in Parliament in the current year, answers to written questions and attendance at votes, alongside more abstract metrics like the reading age of the MP’s speeches and ‘three-word alliterative phrases’, which counted the number of times an MP said phrases like ‘she sells seashells’.
This last metric was intended to make a point about the limits of the data, and sat alongside a disclaimer that countable numbers reflect only part of an MP’s job.
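As a toy illustration of how mechanical these countable metrics are, here is a rough sketch of a ‘three-word alliterative phrases’ counter. The tokenisation and matching rules are assumptions for illustration, not the code TheyWorkForYou actually ran.

```python
# Toy sketch of a 'three-word alliterative phrases' counter: the tokenisation
# and matching rules here are illustrative assumptions, not TheyWorkForYou's
# actual implementation.
import re

def alliterative_phrase_count(speech: str) -> int:
    """Count runs of three consecutive words starting with the same letter."""
    words = re.findall(r"[a-z]+", speech.lower())
    return sum(
        1
        for a, b, c in zip(words, words[1:], words[2:])
        if a[0] == b[0] == c[0]
    )

print(alliterative_phrase_count("She sells seashells by the sea shore"))  # 1
```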
Our new approach is based on the idea that, while disclaimers may make us feel we have adequately reflected nuance, they are rarely read by users. If we do not believe data can support meaningful evaluations, or it needs heavy qualification, we should not highlight it.
Covid-19 and limits on remote participation also mean that the significance of some participation metrics is less clear for some periods. We’re open to the idea that some data may return in the future, if a clear need arises that we think we can fill with good information. In the meantime the raw information on voting attendance is still available on Public Whip.
WriteToThem responsiveness statistics
When someone sends a message to a representative through WriteToThem, we send a survey two weeks later to ask if this was their first time writing, and whether they got a response.
These answers were used to generate an annual table ranking MPs by responsiveness (the sketch after the list below shows the kind of calculation involved). In 2017 we stopped publishing the WriteToThem stats page. The concerns that led to this were:
- There are systemic factors that can make MPs more or less likely to respond to correspondence (eg holding ministerial office).
- As the statistics only cover the last year, this can lead to MPs moving around the rankings significantly, calling into question the value of a placement in any particular year. Does it represent improvement/decline, or is the change random?
- MPs receive different types of communication and may prioritise some over others (for example, requests for intervention rather than policy lobbying). Different MPs may receive different types of messages, making comparisons difficult.
- The bottom rankings may reflect factors outside MPs’ control (eg a technical problem with the email address, or health problems), which can undermine the wider value of the rankings.
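For context, the table we used to publish was essentially a per-MP response rate derived from those surveys. Here is a minimal sketch of that kind of calculation; the column names and the minimum-survey threshold are illustrative assumptions, not the actual WriteToThem pipeline.

```python
# Minimal sketch of the kind of responsiveness table we used to publish.
# Column names (mp_id, got_response) and the minimum-survey threshold are
# illustrative assumptions, not the actual WriteToThem pipeline.
import pandas as pd

def responsiveness_table(surveys: pd.DataFrame, min_surveys: int = 10) -> pd.DataFrame:
    """Rank MPs by the share of surveyed messages that got a response."""
    table = (
        surveys.groupby("mp_id")["got_response"]
        .agg(response_rate="mean", surveys="count")
    )
    # Drop MPs with too few surveys for the rate to mean much.
    table = table[table["surveys"] >= min_surveys]
    return table.sort_values("response_rate", ascending=False)
```

Even this simple version involves judgement calls: what counts as enough surveys, and over what period the rate is measured.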
The original plan was to turn this off temporarily while we explored how the approach could be improved, but digging into the complexity has led to the issue dragging on, and at this point it is best to say that the rankings are unlikely to return in a similar form any time soon.
The reasons for this come from our research on WriteToThem and the different ways we have tried to explore what these responsiveness scores mean.
Structural factors
There are structural factors that make direct comparisons between MPs more complicated. For instance, we found that when people write to Members of the Scottish Parliament, there are different response rates for list and constituency members. What we don’t know is whether this reflects different behaviour in response to the same messages, or whether list and constituency MSPs were getting different kinds of messages, some of which are easier to respond to. Either way, this suggests we should either judge the two groups separately or apply a correction for the effect (and we would need different processes for different legislatures).
There are also collective factors that individual representatives do contribute to. For instance, if MPs from one party are more responsive to communication, controlling for this factor to make individual comparisons with other MPs easier is unfair, as it minimises the collective effort. Individuals are part of parties, but parliamentary parties are also collections of individuals; drawing clear lines when allocating agency is difficult.
Gender
One of the other findings of our paper on the Scottish Parliament was an effect in both the Holyrood and Westminster Parliaments where female MPs had a systematically lower responsiveness score than male MPs (roughly 7% lower in both cases, and the gap remains when looking at parties in isolation). Is this a genuine difference in behaviour, or does it reflect a deeper problem with the data? While responsiveness scores are not quite evaluations, it seems reasonable to be cautious about user-generated data that systematically leads to lower rankings for women, especially when the relevant literature suggests that women MPs spent more time on constituency service when the question was studied in the 1990s.
One concern was whether abusive messages sent through the platform were leading to more emails that were not worth responding to. This was of special concern given online abuse against women MPs through other platforms. While WriteToThem accounts for only 1-2% of emails to MPs, it is a concern if we cannot rule out that a gendered difference in abusive messages contributes to a difference in a metric we would then use to make judgements about MPs.
Our research in this area has found some interaction between the gender of the writer and the recipient of a message. We found a (small) preference for users to write to representatives who share their gender, but without more knowledge of the content of messages we cannot really tell whether the responsiveness difference results from factors that are fair or unfair to judge individual representatives on. Our policy of maintaining the privacy of communications between people and their MP as far as possible means direct examination is not possible for research projects, and returning to publishing rankings without more work to rule this out would be problematic. We are exploring other approaches to understand more about the content of messages.
Content and needs
We could in principle adjust for differences that can be identified, but we also suspect there are other differences that we cannot detect and remove. For instance, constituents in different places have different types of problems, and so have different needs from their MP. If these different kinds of problems have different levels of responsiveness, what we are actually judging an MP on is their constituents, rather than their own behaviour.
Our analysis of how the index of multiple deprivation (which ranks the country on a variety of different measures of deprivation) relates to data in WriteToThem found that messages to MPs from more deprived areas are less likely to get a response than those from less deprived areas. The least deprived decile has a response rate about 7% higher than the average, and the most deprived decile about 6% lower. However, when looking at rates per decile for each individual MP, there is no such pattern. This suggests the effect is a feature of different MPs covering different areas (with different distributions of deprivation), rather than of individual MPs responding differently to their own constituents.
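As a rough illustration of that comparison (not our actual analysis code; the column names are assumptions), the gradient can be computed per decile across all messages pooled, and then again per decile within each MP’s own messages:

```python
# Illustrative sketch of the deprivation-decile comparison; the column names
# (mp_id, imd_decile, got_response) are assumptions, not our analysis code.
import pandas as pd

def overall_decile_rates(surveys: pd.DataFrame) -> pd.Series:
    """Response rate for each index of multiple deprivation decile, all MPs pooled."""
    return surveys.groupby("imd_decile")["got_response"].mean()

def per_mp_decile_rates(surveys: pd.DataFrame) -> pd.DataFrame:
    """Response rate per decile within each individual MP's own messages.

    If deprivation varies mostly *between* constituencies rather than within
    them, a gradient can show up in the pooled figures even though each MP
    treats their own constituents much the same.
    """
    return (
        surveys.groupby(["mp_id", "imd_decile"])["got_response"]
        .mean()
        .unstack("imd_decile")
    )
```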
At the end of last year, we experimented with an approach that standardised the scores via a hypothetical average constituency. This was used by change.org as one metric among many in a People-Power index. While this approach addresses a few issues with the raw rankings, we’re not happy with it. In particular, one MP was downgraded because more of their responses were to messages from a more deprived decile, and the standardised score averaged this down using their lower response rates in less deprived deciles.
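The standardisation idea, roughly, was to re-weight each MP’s per-decile response rates by the decile mix of a hypothetical average constituency. A sketch of that idea (not the code behind the change.org figures; the column names and the fallback for missing deciles are assumptions) might look like this:

```python
# Rough sketch of standardising scores against a hypothetical "average
# constituency"; not the code behind the change.org figures. Column names and
# the fallback for deciles with no messages are illustrative assumptions.
import pandas as pd

def standardised_scores(surveys: pd.DataFrame, average_mix: pd.Series) -> pd.Series:
    """Per-MP response rate re-weighted to a common deprivation-decile mix.

    `average_mix` maps each IMD decile to its share of messages in the
    hypothetical average constituency (the shares should sum to 1).
    """
    per_decile = (
        surveys.groupby(["mp_id", "imd_decile"])["got_response"]
        .mean()
        .unstack("imd_decile")
    )
    # Where an MP has no messages in a decile, fall back to their overall rate
    # so that empty deciles neither punish nor reward them.
    overall = surveys.groupby("mp_id")["got_response"].mean()
    per_decile = per_decile.apply(lambda col: col.fillna(overall))
    # Weighted average across deciles, using the hypothetical constituency mix.
    return per_decile.mul(average_mix, axis="columns").sum(axis="columns")
```

The weighting step is exactly where the problem described above appears: an MP whose replies are concentrated in more deprived deciles can see their score pulled down by the weight given to deciles where they respond less.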
If we were to continue with that approach, a system that punishes better responsiveness to more deprived areas is a choice that would need strong justification. The approach also becomes more abstract as a measure, making it less easy to explain what the ranking represents. Are we aiming to provide useful comparisons by which to judge an MP, or a guide for WriteToThem users as to whether they should expect a reply? These are two different problems.
We are continuing to collect the data, because it is an interesting dataset and we’re still thinking about what it can best be used for, but we do not expect to publish rankings in their previous form again.
FixMyStreet and WhatDoTheyKnow
Other services are concerned with public authorities rather than individual representatives. In these cases, there is a clearer (and sometimes statutory) sense of what good performance looks like.
Early versions of FixMyStreet displayed a “league table”, showing the number of reports sent to each UK council, along with the number that had been fixed recently. A few years ago we changed this page so that it only lists the top five most responsive councils.
There were several reasons for this: FixMyStreet covers many different kinds of issues that take different amounts of time to address, and different councils have more of some of these issues than others. Additionally, even once a council resolves an issue, not all users come back to mark their reports as fixed.
As a result, the information we have on how quickly problems are fixed may vary for reasons outside a council’s control. So while we show a selection of the top five “most responsive” councils on our dashboard page, as a small way of recognising the most active councils on the site, we don’t share responsiveness stats for all councils in the UK. More detail on the difference in reported fix rates between different kinds of reports can be seen on our data explorer minisite.
WhatDoTheyKnow similarly has some statistics summary pages for the FOI performance of public authorities. We are reviewing how we want to generate and use these stats to better reflect our goals of understanding and improving compliance with FOI legislation in the UK, and as a model for our partner FOI platforms throughout the world.
In general, we want to be confident that any metric is measuring what we want to measure, and that we are providing information to citizens that is meaningful. For the moment, that means publishing slightly less. In the long run, we hope this will lead us to new and valuable ways of exploring this data.