I’m just a few weeks into my position of Research Associate at mySociety and one of the things I’m really enjoying is the really, really interesting datasets I get to play with.
Take FixMyStreet, the site that allows you to report street issues anywhere in the UK. Councils themselves will only hold data for the issues reported within their own boundaries, but FixMyStreet covers all local authorities, so we’ve ended up with probably the most comprehensive database in the country. We have 20,000 reports about dog poop alone.
Now if you’re me, what to do with all that data? Obviously, you’d want to do something with the dog poop data. But you’d try something a bit more worthy first: that way people won’t ask too many questions about your fascination there. Misdirection.
How does it compare?
So, starting with worthy uses for that massive pile of data, I’ve tried to see how the number of reports in an area compares against other statistics we know about the UK. Grouping reports into ONS-defined areas of around 1,500 people, we can match the number of reports within an area each year against other datasets.
To start with I’m just looking at English data (Scotland, Wales and Northern Ireland have slightly different sets of official statistics that can’t be combined) for the years 2011-2015. I used population density information, how many companies registered in the area, if there’s a railway station, OFCOM stats on broadband and mobile-internet speeds, and components from the indices of multiple deprivation (various measures of how ‘deprived’ an area is, such as poor health, poor education prospects, poor air quality, etc) to try and build a model that predicts how many reports an area will get.
The good news: statistically we can definitely say that some of those things have an effect! Some measures of deprivation make reports go up, others make it go down. Broadband and mobile access makes them go up! Population density and health deprivation makes them go down.
The bad news: my model only explains 10% of the actual reports we received, and most of this isn’t explained by the social factors above but aspects of the platform itself. Just telling the model that the platform has got more successful over time, which councils use FixMyStreet for Councils for their official reporting platform (and so gather more reports) and where our most active users are (who submit a disproportionate amount of the total reports) accounts for 7-8% of what the model explains.
What that means is that most reasons people are and aren’t making reports is unexplained by those factors. So for the moment this model is useful for building a theory, but is far from a comprehensive account of why people report problems.
Here’s my rough model for understanding what drives areas to submit a significantly higher number of reports to FixMyStreet:
- An area must have a problem
Measures of deprivation like the ‘wider barriers to housing deprivation’ metric (this includes indicators on overcrowding and homelessness) as well as crime are associated with an increase in the number of reports. The more problems there are, the more likely a report is — so deprivation indicators we’d imagine would go alongside other problems are a good proxy for this.
- A citizen must be willing or able to report the problem
Areas with worse levels of health deprivation and adult skills deprivation are correlated with lower levels of reports. These indicators might suggest citizens less able to engage with official structures, hence fewer reports in these areas.
People also need to be aware of a problem. The number of companies in an area, or the presence of a railway station both increase the number of reports. I use these as a proxy for foot-traffic – where more people might encounter a problem and report it.
Population density is correlated with decreased reports which might suggest a “someone else’s problem” effect – a slightly decreased willingness to report in built-up areas where you think someone else might well make a report.
- A citizen must be able to use the website
As an online platform, FixMyStreet requires people to have access to the website before they can make a report. The less friction in this experience makes it more likely a report will be made.
This is consistent with the fact that an increased number of slow and fast home broadband connections (and fast more than slow ones) increases reports. This is also consistent with the fact that increased 3G signal in premises is correlated with increased requests.
Reporting problems on mobile will sometimes be easier than turning on the computer, and we’d expect areas where people more habitually use mobiles for internet access to have a higher number of reports than broadband access alone would suggest. If it’s slightly easier, we’d expect slightly more – which is what this weak correlation suggests.
Not all variables my model includes are significant or fit neatly into this model. These are likely working as proxy indicators for currently unaccounted for, but related factors.
I struggle, for instance, to come up with a good theory why measures of education deprivation for young people are associated with an increase in reports. I looked to see if there was a connection between an area having a school and having more reports on the basis of foot-traffic and parents feeling protective over an area – but I didn’t find an effect for schools like I did for registered companies.
So at the moment, these results are a mix of “a-hah, that makes sense” and “hmm, that doesn’t”. But given that we started with a dataset of people reporting dog poop, that’s not a terrible ratio at this point. Expanding the analysis into Scotland and Wales, analysing larger areas, or focusing on specific categories of reports might produce models that explain a bit more about what’s going on when people report what’s going wrong.
I’ll let you know how that goes.
Image: Dave_S (CC-by/2.0)