We’ve used machine learning to make practical improvements in the search on CAPE – our local government climate information portal.
The site contains hundreds of documents and climate action plans from different councils, and they’re all searchable.
One aim of this project is to make it easier for everyone to find the climate information they need: so councils, for example, can learn from each other’s work, and people can easily pull together a picture of what is planned across the country.
The problem is that these documents often use different terms to talk about the same basic ideas – meaning that using the search function requires an expert understanding of which different keywords to search for in combination.
Using machine learning, we’ve now made it so the search will automatically include related terms. We’ve also improved the accessibility of individual documents by highlighting which key concepts are discussed in the document.
How machine learning helps
We’re already using machine learning techniques as part of our work clustering similar councils based on emissions profile, but we hadn’t previously looked at how machine learning approaches could be applied to big databases of text like CAPE.
As part of our funding from Quadrature Climate Foundation, we were supported to take part in the Faculty Fellowship – where people transitioning from academic to industrial data science jobs are partnered with organisations looking to explore how machine learning can benefit their work.
Louis Davidson joined us for six weeks as part of this programme. After a bit of exploration of the data, we decided on a project looking at this problem of improving the search, as there was a clear way a machine learning solution could be applied: using a language model to identify key concepts that were present across all the documents. You can watch Louis’ end of project presentation on YouTube.
Moving from similar words to similar concepts
Louis took the documents we had and used a language model (in this case, BERT) to produce ‘embeddings’ for all the phrases they contained.
When language models are trained on large amounts of text, this changes the internal shape of the model so that texts with similar meanings end up ‘closer’ to each other inside the model. An ‘embedding’ is the series of numbers that represents this location. By looking at the distance between embeddings, we can identify groups of terms with similar meanings. While a more basic text-similarity approach would say that ‘bat’ and ‘bag’ are very similar, a model that sorts by meaning would identify that ‘bat’ and ‘owl’ are more similar.
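As a rough sketch of how this works (the three-number vectors below are tiny placeholders standing in for real BERT embeddings, which have hundreds of dimensions, and `closest_terms` is an illustrative helper, not part of CAPE’s code):

```python
import numpy as np

# Placeholder embeddings: in practice these come from a model like BERT.
embeddings = {
    "bat": np.array([0.9, 0.1, 0.2]),
    "owl": np.array([0.8, 0.2, 0.3]),
    "bag": np.array([0.1, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_terms(term, embeddings):
    """Rank every other term by how close its embedding is to `term`'s."""
    others = [t for t in embeddings if t != term]
    return sorted(
        others,
        key=lambda t: cosine_similarity(embeddings[term], embeddings[t]),
        reverse=True,
    )

print(closest_terms("bat", embeddings))  # 'owl' ranks above 'bag'
```

With a real model the same ranking step works unchanged; only the source of the vectors differs.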
This means that without needing to re-train the model (because you’re not really concerned with what the model was originally trained to do), you can explore the similarities between concepts.
Some approaches store these embeddings in a “vector database” that can be searched directly – but we’ve gone for a simpler option that doesn’t require a big change to how CAPE already works.
Using the documents we have, we automatically identified (and manually selected a group of) common concepts that are found across a range of documents – and the original groups of words that relate to those concepts.
When a search is made we now consult this list of similar phrases, and search for these at the same time. This gives us a practical way of improving our existing processes without adding new technical requirements when adding new documents or searching the database.
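In outline, that expansion step looks something like this (the concept groups below are invented examples; CAPE’s real list was derived from the embeddings and then manually curated):

```python
# Hypothetical concept groups: each key concept maps to the phrases
# that the embedding analysis grouped together.
CONCEPT_TERMS = {
    "retrofit": ["retrofit", "home insulation", "energy efficiency measures"],
    "active travel": ["active travel", "cycling", "walking routes"],
}

def expand_query(query):
    """If the query matches a phrase in a known concept group,
    search for the whole group; otherwise search as-is."""
    q = query.lower().strip()
    for terms in CONCEPT_TERMS.values():
        if q in terms:
            return terms
    return [q]

print(expand_query("cycling"))
# a search for "cycling" also covers "active travel" and "walking routes"
```

Because the expansion is just a lookup against a pre-computed list, the existing search index doesn’t need to change at all.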
Because we now have this list of common concepts, we are also pre-searching for these concepts to provide, for each document, links to where that concept is discussed within it. This makes the contents of individual documents more visible, so you can quickly jump to the passages most relevant to your interests.
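The pre-search amounts to finding where each concept’s phrases occur in a document; a minimal sketch of that step (not CAPE’s actual implementation):

```python
import re

def concept_locations(text, phrases):
    """Return (offset, matched phrase) pairs for every occurrence of the
    given concept phrases, so a page can link straight to those passages."""
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]

doc = "The council will promote cycling and expand walking routes by 2025."
print(concept_locations(doc, ["cycling", "walking routes"]))
```

Run once per document when it is added, this produces the per-document concept links without any extra work at search time.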
Potential of machine learning for mySociety
Our other websites, like TheyWorkForYou and WhatDoTheyKnow, similarly hold large amounts of text that this kind of semantic search could make more accessible — and we can already see how it might be useful to those relying on data around climate and the environment. WhatDoTheyKnow in particular has huge amounts of environmental information fragmented across replies to hundreds of different authorities.
Generative AI and machine learning have huge potential to help us make the information we hold more accessible. At the same time, we need to understand how to incorporate new techniques into our services in a way that is sustainable over time.
Through experiments like this with CAPE, we are learning how to think about machine learning, which of our problems it applies to, and which new skills we need to work with it. Thanks to Louis for his work, and to his Faculty advisors for their support on this project.
Image: Ravaly on Unsplash.
As we barrel into Summer at full speed, here’s a summary of what mySociety’s climate team got up to in May.
Neighbourhood Warmth: alpha testing a vision of community-powered retrofit
As Siôn blogged a few days ago, Neighbourhood Warmth has been, and will continue to be, a major focus for us over May–July this year.
Last month, we grappled with some thorny design questions (how do we test appetite for community-led retrofit? how could a service support both climate activists and neighbours who just need lower energy bills?) and started building a working alpha, which we’ll be testing out in online workshops with a handful of pilot communities around the UK this June/July.
We also had a number of really encouraging calls with other organisations working in this space – all of us keen on finding some way to square the circle of solving the UK’s massive domestic decarbonisation challenge. If you’re interested, you can read much more in Siôn’s separate monthnotes for this project.
CAPE: making sense of messy data around local authorities’ climate plans
From our newest climate tool (Neighbourhood Warmth) to our longest running – CAPE. This May we progressed two big improvements to CAPE, which we’re hoping to deploy and test out in June/July.
The first uses AI / machine learning to extract clusters of related topics from our database of every local authority climate action plan in the UK, so you can more easily find other plans which mention topics close to your heart. We’re hoping these auto-extracted topics will also make it easier to quickly see what’s inside a document, without reading it from cover to cover.
The second change is a big re-think of how we help local authorities find their “climate twins”: other councils likely to face similar climate challenges. We’re in the early stages of this mini-project, but I’m excited that we might be able to come up with something that really brings together all of the various datapoints CAPE holds on each council, in a way that you just can’t get anywhere else. More on this, hopefully, in our June or July monthnotes!
Council Climate Action Scorecards: crowdsourcing and verifying council actions on climate
May saw the end of the “Right of Reply” period for councils to contribute their feedback on Climate Emergency UK’s volunteer assessors’ analysis of their climate actions. The whole marking and feedback process has been handled through a webapp custom built by mySociety, and it’s encouraging to see that over 80% of local authorities in the UK logged into the site to check their score, and around 70% provided feedback on their provisional marks!
We’re really proud of how this year’s Council Climate Action Scorecards are shaping up, and can’t wait to start sharing them in the Autumn. Our partners, Climate Emergency UK, have put a huge effort into making these as fair and up-to-date a representation of actual local authority action on climate change as possible. Now they enter their final “Audit” phase, consolidating councils’ feedback against the volunteers’ first marks, after which we’ll be able to calculate each council’s final score.
Local Intelligence Hub: a treasure-trove of constituency-level climate data
The Local Intelligence Hub—the face of our collaboration with The Climate Coalition—soft launched to Climate Coalition members at the end of April. But just because the site is now in the hands of members, doesn’t mean work stops! Alexander has been continuing to collect and import new datasets around fuel poverty, the cost of living, and child poverty – as well as improving the reliability of advanced features like shading constituencies on the map. Meanwhile, our other Alex has been grappling with some Google Analytics-related challenges (tracking Custom Events with cookie-less GA4 – one for the geeks!) which I’m sure he’ll blog about in due course.
If you’re part of an organisation in The Climate Coalition, you can request a free account on the Local Intelligence Hub, and try out the tools and datasets for yourself. For everyone else, we’re still hoping to launch a public version of the tool later this year.
Header image: Krista
Artificial intelligence and machine learning seem to be everywhere at the moment – every day there’s a new story about the latest smart assistant, self-driving car or the impending takeover of the world by robots. With FixMyStreet having recently reached one million reports, I started wondering what kind of fun things could be done with that dataset.
Inspired by a recent post that generated UK place names using a neural network, I thought I’d dip my toes in the deep learning sea and apply the same technique to FixMyStreet reports. Predictably enough the results are a bit weird.
I took the titles from all the public reports on fixmystreet.com as the training data, and left the training process to run overnight. The number crunching was pretty slow, and the calculations had barely reached 5% by morning. I suspect the training set was a bit too large, at over 1M entries, but the end result still gives us enough to work with.
The training process produces checkpoints along the way, which you can use to see how the learning is progressing. After 1000 iterations the model was starting to be aware that it should use words, but didn’t really know how to spell them:
Mertricolbes Ice does thrown campryings Sunky riking proper, badger verwappefing cars off uping is! Finst Knmp Lyghimes Jn fence Moadle bridge is one descemjop
After 15000 iterations it’s starting to get the hang of real words, though still struggling to form coherent sentences.
Untaxed cacistance. Broken Surface in ARRUIGARDUR. Widdy movering Cracked already nail some house height avenue. Light not worky I large pot hole Dumped shood road nod at street. Grim Dog man Ongorently obstructing sofas. This birgs. Serious Dirches
After 68000 iterations there seems to be enough confusion in the training data that things start to go south again with the default parameters:
Urgely councille at jnc swept arobley men. They whention to public bend to street? For traffic light not working
Tweaking the ‘temperature’ of the sampling process produces increasingly sensible results:
Large crumbling on pavement Potholes all overgrown for deep pothole Very van causing the road Very deep potholes on pavement Weeds on the pavement Several potholes in the road Rubbish Dumped on the road markings Potholes on three away surface blocking my peride garden of the pavement Potholes and rubbish bags on pavement Poor road sign damaged Poor street lights not working Dog mess in can on road bollard on pavement A large potholes and street light post in middle of road
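The ‘temperature’ knob scales the model’s output probabilities before a character is sampled. This is a generic numpy sketch of that step, not the actual char-rnn training code: lower temperatures make the model stick to its most confident guesses (more sensible, more repetitive), while higher temperatures flatten the distribution (weirder output).

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample an index from unnormalised scores, sharpened or flattened
    by the temperature before converting to probabilities."""
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```

At a very low temperature the highest-scoring option wins almost every time, which is why the samples become increasingly sensible variations on the most common report titles.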
As well as plenty of variations on the most popular titles:
Pot hole Pot hole on pavement Pot holes and pavement around Pot holes needings to path Pothole Pothole dark Pothole in road Pothole/Damaged to to weeks Potholes Potholes all overgrown for deep pothole Potholes in Cavation Close Potholes in lamp post Out Potholes in right stop lines sign Potholes on Knothendabout Street Light Street Lighting Street light Street light fence the entranch to Parver close Street light not working Street light not working develter Street light out opposite 82/00 Tood Street lights Street lights not working in manham wall post Street lights on path Street lights out
It also seems to do quite well at making up road names that don’t exist in any of the original reports (or in reality):
Street Light Out - 605 Ridington Road Signs left on qualing Road, Leave SE2234 4 Phiphest Park Road Hasnyleys Rd Apton flytipping on Willour Lane The road U6!
Here are a few of my favourites for their sheer absurdity:
Huge pothole signs Lack of rubbish Wheelie car Keep Potholes Mattress left on cars Ant flat in the middle of road Flytipping goon! Pothole on the trees Abandoned rubbish in lane approaching badger toward Way ockgatton trees Overgrown bush Is broken - life of the road. Poo car Road missing Missing dog fouling - under traffic lights
Aside from perhaps generating realistic-looking reports for demo/development sites, I don’t know if this has any practical application for FixMyStreet, but it was fun to see what kind of thing is possible without much work.