mySociety’s EveryPolitician project aims to make data available on every politician in the world. It’s going well: we’re already sharing data on the politicians from nearly every country on the planet. That’s 68,652 people and 2.9 million individual pieces of data, numbers which will be out of date almost as soon as you’ve read them. Naturally, the breadth and depth of that data varies from country to country, depending on the sources available, but that’s a topic for another blog post.
Today the EveryPolitician team would like to introduce you to its busiest member, who is blogging at EveryPolitician bot. A bot is an automated agent — a robot, no less, albeit one crafted entirely in software.
First, some background on why we need our little bot.
Because there’s so much to do
One of the obvious challenges of such a big mission is keeping on top of it all. We’re constantly adding and updating the data; it’s in no way a static dataset. Here are examples — by no means exhaustive — of circumstances that can lead to data changes:
- Legislatures change en masse, because of elections, etc.
We try to know when countries’ governments are due to change because that’s the kind of thing we’re interested in anyway (remember mySociety helps run websites for parliamentary monitoring organisations, such as Mzalendo in Kenya). But even anticipated changes are rarely straightforward, not least because there’s always a lag between a legislature changing and the data about its new members becoming available, especially from official national sources.
- Legislatures change en masse, unexpectedly
Not all sweeping changes are planned. There are coups and revolutions and other unscheduled or premature ends-of-term.
- Politicians retire
Or die, or change their names or titles, or switch party or faction.
- New parties emerge
Or the existing ones change their names, or form coalitions.
- Areas change
There are good reasons (better representation) and bad reasons (gerrymandering) why the areas in constituency-based systems often change. By way of a timely example, our UK readers probably know that the wards have changed for the forthcoming elections, and that mySociety built a handy tool that tells you what ward you’re in.
- Existing data gets refined
Played Gender Balance recently? Behind that is a dataset that keeps being updated (whenever there are new politicians) but which is itself a source of constantly-updating data for us.
- Someone in Russia updates the Wikipedia page about a politician in Japan
Wikidata is the database underlying projects like Wikipedia, so by monitoring all the politicians we have that are also in there, we get a constant stream of updates. For example, within a few hours of someone adding it, we knew that the Russian transliteration of 安倍晋三’s name was Синдзо Абэ — that’s Shinzo Abe, in case you can’t read kanji or Cyrillic script. (If you’re wondering, whenever our sources conflict, we moderate in favour of local context.)
- New data sources become available
Our data comes from an ever-increasing number of sources, commonly more than one for any given legislature (the politicians’ Twitter handles are often found in a different online place from their dates of birth, for example). We always welcome more contributions: if you think you’ve got new sources for the country you live in, please let us know.
- New old data becomes available
We collect historic data too, not just the politicians in the current term. For some countries we’ve already got data going back decades. Sources for data like this can sometimes be hard to find; slowly but surely new ones keep turning up.
So, with all this sort of thing going on, it’s too much to expect a small team of humans to manage it all. Which is where our bot comes in.
We’ve automated many of our processes (scraping, collecting, checking changes, submitting them for inclusion) so the humans can concentrate on what they do best, which is understanding things and making informed decisions. In technical terms, our bot handles most things in an event-driven way. It springs into action when triggered by a notification. Often that will be a webhook (for example, a scraper finishes getting data so it issues a webhook to let the bot know), although the bot also follows a schedule of regular tasks. Computers are great for running repetitive tasks and making quantitative comparisons, and a lot of the work that needs to be done with our ever-changing data fits such a description.
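To give a flavour of what “event-driven” means here, the following is a minimal sketch in Python, not the bot’s actual code. The event names (`scraper_finished`, `daily_tick`), the `Bot` class and the handler functions are all invented for illustration; the real bot reacts to webhooks and a schedule in ways this post doesn’t spell out.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler takes the notification's payload and returns a short description
# of the work it would do. Real handlers would do the work itself.
Handler = Callable[[dict], str]


class Bot:
    """Hypothetical dispatcher: maps event names to the tasks they trigger."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for the named event."""
        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler
        return register

    def dispatch(self, event: str, payload: dict) -> List[str]:
        """Run every handler registered for this event, in registration order."""
        return [handler(payload) for handler in self._handlers.get(event, [])]


bot = Bot()


@bot.on("scraper_finished")  # assumed event name, e.g. sent by a webhook
def check_for_changes(payload: dict) -> str:
    return f"compare fresh data for {payload['country']} with stored data"


@bot.on("daily_tick")  # assumed event name, e.g. sent by a scheduler
def run_scheduled_tasks(payload: dict) -> str:
    return f"run regular tasks for {payload['date']}"


results = bot.dispatch("scraper_finished", {"country": "Kenya"})
print(results)
```

The point of the pattern is that nothing runs until a notification arrives, and adding a new task only means registering one more handler.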
The interconnectedness of all the different tasks the bot performs is complex. We originally thought we’d document that in one go — there’s a beautiful diagram waiting to be drawn, that’s for sure — but it soon became clear this was going to be a big job. Too big. Not only is the bot’s total activity complicated because there are a lot of interdependencies, but it’s always changing: the developers are frequently adding to the variety of tasks the bot is doing for us.
So in the end we realised we should just let the bot speak for itself, and describe task-by-task some of the things it does. Broken down like this it’s easier to follow.
We know not everybody will be interested, which is fine: the EveryPolitician data is useful for all sorts of people — journalists, researchers, parliamentary monitors, activists, parliamentarians themselves, and many more — and if you’re such a person you don’t need to know about how we’re making it happen. But if you’re technically-minded — and especially if you’re a developer who uses GitHub but hasn’t yet used the GitHub API as thoroughly as we’ve needed to, or are looking for ways to manage always-shifting data sets like ours — then we hope you’ll find what the bot says both informative and useful.
The bot is already a few days into blogging. Its first post was “I am a busy bot”, but you can see all the others on its own Medium page. You can also follow it on Twitter as @everypolitbot. Of course, its true home, where all the real work is done, is the everypoliticianbot account on GitHub.
Images: CC-BY-SA from the EveryPolitician bot’s very own scrapbook.
Amazing—we did it!
When we decided to mark Global Legislative Openness Week with a drive to get the data for 200 countries up on EveryPolitician, in all honesty, we weren’t entirely sure it could be done.
And without the help of many people we wouldn’t have got there. But last night, we put live the data for North Korea and Sweden, making us one country over the target.
The result? There is now consistently-structured, reusable data representing the politicians in 201 countries, ready for anyone to pick up and work with. We hope you will.
That’s not to say that our job is over… far from it! There’s still plenty more to be done, as we’ll explain below.
Here’s how it happened
Getting the data for each country was a multi-step process, aided by many people. First, a suitable online source had to be located. Then, a scraper would be written: a piece of code that could visit that source and pull out the information we needed—names, districts, political parties, dates of office, etc—and put it all in the right format.
Because each country’s data had its own idiosyncrasies and formatting, we needed a different scraper for every country.
Once written, we added each scraper to EveryPolitician’s list. Crucially, scrapers aren’t just a one-off deal: ideally they’ll continue to work over time as legislatures and politicians change.
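For readers who haven’t met a scraper before, here is a rough sketch of the idea in Python. It is not one of EveryPolitician’s scrapers: the sample page, the `TableScraper` class and the output field names are invented for illustration, and a real scraper would fetch a live page and cope with far messier markup.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Made-up stand-in for an official "list of members" page.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>District</th><th>Party</th></tr>
  <tr><td>Alice Example</td><td>North</td><td>Party A</td></tr>
  <tr><td>Bob Example</td><td>South</td><td>Party B</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # accumulate text inside the cell


def scrape(page: str) -> List[Dict[str, str]]:
    """Turn the table into one dict per politician, keyed by column header."""
    parser = TableScraper()
    parser.feed(page)
    parser.close()
    header, *body = parser.rows
    keys = [cell.strip().lower() for cell in header]
    return [dict(zip(keys, (cell.strip() for cell in row))) for row in body]


members = scrape(SAMPLE_PAGE)
print(members)
```

Each country’s source needs its own version of the parsing step, but the output shape stays the same, which is what makes the combined data reusable.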
The map above shows our progress during GLOW week, from 134 countries, where we began, up to today’s count of 201.
mySociety’s Tony, Lead on the EveryPolitician project, worked non-stop this week to get as many countries as possible online. But this week we’ve seen EveryPolitician reach some kind of momentum, as it takes off as a community project. It’s an ambitious idea, and it can only succeed with the help of this kind of community effort. Thanks to everyone who helped, including (in no particular order):
Duncan Walker for writing the scraper for Uganda; Joshua Tauberer for helping with the USA data; Struan Donald for handling Ecuador, Japan, Hong Kong, Serbia and the Netherlands; Dave Whiteland, with ThaiNetizen helpfully finding the data source for Thailand; Team Popong for South Korean data; Jenna Howe for her work on El Salvador; Rubeena Mahato, Chris Maddock, Kätlin Traks, François Briatte, @confirmordeny, and @foimonkey for lots of help on finding data; Henare Degan and OpenAustralia who made the scraper for Ukraine; Matthew Somerville for covering the Falkland Islands and Sweden; Liz Conlan for lots of help with Peru and American Samoa; Jaroslav Semančík who provided data for, and assistance with, Slovakia; Mathias Huter who supplied current data for Austria while Steven Hirschorn wrote a scraper for the historic data; Andy Lulham who wrote a scraper for Gibraltar; Abigail Rumsey who wrote a scraper for Sri Lanka; everyone who tweeted encouragement or retweeted our requests for help.
But there’s more
There are still 40 or so countries for which we have no data at all: you can see them here. This week has provided an enormous boost to our data, but the site’s real target is, just like the name says, to cover every politician in the world.
And once we’ve done that, there’s still the matter of both historic data and more in-depth data for the politicians we do have. Thus far, for most countries with two chambers we have only the lower house, and for many countries we have only the current politicians. In the future we need to include much richer data on all politicians, including voting records, et cetera.
Meanwhile, our first target, to have a list of the current members of every national legislature in the world, is starting to look like it’s not so very far away. If you’d like to help us reach it, here’s how you still can.
Just how quickly can we hit the 200 countries mark on EveryPolitician? That’s what we’ll be finding out this week, and one thing’s for sure — we’ll get a lot further with your help.
This week is GLOW, the Global Legislative Openness Week, and we’re marking it with a concerted drive for more data.
Tony, the project lead, has consistently added one new country every day since EveryPolitician launched four months ago, and now it’s time to put a rocket behind our efforts.
The site currently contains data for 134 countries. We’ll be going flat out to see how quickly we can reach 200, and as the excitement ramps up, we hope you will help spread the word and get involved, too. Tony will carry on working as hard as he can to fill in the gaps, but we need your help to get further, faster.
What is EveryPolitician?
How can I help?
- Help us find data for more countries! We don’t currently know where to find the politician data for many countries. Here’s a list of the ones we need and here’s a page about how to contribute. If you get stuck, give us a shout.
- Write a scraper! If you have the know-how, you can help us enormously by scraping the data from the places we do know about. See this page for guidance on how to go about writing a scraper. You’ll find lots of examples here.
- You can also help by spreading the word – tell your friends, tweet, blog, get up on a platform and talk, and just generally share this post. Thank you!
Why do we need this data?
Politician data is readily available for most countries, but it comes in a massive variety of inconsistent formats. Most of those formats aren’t ‘machine readable’, that is to say, the data can’t readily be extracted and re-used by programmers, and pretty much every country differs on what information it provides about each politician.
That being the case, anyone who wants to build an online tool that deals with politicians from more than one country, or who would like their tool to be available to people in other places, or would like to adapt an existing tool to be used elsewhere, would first have to adapt their tool to cope with the data.
EveryPolitician saves them the trouble, and the structured format also means that the tools they build will be compatible with any other tools that use it.
What kind of tools?
EveryPolitician data will be useful for all kinds of projects.
It’ll be much easier to build a website that shows people how to contact a politician. Or one that holds a government to account and educates people about what politicians are doing. Or one that helps voters make choices by displaying facts about what their politicians believe.
It can go further than that, though — with these building blocks in place, developers can really use their imagination to put together all kinds of projects, many of which we haven’t even begun to imagine. And don’t forget that, if a tool has been built to use the standardised data, it’ll also be easy for others to redeploy elsewhere.
If you’d like to see a concrete way in which the data’s already being used, check out Gender Balance.
How can I keep up to date?
We’ll be putting out regular updates via Twitter as the number of countries covered increases — plus you can watch the map turn green on http://everypolitician.org/countries.html as we progress.
Party Conference season is upon us again, and, with it, a new set of fine promises and rhetorical flourishes, as each party’s top dogs take the podium. But what happens to those pledges, vows and forecasts once the banners are taken down and the party faithful turn for home?
Cast your mind back to November 2013, and you may recall that there was a bit of a fuss about the fact that the Conservative party had removed old speeches from their website.
Not just that, but they’d also effectively erased them from the places where you can commonly find retired internet content… unless you really know where to look.
Was it a sinister rewriting of history, or a simple spring clean of elderly content? Well, that depends who you believe – but here at mySociety, we do think that you should be able to hold political parties to account for promises they made in the past.
Not only that, but we happen to have a splendid tool for publishing the spoken word: SayIt.
So we thought we’d track down that missing content and put it online for anyone to search and browse. And because we are a wholly non-partisan organisation, we did the same for Labour.
Note: we’re not intending to update these collections regularly – it’s a one-off initiative, designed to fill a gap in the public archive. And within the confines of this project, we’ve only published Labour and Conservative speeches.
On the other hand, if you’re interested in setting up similar sites for the other parties, or even taking over these ones, SayIt is very simple for anyone to use: just get in touch.
Image by Klaus Riesner (CC)