Even official records aren’t as safe as you might think they are. The archive of a country’s political history might be wiped out in a single conflagration.
Take the example of Burkina Faso, a beautiful West African country that is, sadly, perhaps best known to the rest of the world for its troubled political past.
The uprising in Burkina Faso in 2014 led to a fire in the National Assembly building and archives office. Nearly 90% of the documents were lost. Now the National Assembly is working to reconstruct the list of its parliament’s members before 1992.
This means that EveryPolitician's data on Burkina Faso contains nothing from terms before 1992. So far we've got some data for six of the seven most recent terms of the National Assembly, five of which are live on the site. That data isn't very rich (in many cases there's little more than names, and because the 6th term was transitional, data on its membership may remain elusive), but it's a beginning.
We know from experience that data-gathering often proceeds piecemeal, and names are always a good place to start.
As Tinto finds new data, whether that’s more information about the politicians already collected or membership lists of the missing terms before 1992, we’ll be adding that to EveryPolitician too.
A vast collection
When people ask what EveryPolitician is, we often say, ‘The clue’s in the name’. EveryPolitician aims to provide data about, well … every politician. In the world.
(We’ve limited our scope — for the time being — to politicians in national-level legislatures).
The project is growing. Since our launch last year, we've gathered data for legislatures in 233 countries. Our collection now comprises well over three million items, and the number of politicians in our datafiles exceeds 70,000.
Seventy thousand is an awful lot of politicians.
In fact, if you think that might be more politicians than the world needs right now, you’re right: as the Burkina Faso example shows, EveryPolitician collects historic data too.
So as well as the people serving in today’s parliaments, our data includes increasing numbers of those from the past. (Obviously, if you have such data for your country’s legislature, we’d love to hear from you!)
More than just today’s data
The Burkina Faso fire is an illustration of the value of collecting and preserving this historic data.
Of course, we’re fully aware of the usefulness of current data, because we believe that by providing it we can seed many other projects — including, but in no way limited to, parliamentary monitoring sites around the world (sites like our own TheyWorkForYou in the UK, or Mzalendo in Kenya, for example).
Nonetheless, we never intended to limit ourselves to the present. By sharing and collating historic records too, we hope to enable researchers, journalists, historians and who-knows-who-else to investigate, model, or reveal connections and trends over time that we haven’t even begun to imagine. We know this data has value; we look forward to discovering just how much value.
But it turns out we’re providing a simpler potential benefit too. EveryPolitician’s core datafiles are an excellent distributed archive.
What Burkina Faso’s misfortune goes to show is that, as historians know only too well, data sources can be surprisingly fragile.
In this case the specific situation involves paper records being destroyed by fire. That is a simple analogue warning to the digital world. Websites and their underlying databases are considerably more volatile than the most flammable of paper archives.
Database-backed sites are often poor catalogues of their pasts. Links, servers and domain registrations all expire; access to data may be revoked; firewalls can appear.
Digital data doesn’t fade; instead it is so transient that it can simply disappear.
Of course, we cannot ourselves guarantee that our servers will be here forever (we’re not planning on going anywhere, but projects like this have to be realistic about the longer view).
There is an intriguing consequence of us using GitHub as our datastore. The fact is, the EveryPolitician data you can download isn’t coming off our servers at all. Instead, we benefit from GitHub’s industrial-scale infrastructure, as well as the distributed nature of the version control system, git, on which it is based. By its nature, every time someone clones the repository (which is easy to do), they’re securing for themselves a complete copy of all the data.
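Part of why any clone can stand in for the original is that git names every file by a hash of its contents, so two copies can be checked against each other without trusting either server. Here is a minimal Python sketch of git's blob hashing: a `"blob <size>\0"` header followed by the file's bytes, hashed with SHA-1, which is git's default object format. The function names are illustrative and are not part of any EveryPolitician tooling.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the object ID git assigns to a file with these bytes.

    git hashes a short header ("blob <length>" plus a NUL byte) followed
    by the raw content, using SHA-1 in its default object format.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def copies_match(copy_a: bytes, copy_b: bytes) -> bool:
    """Two copies of a datafile are identical if their blob IDs agree."""
    return git_blob_sha1(copy_a) == git_blob_sha1(copy_b)


if __name__ == "__main__":
    # The empty file has the same well-known object ID in every git repository.
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

In practice `git clone` and `git fsck` do this checking for you. The sketch only shows why a stranger's clone is as trustworthy as the original.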
But the point is not necessarily about data persisting far into the next millennium — that’s a bit presumptuous even for us, frankly — so much as its robustness over the shorter cycles of world events. So, should any nation’s data become inaccessible (who knows? for the length of an interregnum or civil war, a natural disaster, or maybe just a work crew accidentally cutting through the wrong cable outside parliament), we want to know the core data will remain publicly available until it’s back.
Naturally there are other aspects to the EveryPolitician project which are more — as modern language would have it — compelling than collecting old data about old politicians. But the usefulness of the EveryPolitician project as a persistent archive of historical data is one that we have not overlooked.
mySociety’s EveryPolitician project aims to make data available on every politician in the world. It’s going well: we’re already sharing data on the politicians from nearly every country on the planet. That’s 68,652 people and 2.9 million individual pieces of data, numbers which will be out of date almost as soon as you’ve read them. Naturally, the breadth and depth of that data varies from country to country, depending on the sources available, but that’s a topic for another blog post.
Today the EveryPolitician team would like to introduce you to its busiest member, who is blogging at EveryPolitician bot. A bot is an automated agent — a robot, no less, albeit one crafted entirely in software.
First, some background on why we need our little bot.
Because there’s so much to do
One of the obvious challenges of such a big mission is keeping on top of it all. We’re constantly adding and updating the data; it’s in no way a static dataset. Here are examples — by no means exhaustive — of circumstances that can lead to data changes:
- Legislatures change en masse, because of elections, etc.
We try to keep track of when countries’ governments are due to change, because that’s the kind of thing we’re interested in anyway (remember mySociety helps run websites for parliamentary monitoring organisations, such as Mzalendo in Kenya). But even anticipated changes are rarely straightforward, not least because there’s always a lag between a legislature changing and the data about its new members becoming available, especially from official national sources.
- Legislatures change en masse, unexpectedly
Not all sweeping changes are planned. There are coups and revolutions and other unscheduled or premature ends-of-term.
- Politicians retire
Or die, or change their names or titles, or switch party or faction.
- New parties emerge
Or the existing ones change their names, or form coalitions.
- Areas change
There are good reasons (better representation) and bad reasons (gerrymandering) why the areas in constituency-based systems often change. By way of a timely example, our UK readers probably know that the wards have changed for the forthcoming elections, and that mySociety built a handy tool that tells you what ward you’re in.
- Existing data gets refined
Played Gender Balance recently? Behind that is a dataset that keeps being updated (whenever there are new politicians) but which is itself a source of constantly-updating data for us.
- Someone in Russia updates the Wikipedia page about a politician in Japan
Wikidata is the database underlying projects like Wikipedia, so by monitoring all the politicians we have that are also in there, we get a constant stream of updates. For example, within a few hours of someone adding it, we knew that the Russian transliteration of 安倍晋三’s name was Синдзо Абэ — that’s Shinzo Abe, in case you can’t read kanji or Cyrillic script. (If you’re wondering, whenever our sources conflict, we moderate in favour of local context.)
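Wikidata serves each entity as JSON with a `labels` object that maps language codes to `{"language": ..., "value": ...}` pairs. Here is a minimal Python sketch of pulling those names out and combining them with a local source. It assumes a simple policy in which the local source wins whenever both give a name in the same language. The function names and that policy are illustrative assumptions, not EveryPolitician's actual moderation rules.

```python
from typing import Dict


def wikidata_labels(entity: dict) -> Dict[str, str]:
    """Flatten a Wikidata entity's "labels" block into {language: name}."""
    return {lang: label["value"] for lang, label in entity.get("labels", {}).items()}


def merge_names(local: Dict[str, str], entity: dict) -> Dict[str, str]:
    """Combine names from a local source with Wikidata labels.

    Wikidata fills in languages the local source lacks. Wherever both
    give a name for the same language, the local source wins (an assumed
    stand-in for "moderate in favour of local context").
    """
    merged = wikidata_labels(entity)
    merged.update(local)  # local values overwrite Wikidata on conflicts
    return merged


if __name__ == "__main__":
    # A trimmed entity in Wikidata's JSON shape (illustrative, not a real dump).
    entity = {
        "labels": {
            "ja": {"language": "ja", "value": "安倍晋三"},
            "ru": {"language": "ru", "value": "Синдзо Абэ"},
        }
    }
    print(merge_names({"ja": "安倍晋三"}, entity)["ru"])  # Синдзо Абэ
```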
- New data sources become available
Our data comes from an ever-increasing number of sources, commonly more than one for any given legislature (the politicians’ twitter handles are often found in a different online place from their dates of birth, for example). We always welcome more contributions — if you think you’ve got new sources for the country you live in, please let us know.
- New old data becomes available
We collect historic data too — not just the politicians in the current term. For some countries we’ve already got data going back decades. Sources for data like this can sometimes be hard to find; slowly but surely, new ones keep turning up.
So, with all this sort of thing going on, it’s too much to expect a small team of humans to manage it all. Which is where our bot comes in.
We’ve automated many of our processes: scraping, collecting, checking changes, submitting them for inclusion — so the humans can concentrate on what they do best (which is understanding things, and making informed decisions). In technical terms, our bot handles most things in an event-driven way. It springs into action when triggered by a notification. Often that will be a webhook (for example, a scraper finishes getting data so it issues a webhook to let the bot know), although the bot also follows a schedule of regular tasks too. Computers are great for running repetitive tasks and making quantitative comparisons, and a lot of the work that needs to be done with our ever-changing data fits such a description.
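To make "event-driven" concrete, here is a toy Python sketch in which handlers subscribe to named events and a dispatcher runs them when an event arrives. It assumes that a webhook receiver or a scheduler would call `dispatch`. The event names and tasks are hypothetical; the real bot works against GitHub and its own infrastructure, which this does not reproduce.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Bot:
    """Toy event-driven bot: handlers subscribe to named events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event name."""
        def register(fn: Handler) -> Handler:
            self._handlers[event].append(fn)
            return fn
        return register

    def dispatch(self, event: str, payload: dict) -> int:
        """Run every handler for an event and return how many ran."""
        handlers = self._handlers.get(event, [])
        for fn in handlers:
            fn(payload)
        return len(handlers)


if __name__ == "__main__":
    bot = Bot()

    @bot.on("scraper_finished")  # hypothetical webhook-triggered event
    def check_for_changes(payload: dict) -> None:
        bot.log.append(f"comparing new data for {payload['country']}")

    @bot.on("daily")  # hypothetical scheduled event, fired by a timer
    def refresh_wikidata(payload: dict) -> None:
        bot.log.append("refreshing Wikidata names")

    bot.dispatch("scraper_finished", {"country": "Burkina Faso"})
    bot.dispatch("daily", {})
    print(bot.log)
```

Keeping each task as a small handler matches the post's point: new tasks can be added without redrawing the whole diagram.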
The interconnectedness of all the different tasks the bot performs is complex. We originally thought we’d document that in one go — there’s a beautiful diagram waiting to be drawn, that’s for sure — but it soon became clear this was going to be a big job. Too big. Not only is the bot’s total activity complicated because there are a lot of interdependencies, but it’s always changing: the developers are frequently adding to the variety of tasks the bot is doing for us.
So in the end we realised we should just let the bot speak for itself, and describe task-by-task some of the things it does. Broken down like this it’s easier to follow.
We know not everybody will be interested, which is fine: the EveryPolitician data is useful for all sorts of people — journalists, researchers, parliamentary monitors, activists, parliamentarians themselves, and many more — and if you’re such a person you don’t need to know about how we’re making it happen. But if you’re technically-minded — and especially if you’re a developer who uses GitHub but hasn’t yet used the GitHub API as thoroughly as we’ve needed to, or are looking for ways to manage always-shifting data sets like ours — then we hope you’ll find what the bot says both informative and useful.
The bot is already a few days into blogging — its first post was “I am a busy bot”, but you can see all the others on its own Medium page. You can also follow it on Twitter as @everypolitician. Of course, its true home, where all the real work is done, is the everypoliticianbot account on GitHub.
Images: CC-BY-SA from the EveryPolitician bot’s very own scrapbook.
As ever with Mozilla’s annual, hands-on festival, there was a lot going on in London’s Ravensbourne, a venue that’s especially conducive to mixing and meeting.
MozFest attracts an active and positive crowd of digital people, ranging from junior-school coder kids right through to hoary old digital campaigners. So we were delighted to meet up with old friends and make new ones, especially as some of them had travelled from afar to be there. London was fortunate to host the event once again; Mozilla is, of course, an international organisation. And as our main focus at this year’s event was EveryPolitician (“data about every national legislature in the world, freely available for you to use”), that international aspect was especially welcome.
As a result of our being there, we hope that lots more people know about EveryPolitician’s data, and that some of them are going to build or do amazing things with it. We already have data on at least the current term of the top-level legislatures of most of the countries in the world, but we’re still adding to it, and we’d love your help: both with finding good sources for the remaining few, and with our ongoing task of going wider (adding more details about the politicians we do have) and deeper (adding historic data from previous terms).
If, in the spirit of digital do-ism that infuses MozFest, you do make something useful or funky with EveryPolitician’s data, please let us know. We make sure all this lovely data is available to you in a consistent way (that means not only the delivery formats of CSV or JSON Popolo, but also that we adopt reliable conventions about the way we use them). This maximises the likelihood that, when you share that thing you’ve built using the data for your country, people in other places will be able to easily adapt it to work with the data for theirs. And that’s why, if you’ve made something amazing, we’d like to know, so we can shout about it.
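Here is an illustration of why consistent conventions matter. Assume each term's CSV has a header row that includes a `group` column (our assumption about the layout; check the actual files before relying on it). Then one small Python function can summarise any country's file without changes. `seats_by_group` is a hypothetical helper, not part of any EveryPolitician library.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def seats_by_group(csv_file: TextIO) -> Counter:
    """Count members per political group in one term's CSV.

    Relies only on the assumed "group" header, so the same function
    works unchanged on any country's file that follows that convention.
    Members with an empty group are counted under "(none)".
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["group"] or "(none)" for row in reader)


if __name__ == "__main__":
    # Made-up sample rows, standing in for a downloaded file.
    sample = "id,name,group\n1,Ada,Party A\n2,Ben,Party B\n3,Cy,Party A\n"
    print(seats_by_group(io.StringIO(sample)).most_common())
```

For a real download you would pass an open file (`open(path, newline="", encoding="utf-8")`) instead of the in-memory sample.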
Finally: thanks to the people who made MozFest run so smoothly this year, and to the spirit of the open web. See you next year!
Image: Mozilla Festival CC BY 2.0
Ever feel sorry for the less popular kids at school?
Excellent, then you’re just the sort of person we need: you may empathise with some of the countries on Gender Balance that aren’t getting quite as much attention as the rest.
Thanks to our recent data drive, Gender Balance now contains many more countries, all waiting for you to play.
That matters, because Gender Balance’s ultimate aim is to provide data for researchers, and we’d hate to end up with patchier data for those studying the less popular places.
So, to encourage take-up, we’ve now added a ‘featured country’ spot. Accept the invitation to play the highlighted place, and you’ll receive double points, propelling you all the faster towards a coveted place on the Gender Balance leaderboard. Time to get playing!
Yesterday we told you how the data on EveryPolitician had expanded wildly in the last week. One side effect is that there are 64 new countries to play on Gender Balance.
Our gender classification game (read more about it here) runs on politician data from EveryPolitician, so by adding a whole bunch of countries, we also expanded Gender Balance’s range.
It also means that, as those countries get played, we’ll be gathering even more informative and useful data about the proportions of women to men in the world’s legislatures.
That’s all we have to say, except: 3, 2, 1… get playing!
Amazing—we did it!
When we decided to mark Global Legislative Openness Week with a drive to get the data for 200 countries up on EveryPolitician, in all honesty, we weren’t entirely sure it could be done.
And without the help of many people we wouldn’t have got there. But last night, we put live the data for North Korea and Sweden, making us one country over the target.
The result? There is now consistently-structured, reusable data representing the politicians in 201 countries, ready for anyone to pick up and work with. We hope you will.
That’s not to say that our job is over… far from it! There’s still plenty more to be done, as we’ll explain below.
Here’s how it happened
Getting the data for each country was a multi-step process, aided by many people. First, a suitable online source had to be located. Then, a scraper would be written: a piece of code that could visit that source and pull out the information we needed—names, districts, political parties, dates of office, etc—and put it all in the right format.
Because each country’s data had its own idiosyncrasies and formatting, we needed a different scraper for every country.
Once written, we added each scraper to EveryPolitician’s list. Crucially, scrapers aren’t just a one-off deal: ideally they’ll continue to work over time as legislatures and politicians change.
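To show what a scraper does in miniature, here is a Python sketch that reads one imaginary source page with a members table (name, party, district) and turns each row into a record. It uses only the standard library's `html.parser`. The page layout, column order and output keys are all assumptions for illustration. EveryPolitician's real scrapers are separate programs, one per source, and this is not one of them.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MemberTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty here
                self.rows.append(self._row)
            self._row = None


def scrape_members(html: str) -> List[Dict[str, str]]:
    """Turn one (hypothetical) members table into structured records."""
    parser = MemberTableParser()
    parser.feed(html)
    parser.close()
    # Assumed column order on this imaginary page: name, party, district.
    return [
        {"name": row[0], "group": row[1], "area": row[2]}
        for row in parser.rows
        if len(row) == 3  # skip rows that don't match the assumed layout
    ]
```

Because a scraper is just code, re-running it on the same page later picks up any changes to the membership. That is why scrapers are meant to keep working over time rather than be a one-off deal.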
The map above shows our progress during GLOW week, from 134 countries, where we began, up to today’s count of 201.
mySociety’s Tony, lead on the EveryPolitician project, worked non-stop this week to get as many countries as possible online. But he wasn’t alone: this week EveryPolitician gained real momentum as it took off as a community project. It’s an ambitious idea, and it can only succeed with the help of this kind of community effort. Thanks to everyone who helped, including (in no particular order):
Duncan Walker for writing the scraper for Uganda; Joshua Tauberer for helping with the USA data; Struan Donald for handling Ecuador, Japan, Hong Kong, Serbia and the Netherlands; Dave Whiteland, with ThaiNetizen helpfully finding the data source for Thailand; Team Popong for South Korean data; Jenna Howe for her work on El Salvador; Rubeena Mahato, Chris Maddock, Kätlin Traks, François Briatte, @confirmordeny, and @foimonkey for lots of help on finding data; Henare Degan and OpenAustralia, who made the scraper for Ukraine; Matthew Somerville for covering the Falkland Islands and Sweden; Liz Conlan for lots of help with Peru and American Samoa; Jaroslav Semančík, who provided data for, and assistance with, Slovakia; Mathias Huter, who supplied current data for Austria, while Steven Hirschorn wrote a scraper for the historic data; Andy Lulham, who wrote a scraper for Gibraltar; Abigail Rumsey, who wrote a scraper for Sri Lanka; and everyone who tweeted encouragement or retweeted our requests for help.
But there’s more
There are still 40 or so countries for which we have no data at all: you can see them here. This week has provided an enormous boost to our data, but the site’s real target is, just like the name says, to cover every politician in the world.
And once we’ve done that, there’s still the matter of historic data, and of more in-depth data for the politicians we do have. So far, for most countries with two chambers we have only the lower house, and for many countries we have only the current politicians. In future we need to include much richer data on all politicians, including voting records, et cetera.
Meanwhile, our first target, to have a list of the current members of every national legislature in the world, is starting to look like it’s not so very far away. If you’d like to help us reach it, here’s how you still can.
Just how quickly can we hit the 200 countries mark on EveryPolitician? That’s what we’ll be finding out this week, and one thing’s for sure — we’ll get a lot further with your help.
This week is GLOW, the Global Legislative Openness Week, and we’re marking it with a concerted drive for more data.
Tony, the project lead, has consistently added one new country every day since EveryPolitician launched four months ago, and now it’s time to put a rocket behind our efforts.
The site currently contains data for 134 countries. We’ll be going flat out to see how quickly we can reach 200, and as the excitement ramps up, we hope you will help spread the word and get involved, too. Tony will carry on working as hard as he can to fill in the gaps, but we need your help to get further, faster.
What is EveryPolitician?
How can I help?
- Help us find data for more countries! We don’t currently know where to find the politician data for many countries. Here’s a list of the ones we need and here’s a page about how to contribute. If you get stuck, give us a shout.
- Write a scraper! If you have the know-how, you can help us enormously by scraping the data from the sources we do know about. See this page for guidance on how to go about writing a scraper. You’ll find lots of examples here.
- You can also help by spreading the word – tell your friends, tweet, blog, get up on a platform and talk, and just generally share this post. Thank you!
Why do we need this data?
Politician data is readily available for most countries, but it comes in a massive variety of inconsistent formats. Most of those formats aren’t ‘machine readable’, that is to say, the data can’t readily be extracted and re-used by programmers, and pretty much every country differs on what information it provides about each politician.
That being the case, anyone who wants to build an online tool that deals with politicians from more than one country, or who would like their tool to be available to people in other places, or would like to adapt an existing tool to be used elsewhere, would first have to adapt their tool to cope with the data.
EveryPolitician saves them the trouble, and the structured format also means that the tools they build will be compatible with any other tools that use it.
What kind of tools?
EveryPolitician data will be useful for all kinds of projects.
It’ll be much easier to build a website that shows people how to contact a politician. Or one that holds a government to account and educates people about what politicians are doing. Or one that helps voters make choices by displaying facts about what their politicians believe.
It can go further than that, though — with these building blocks in place, developers can really use their imagination to put together all kinds of projects, many of which we haven’t even begun to imagine. And don’t forget that, if a tool has been built to use the standardised data, it’ll also be easy for others to redeploy elsewhere.
If you’d like to see a concrete way in which the data’s already being used, check out Gender Balance.
How can I keep up to date?
We’ll be putting out regular updates via Twitter as the number of countries covered increases — plus you can watch the map turn green on http://everypolitician.org/countries.html as we progress.
As players were quick to notice, decisions made on our politician-sorting game Gender Balance were final. Thanks to volunteer coder Andy Lulham, that’s now been rectified with an ‘undo’ button.
Gender Balance is our answer to the fact that there’s no one source of gender information across the world’s legislatures. Read more about its launch here. It serves up a series of politicians’ names and images, and asks you to identify the gender for each. Your responses, along with those of other players, help compile a set of open data that will be available to all.
Many early players told us, however, that it’s all too easy to accidentally click the wrong button. (The reasons for this may be various, but we can’t help thinking that it’s often because there are so many males in a row that the next female comes as a bit of a surprise…)
In fact, this shouldn’t matter too much, because every legislature is served up to multiple players, and over time any anomalies will be ironed out of the data. Still, a misclick is upsetting for the player, and in the site’s first month of existence, an undo button has been the most-requested feature.
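One way that "ironing out" could work is a majority vote over many players' answers. Here is a minimal Python sketch, assuming an answer is accepted only after a minimum number of votes and a clear majority. The thresholds and function name are hypothetical; the post doesn't describe Gender Balance's actual aggregation rules.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Vote = Tuple[str, str]  # (politician_id, answer), e.g. ("p1", "female")


def consensus(votes: Iterable[Vote], min_votes: int = 3,
              min_share: float = 0.75) -> Dict[str, Optional[str]]:
    """Combine many players' answers into one answer per politician.

    A politician gets an answer only once at least `min_votes` players
    have voted and the most common answer has at least `min_share` of
    the votes; otherwise the result is None (needs more plays).
    """
    tallies: Dict[str, Counter] = {}
    for politician, answer in votes:
        tallies.setdefault(politician, Counter())[answer] += 1

    result: Dict[str, Optional[str]] = {}
    for politician, tally in tallies.items():
        answer, count = tally.most_common(1)[0]
        total = sum(tally.values())
        if total >= min_votes and count / total >= min_share:
            result[politician] = answer
        else:
            result[politician] = None
    return result
```

With these defaults, one accidental click among four votes still leaves a 3-to-1 majority, so a single slip doesn't change the outcome.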
Thanks to the wonders of open source, anyone can take the code and make modifications or improvements, and that’s just what Andy did in this case. He submitted this pull request (if you look at that, you can see the discussion that followed with our own developers and our designer Zarino). We’ve merged his contribution back into the main code so all players will now have the luxury of being able to reverse a hasty decision. Thanks, Andy!
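The idea behind an undo button is usually a stack: each answer is pushed as it's made, and undo pops the latest one so that politician comes back on screen. Here is a minimal Python sketch of that idea. It is not Andy's implementation or Gender Balance's code, and the `Round` class is hypothetical.

```python
from typing import List, Optional, Tuple


class Round:
    """One game round: a queue of politicians and a stack of answers."""

    def __init__(self, politicians: List[str]) -> None:
        self.queue = list(politicians)
        self.answers: List[Tuple[str, str]] = []  # used as a stack

    def current(self) -> Optional[str]:
        """The politician now on screen, or None when the round is over."""
        if len(self.answers) < len(self.queue):
            return self.queue[len(self.answers)]
        return None

    def answer(self, gender: str) -> None:
        """Record an answer for the politician on screen."""
        politician = self.current()
        if politician is None:
            raise ValueError("round is already finished")
        self.answers.append((politician, gender))

    def undo(self) -> Optional[Tuple[str, str]]:
        """Take back the most recent answer, showing that politician again."""
        return self.answers.pop() if self.answers else None
```

Because the on-screen politician is derived from the number of answers, popping an answer is all it takes to step back.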
If you need data on the people who make up your parliament, another country’s parliament, or indeed all parliaments, you may be in luck.
What’s more, it’s all provided as Open Data to anyone who would like to use it to power a civic tech project. We’re thinking parliamentary monitoring organisations, journalists, groups who run access-to-democracy sites like our own WriteToThem, and especially researchers who want to do analysis across multiple countries.
But isn’t that data already available?
Yes and no. There’s no doubt that you can find details of most parliaments online, either on official government websites, on Wikipedia, or on a variety of other places online.
But, as you might expect from data that’s coming from hundreds of different sources, it’s in a multitude of different formats. That makes it very hard to work with in any kind of consistent fashion.
EveryPolitician standardises all of its data into the Popolo standard and then provides it in two simple downloadable formats:
- CSV, which contains basic data that’s easy to work with in spreadsheets
- JSON, which contains richer data on each person, and is ideal for developers
This standardisation means that it should now be a lot easier to work on projects across multiple countries, or to compare one country’s data with another. It also means that the data works well with other Poplus Components.
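In Popolo JSON, people, organisations (such as parties) and the memberships linking them are kept in separate lists and joined by IDs. Here is a minimal Python sketch of that join. It assumes the top-level `persons`, `organizations` and `memberships` keys and uses Popolo's `on_behalf_of_id` field for the party a member represents. The sample document and the function name are made up for illustration.

```python
import json
from typing import Dict, List


def members_with_parties(popolo: dict) -> List[Dict[str, str]]:
    """List each membership as {name, party}, resolving Popolo IDs to names."""
    people = {p["id"]: p["name"] for p in popolo.get("persons", [])}
    orgs = {o["id"]: o["name"] for o in popolo.get("organizations", [])}
    rows = []
    for m in popolo.get("memberships", []):
        rows.append({
            "name": people.get(m["person_id"], "?"),
            # on_behalf_of_id points at the party the member represents.
            "party": orgs.get(m.get("on_behalf_of_id", ""), "(unknown)"),
        })
    return rows


if __name__ == "__main__":
    doc = json.loads("""{
      "persons": [{"id": "p1", "name": "Jane Doe"}],
      "organizations": [{"id": "party/a", "name": "Party A"}],
      "memberships": [{"person_id": "p1", "organization_id": "legislature",
                       "on_behalf_of_id": "party/a"}]
    }""")
    print(members_with_parties(doc))
```

The same few lines should work on any country's file that uses these Popolo keys, which is the point of standardising.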
What can I do with it?
Need a specific example? Yesterday, we introduced Gender Balance, the game that gathers data about women in politics.
As you’ll know if you’ve already given it a try, Gender Balance works by displaying politicians that make up one of the world’s legislatures, one by one.
That data all comes from EveryPolitician, and it’s meant that the developers have been able to concentrate on making a smooth and functional interface, knowing that the data side of things has already been taken care of.
That’s just one way to use EveryPolitician data, though. If you’d like to use it in your own site or app, you can find out more here.
We still need more data
As you may have noticed, there are more than 100 parliaments in the world. In fact, despite having reached what feels like a fairly substantial milestone, we’re still barely halfway to getting some data for every parliament.
So we could use your help in finding data for the parliaments we don’t yet cover, and historic information for the ones we do. Read more about how you can help out.