Earlier this month, Mark laid out the concept of a Democratic Commons for the Civic Tech community: shared code, data and resources where anyone can contribute, and anyone can benefit.
He also talked about exploring new models for funding the kind of work that we do in our Democracy practice at mySociety.
For many years, our Better Cities work has been proof of concept for one such model: we provide data and software as a service (FixMyStreet, MapIt, Mapumental) to paid clients, the revenue from which then funds our charitable projects. Could a similar system work to sustain our Democracy practice?
That’s the hope, and we’ve already seen it in action with Facebook, with whom we first worked during the UK General Election in June, providing the data that helped people see and connect with their elected representatives.
This kind of project is positive on multiple levels: it brings us an income, it brings the benefits of democratic engagement to a wider audience than we could reach on our own, and it contributes data back into EveryPolitician and Wikidata, where everyone can use it.
The UK election was only the first for which we did this work: we’ve gone on to provide the same service for the French elections and more recently for the rather more eventful Kenyan ones — currently on hold as we await the re-run of the Presidential election next month. And now we’re doing the same for the German elections, where candidate data is being shared this week.
As we’re learning, this is definitely not one-size-fits-all work, and each country has brought its own interesting challenges. One significant (and perhaps obvious) factor is how much easier it is to work with partners in-country, who understand the sometimes complex political system and its candidates far better than we can ever hope to. Much as we might enjoy the process, there’s little point in our spending days knee-deep in research when those who live in-country can find lists of candidates far more quickly, and explain individual levels of government and electoral processes far better.
Then, electoral boundaries are not always easy to find. We’ve used OpenStreetMap where possible, but that still leaves some gaps, especially at the more granular levels where the data is mainly owned and licensed by the government. It’s been an exercise in finding different sources and putting them all together to create boundary data to the level required.
Indeed, that seems to be a general pattern, also replicated across candidate data: at the national level, it’s easy to find and in the public domain. The deeper you go, the less those two descriptors hold true. It was also at this point that we realised how much, here in the UK, we take for granted things like the fact that the spelling of representatives’ names is usually consistent across a variety of sources — not always a given elsewhere, and currently something that only a human can resolve!
What makes all the challenges more worthwhile, though, is that we know it’s not just a one-off push that only benefits a single project. Nor is the data going straight into Facebook, never to be seen again.
Much of what we’re collecting, from consistent name data to deep-level boundaries data, is to be made available to all under open use licenses. For example, where possible we can submit the boundaries back to OpenStreetMap, helping to improve it at a local granular level across whole countries.
The politician data, meanwhile, will go into Wikidata and EveryPolitician so that anyone can use it for their own apps, websites, or research.
There are also important considerations about how this type of data will be used and where and when it is released in the electoral process; finding commercial models for our Democracy work is arguably a more delicate exercise than on some of our other projects. But hopefully it’s now clear exactly how a project like this can both sustain us as a charity, and have wider benefits for everyone — the holy grail for an organisation like us.
At the moment it’s unclear how many such opportunities exist or if this is a one-off. We’re certainly looking for more avenues to extend the scope of this work and keen to hear more ideas on this approach.
Last Saturday (August 19th) at Newspeak House in London, mySociety and Wikimedia UK held the “Wikifying Westminster” workshop, a day-long event to encourage people to get involved with Wikidata, but also to give a taste of what people can build with the data that is already there.
The vision: one day, complex investigations which currently take researchers a lot of time, such as “how many MPs are descended from people who were also MPs” or “how many people named X were MPs in year Y”, will be answerable with data from Wikidata using a single SPARQL query…
…but we’re not quite there yet. Currently, some data is scattered all over separate databases (which sometimes get shut down or disappear); some is just plain missing; and most frustrating of all, some is in place but there’s no apparent way to get it out of the database.
In order to make this vision a reality, we need to experiment with the data, find ways to check how complete it is, and explore what questions we can currently answer with it. Events like Wikifying Westminster are the perfect opportunity to do just that.
After a brief introduction to Wikidata and the EveryPolitician project, we split into two groups: one focused on learning how to use Wikidata, while the other focused on working on mini-projects.
Here’s a taste of what happened…
The learning track began by introducing new users to the basic Wikidata editing principles (or “getting data into Wikidata”). Participants were able to put their new skills into action immediately, by adding missing data on British MPs, who were mostly lacking dates and places of birth.
By the end of the first session, good progress had been made, particularly on obtaining dates of birth for current British parliamentarians. For some reason, though, it proved much harder to find these for women than for men: we can only speculate as to why that might be (do some still adhere to the idea that a woman shouldn’t reveal her age?!).
We were also given an introduction to SPARQL, a language used to query information in databases (or “getting data out of Wikidata”). Lucas Werkmeister introduced the Wikidata query service and explained a few tricks to help with using it. Participants were later able to put this to the test by running progressively more difficult test queries such as “All current UK MPs” or “Who is the youngest current MP?”
Also, Navino Evans showed us the potential of reusing data, talking about Histropedia, which he co-created with Sean McBirnie. Histropedia is an awesome tool that lets you visualise thousands of topics on interactive timelines: you can browse through existing ones or create a new one from scratch.
This group both worked on improving data and looked at how well we could answer some simple “stepping stone” queries (i.e. small questions to which we already knew some of the answers) as a heuristic of how good the data in Wikidata already is. You can see and contribute questions to the list of test queries here.
Some more details:
Improving data. The focus here was on the Northern Ireland Assembly, for which Wikidata now has full membership history back to the foundation of the Assembly, and on adding academic degrees of cabinet ministers. Starting from an excellent spreadsheet of the undergraduate universities and subjects of UK politicians and ministers (going back to John Major’s cabinets), we tried to add that data to the relevant items, attaching the qualifier “academic major” (P812) to the property “educated at” (P69). In this case, the key problem we found was that we weren’t sure how to model joint subjects, like “Maths and Politics”, which convinced us to concentrate on the more obvious subjects first.
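To make the shape of that data a little more concrete, here is a minimal Python sketch of an “educated at” (P69) statement carrying “academic major” (P812) qualifiers. The `Statement` class, the `educated_at` helper and the “Example University” value are all hypothetical, invented for illustration; they are not Wikidata’s own data model or API. Listing a joint subject as two qualifier values is only one possible option, not something the workshop settled on.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical stand-in for one claim: a property, a value, and qualifiers."""
    prop: str
    value: str
    qualifiers: dict[str, list[str]] = field(default_factory=dict)


def educated_at(university: str, subjects: list[str]) -> Statement:
    # P69 is "educated at" and P812 is "academic major", as named in the post.
    # Assumption: each subject becomes its own P812 qualifier value.
    return Statement(prop="P69", value=university,
                     qualifiers={"P812": list(subjects)})


# A single-subject degree, and a joint degree modelled as two qualifier values.
single = educated_at("Example University", ["History"])
joint = educated_at("Example University", ["Maths", "Politics"])
```

Whether a joint degree is better stored as two values or as a dedicated “Maths and Politics” item is exactly the open modelling question described above.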
Answering some unusual and/or intriguing questions. Inspired by a prior finding that there are more FTSE 100 CEOs named John than there are female ones, and that John is historically the most common name of UK parliamentarians, we thought we’d find out when exactly the John-to-female balance was toppled amongst the UK’s MPs (hint: not until 1992).
Going back further in history, we queried the first time each given name was recorded in Parliament; this was inspired by a recent news article about an MP who claimed he was the first “Darren” in the Commons.
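Both of those questions reduce to simple counting once the membership data is in hand. The Python sketch below shows the idea on a handful of invented records; the names, genders and years in `MPS` are made up for illustration and are not real parliamentary data, and the tuple layout is an assumption rather than how Wikidata returns results.

```python
# Invented illustrative records: (given name, gender, first year, last year).
MPS = [
    ("John", "male", 1950, 1970),
    ("John", "male", 1960, 1995),
    ("Mary", "female", 1965, 1992),
    ("Darren", "male", 1990, 2000),
    ("Ann", "female", 1990, 2000),
    ("Jane", "female", 1992, 2001),
]


def first_appearance(mps):
    """Map each given name to the earliest year it appears."""
    first = {}
    for name, _gender, start, _end in mps:
        if name not in first or start < first[name]:
            first[name] = start
    return first


def first_year_women_outnumber(mps, name="John"):
    """Return the first year in which sitting women outnumber sitting `name`s."""
    first_year = min(m[2] for m in mps)
    last_year = max(m[3] for m in mps)
    for year in range(first_year, last_year + 1):
        sitting = [m for m in mps if m[2] <= year <= m[3]]
        named = sum(1 for m in sitting if m[0] == name)
        women = sum(1 for m in sitting if m[1] == "female")
        if women > named:
            return year
    return None
```

With real data the hard part is not the counting but the completeness of the records, which is why gaps in dates and genders matter so much.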
Some ideas were also born that we weren’t able to see through, for various reasons. For example, could we discover which, if any, MPs are descended from people listed in the UCL’s ‘Legacy of British Slave-owners’ database? An interesting question, but at the moment, the answer is ‘no’, partly because child-parent relationships are currently inconsistently modelled in Wikidata, and partly because of the nature of Wikidata and ancestry: if there is someone who doesn’t exist in Wikidata (e.g. Grandad Bob, the painter) in the family chain, Wikidata can’t bridge the gap between a present day MP and the slave owner who might be their ancestor.
This is just the beginning
Work, of course, is still ongoing: all pre-1997 UK data is still to be inserted or improved on Wikidata, and so much more is missing – family connections, academic degrees, links to other databases, and all sorts of “unusual stuff” that can be used for interesting queries.
This data is crucial if we want to be able to answer the really big questions which Wikidata should one day be capable of helping us explore, about what politicians do.
We can do that together!
We hope that events like this give people an easy way in to Wikidata and also show them what’s already possible to achieve with the data. Over the coming months, we are hoping to support more events of this type around the world. If you are interested in getting involved, here’s how:
- Want to improve your country’s data? Events like this can be a great way to help kickstart activities and find other people who share your goals. We are happy to help out and support people in other countries to do so.
- Are you already organising or planning to organise a similar workshop around Wikidata? Make sure it is listed on the Wikidata Event page!
- Do you want to attend future workshops? Follow us on Twitter to stay updated about events that we are running, and ones that other people are too!
We’re always looking for feedback and suggestions on workshop and event formats that might work. Have you already run similar workshops? Let us know your impressions and suggestions on firstname.lastname@example.org!
Feature image credits: Mark Longair
This post is by Tony Bowden and Lucy Chambers from the EveryPolitician team. Today we officially launch our collaboration with Wikidata – here’s what to expect…
The story so far
You might have been following the progress: since 2015, through our project EveryPolitician, we’ve been gathering data for every national legislature in the world, from thousands of sources, and sharing it.
Now, two years on, we’ve started to see some great results. For example:
- It’s much easier to build simple Parliament Tracking sites (as mySociety partner organisations have in Zimbabwe and Nigeria). Those running the sites can work on providing information and context to hold politicians accountable — and don’t have to worry about wrangling data and software.
- Tools that allow citizens to write to their representatives (like Majlis Nameh in Iran, based on the WriteInPublic software, or Oxfam’s UK and Australia Campaigning Tool) can now be deployed in days rather than weeks, allowing groups to focus on local customisation.
- Vote-tracking sites like TheyVoteForYou in Australia can be more easily adapted to other countries — such as Ukraine, again without worrying about the major task of sourcing the politician data.
- Projects that highlight politicians’ activity can be augmented to show extra information against those politicians — so for example, Politwoops can show party affiliation next to politicians’ tweets.
- Investigative journalists can cross-match our lists of politicians to other sources (e.g. investigating shell companies for a recent Private Eye article or with the Panama Papers).
- mySociety’s collaboration with Facebook makes it easy for people to connect with their newly elected representatives.
All of this is possible because of two main tenets of EveryPolitician: having gathered the data, we structure it consistently, and we share it freely.
Not just current data
We’ve also discovered that there is a huge value for everyone in retaining historic information, too:
- Old data tends to disappear from many official sites, for example, when a new government comes into power and / or when parliaments decide to remodel their websites.
- Sometimes even the official sources no longer exist: for example, in Burkina Faso the Parliament building was burned down by protesters in 2014 and many crucial documents were lost.
- Research projects and sites run by civil society organisations sometimes run out of funding and have to shut down — meaning that the data people were relying on can vanish overnight.
The future: going one step further
So that’s all great, but we think EveryPolitician can do still more to help the worldwide community of Civic Tech coders and activists.
In particular, we want to go beyond data showing who the politicians are, and also provide information on what they do. That’s because we can see real value in the ability to answer questions like:
- When we look at politicians who vote on issues such as gay marriage, smoking bans, tax on sugary drinks, etc, anywhere in the world — are there any broad correlations like age or gender?
- As countries elect more women MPs, do those women also gain equivalent representation in committees? Does this affect attendance and participation rates?
- Are there standard political career paths that can be observed anywhere in the world? So for example, do certain Cabinet positions limit future progression; and if so, would it be feasible to spot politicians who are on the way up, or on their way out?
- Do politicians who move from the lower to upper house act, and vote, differently?
- Do politicians change their voting activity after certain types of intervention, for example after receiving funding from oil companies?
… and there are undoubtedly millions more questions, each one just as interesting and with answers that could enrich our understanding of the world. We are aiming for a future in which each should be answerable within minutes, rather than form the basis of a multi-year post-graduate research project.
How we plan to get there
You may remember our recent proposal to integrate more deeply with Wikidata. We’re delighted to say that our proposal was accepted, and that makes EveryPolitician’s path very clear.
Wikipedia is fast becoming one of the best sources of political information in many countries: it’s often updated more quickly than major news outlets or official parliamentary sites are.
Which is great, but there’s an issue when it comes to using that information for projects like the ones we’ve mentioned above: Wikipedia contains largely unstructured information (that is to say, information that comes in a wide variety of different formats) — as you’d expect from any project with multiple contributors and, often, a free-text input.
There are also a lot of differences between Wikipedias in different countries. Some countries’ data (particularly countries where the Wikipedia community is larger) gets updated very quickly. Smaller language Wikipedias can’t rely on such a large pool of editors, and it tends to be longer before they are updated.
Additionally, and as you might expect, Wikipedia content will appear fastest in the countries most directly affected by the change being documented, so for example when there are elections in Estonia, the Estonian Wikipedia may show the results almost immediately, but it can take a while for those changes to trickle into all languages.
But of course, nothing’s ever quite that simple. While Wikipedia has a wealth of unstructured political information, on Wikidata there’s still an awful lot of data missing. You may recall that we recently ran a drive to ensure that every country at least had its head of government entered, but that’s just the beginning: in order to answer the kinds of questions we mention above, we really need to ensure there’s consistently-structured information for all legislators, all elections… and much more.
What you’ll see in the coming months
While we plan for EveryPolitician to retain its own identity and keep its own front end, we’re excited to say that over the next few months, we’ll be teaming up with Wikidata communities across the world.
Our first objective is to see if we can bring Wikidata up to the same level as EveryPolitician for as many countries as possible.
And once we hit that target, we plan to go much further. The beauty of Wikidata is that you can add pretty much any information you want to add to politicians (indeed, to anything!), so local communities can decide for themselves which information is most pertinent and make sure that it’s included.
To help us make all of this happen, we’re expanding. If you’ve read this far and you’re still finding the project interesting, the chances are you’d be a great addition to the team! We’re looking for a new community co-ordinator, working from anywhere in the world (compatible timezones allowing): you can see all the details here.
How you can get involved
There are plenty of ways that you can help with this drive to improve the structured information on Wikidata. Here are the obvious ones:
- There are active missions always ongoing on the Wikiproject Heads of State and Government. There’s one there now – check them out!
- Would you be interested in organising a push to improve Wikidata for your country? Get in touch on email@example.com and we’ll do what we can to help.
- If this all sounds interesting, but you’re not sure where to start, or you’re unfamiliar with Wikidata – drop us an email and tell us what you’re interested in. We’re more than happy to help you get started.
Last you heard from the EveryPolitician team, we were talking about integrating EveryPolitician more deeply with Wikidata.
Here’s a quick update on what to expect in the coming months and a note on how you can get involved – we are going to need a lot of help!
Before we can make headway on a deeper integration, there are some pretty foundational pieces of the puzzle we need to put in place.
Some key data is missing, and you may be surprised to discover that it is pretty basic stuff. Take a look at this for example:
This is from a report we have generated to highlight gaps in the links between items. Notice how not even all of the offices have been filled in on the country page, let alone who occupies those offices?
This is the type of foundational data we are hoping to get in place over the next little while, and you can help…
How you can get involved
Over the coming weeks we’re going to be conducting a few experiments in public and trying to get data into Wikidata. We’ll need as much help as we can get!
The experiments will focus on using Wikidata to attempt to answer some questions we find interesting and see how we can expose gaps and inconsistencies in the data. In doing this, we’ll be pointing to specific reports we have generated and asking you to help us fill in the gaps.
Our first question will be:
“What is the gender breakdown of heads of government across the world?”
We’ll be blogging about the investigation over on Medium. The first post is already there:
Keen to dive straight in? Help us fill in the blanks you see in this report! If you are familiar with Wikidata, you will probably be able to get started straight away, but there are also tips and pointers in the Medium post if you want a bit more guidance.
You can get notified of the challenges to complete the data as soon as they are published by following the EveryPolitician bot on Twitter.
Want even more details on our plans?
We should also mention that we’ve updated our proposal to the Wikimedia Foundation to make a couple of things clearer about the problem we are trying to solve and our proposed solution (the proposal is still being reviewed). If you are interested in the nitty-gritty, that’s the best place to get the full overview of what we are planning.
Our EveryPolitician project makes data on the world’s politicians available in a useful, consistent format for anyone to use. If you’ve been following our progress, you’ll know we’ve already collated a lot of data (over 72,000 politicians from 233 countries). The work on adding to the depth and breadth of that data is ongoing, but EveryPolitician data is already being used to do interesting things.
Previously we looked at Politwoops as an example of EveryPolitician data being used to augment existing data.
In that case, the useful data for Politwoops was the politicians’ party affiliation. But our team (a handful of humans and one very busy bot) collects richer data than just that. EveryPolitician data includes contact information for politicians.
At mySociety, we know how powerful this particular kind of data can be. For example, our WriteToThem site makes it easy for UK citizens to contact their representatives (WriteToThem grew out of the earlier online service FaxYourMP, and uses the now more common technology of email).
Of course, there’s nothing especially radical about collecting email addresses of politicians… or phone numbers, Twitter handles, or Facebook pages. Indeed, many individuals and groups do just that. But an important difference with EveryPolitician is that we’re not just collecting data (which happens to include those things, as well as a host of others) but also making it available so it’s easy to use. We do that by putting it out in consistent, useful formats.
For many projects, downloading a CSV of current politicians from EveryPolitician will be enough. That can be opened as a spreadsheet, and if one of those columns is called
Opening a spreadsheet is just one way of accessing the data. Our own use of EveryPolitician data to power the “Write in Public” MajlisNameh site for ASL19 (see this blog post for more about that) demonstrates a more programmatic approach.
But the whole point of making data available like this isn’t so that we can use it. It’s for other people, other groups. Anyone can build more nuanced or complex services with this data too.
For example, the people at Represent.me have built a sophisticated platform for gathering opinions and votes that can be shared with politicians and constituency MPs. It’s a system of information-gathering that has a network of citizens at one end feeding into their political representatives at the other. They use EveryPolitician’s data to populate their system with information about those representatives, including contact details, for each country they operate in.
And, because we make sure our data is consistently formatted, it’s a good general solution. As they cover more areas, they can expect the code they’ve written to ingest the EveryPolitician data in the countries they’re already operating in to also work as they expand into others.
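As a rough illustration of why consistent formatting matters, the Python sketch below reads contact details from two differently ordered CSVs with the same code. The column names `name`, `email` and `twitter` and the sample rows are assumptions made for this example; check the actual files for the columns they contain. This is not Represent.me’s code.

```python
import csv
import io


def contact_details(csv_text):
    """Return name, email and twitter for rows with at least one contact field."""
    contacts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed column names; missing columns are treated as empty.
        email = (row.get("email") or "").strip()
        twitter = (row.get("twitter") or "").strip()
        if email or twitter:
            contacts.append(
                {"name": row["name"], "email": email, "twitter": twitter}
            )
    return contacts


# Two invented samples with different column orders and extra columns.
SAMPLE_A = "name,email,twitter\nA. Example,a@example.org,\nB. Example,,\n"
SAMPLE_B = "name,twitter,email,group\nC. Example,c_example,,Example Party\n"

contacts_a = contact_details(SAMPLE_A)
contacts_b = contact_details(SAMPLE_B)
```

Because the function looks columns up by name rather than by position, the same code handles both samples, which is the property that lets an ingest script written for one country carry over to the next.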
If you’re running a project that needs such data, you could invest time and effort finding and collecting it all yourself. But it’s almost inevitable that you’d be using the same public sources that we are anyway (after all, we try to identify and use all the sources we can, merging them together into one collated whole), so really it makes sense to simply take the data from EveryPolitician. Remember, too, that once our bot has been told about a source, it checks it daily for changes and updates. So instead of replicating the effort we’re already making to gather the same data you need, you’re free to focus on developing the way your project uses that data… while we hunker on down and get on with collecting it.
Inevitably, as with all software projects, there’s always lots more to do, but already the value of providing useful data — and especially contact information — in a consistent format is clear.
Image: Telegraph Chambers (Montreal) CC BY-NC-ND 2.0 by Andre Vandal
Over the last two years, we’ve gathered data on the top-level politicians of almost every country in the world, and made it accessible to developers everywhere through our project EveryPolitician.
Now we’d like to take a step that we believe will benefit more people, and further extend the usefulness of this extensive dataset. We’re proposing to integrate more deeply with Wikidata, to fill the gaps in their coverage and provide consistent, linked data to their global community.
Wikidata is the central storage for the structured data of each of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource, and others. Wikidata also provides support to many other sites and services beyond just Wikimedia projects, so the combination of EveryPolitician’s data with the reach of Wikidata’s community is pretty compelling.
So in many places, the aims of the EveryPolitician and Wikidata projects are already aligned. We already synchronise EveryPolitician data with the good quality data available in Wikidata where we find it, and we feed back our own additions. As our datasets improve, it seems prudent to combine efforts, and resources, in one place.
If you play an active part within the Wikidata community, or are someone who would benefit from this initiative, we’d very much appreciate your support. Please do add your endorsements or thoughts at the foot of that page if you’d like to see the project go ahead.
International Women’s Day seems like a good time to check in on our project Gender Balance, the crowdsourcing website that invites users to help gather gender data on the world’s politicians.
As you may recall, our aim was not simply to present top-level numbers: data already exists that allows us to, say, understand which legislatures have the most even-handed representation, genderwise.
No, Gender Balance seeks to go more in-depth: by attaching gender data to individual politicians, and making that data available via structured datasets, we hope to allow for more subtle comparisons to be made.
For example, researchers may like to test theories such as, ‘do women vote differently from men?’, or ‘do women politicians make different laws around childcare?’ — or a whole host of other questions, all of which can only be answered when gender data relates to specific public figures, or when it is viewed in combination with other data.
The data that is collected when you play Gender Balance goes, with data from other sources, into EveryPolitician, our project that seeks to provide structured, downloadable, open information across all the world’s legislatures.
Not right away, mind you. To ensure that the data really is accurate, we make sure that each politician on Gender Balance is presented to at least five different players, all of whom give the same answer, before we consider it verified.
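The rule described above can be sketched in a few lines of Python. This is one reading of it (at least five distinct players, with no disagreement among them), not Gender Balance’s actual code; the function name, the `MIN_PLAYERS` constant and the `(player_id, answer)` input shape are all hypothetical.

```python
MIN_PLAYERS = 5  # threshold taken from the post: at least five different players


def verified_gender(responses):
    """Return the agreed answer if verified, otherwise None.

    `responses` is a list of (player_id, answer) pairs. Assumption: a player
    who answers more than once counts once, using their latest answer.
    """
    latest = {}
    for player_id, answer in responses:
        latest[player_id] = answer
    answers = set(latest.values())
    if len(latest) >= MIN_PLAYERS and len(answers) == 1:
        return answers.pop()
    return None


five_agree = [(f"player{i}", "female") for i in range(5)]
four_agree = five_agree[:4]
one_dissent = five_agree[:4] + [("player4", "male")]
```

Under this reading, `five_agree` is verified, while `four_agree` (too few players) and `one_dissent` (disagreement) are not, so those politicians would keep being shown to players.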
EveryPolitician currently contains data for about 73,000 politicians in total. In some cases, that data came to us along with a trusted gender field, so we don’t need to run that through the Gender Balance mill, but the majority of parliamentary sites don’t provide this data.
We can sometimes obtain that information from other sources, but Gender Balance has been invaluable in filling in lots of the gaps. Thanks to our players, it has already provided us with gender information for over 30,000 politicians (and in some cases, pointed out discrepancies in the data we obtained from elsewhere).
There’s still plenty to go, though, if you’d like to help; and, as elections happen around the world, Gender Balance will continue to refresh with any politicians for whom we can’t find trusted gender data. As we speak, approximately 22,000 politicians still need sorting.
That might sound like quite a lot, but each politician need only take seconds — and every little helps. So, if you’d like to help contribute a little more gender data, just step this way.
Image: India’s Prime Minister Narendra Modi at the valedictory session of the National Conference of Women Legislators in Parliament House CC BY-SA 2.0, via Wikimedia Commons
Politwoops tracks politicians’ tweets, and reports the ones that are deleted.
Often those tweets are deleted because of a typo: everyone makes simple mistakes with the buttons on their devices, and politicians are no less human than the rest of us.
But Politwoops’s targets are public servants who use Twitter to communicate with that public. And sometimes the contents of the tweets they delete are not simply the result of bad typing. Those tweets can be especially interesting to people whom those politicians are representing: sometimes they may be evidence of a usually-suppressed prejudice, or an attempt to remove evidence of a previously held opinion that is no longer convenient.
In effect, Politwoops is a public archive of direct quotes that would otherwise be lost.
And also… EveryPolitician
Our EveryPolitician project is an ever-growing collection of data on every politician in the world (we’re not there yet, but we’re over 230 countries and 72,700 politicians in, and counting).
Like Politwoops, our data includes politicians’ Twitter handles. But also a lot more besides.
We make that data useful by putting it out in consistent, simple formats — the simplest of which is a comma-separated value (CSV) file for each term of a legislature. In practice, that means if you want a spreadsheet of the current politicians in your country’s parliament, then EveryPolitician is probably the place for you.
Put them together…
Now, Politwoops predates EveryPolitician by several years, and they’ve been doing their thing without needing our data just fine. In fact, Politwoops has been happily politwooping since 2010 (Politwoops is a project of the Open State Foundation, based in the Netherlands).
Behind the scenes, it works pretty much the way you’d expect: with a list of politicians’ Twitter handles for each country where it’s running.
But… who doesn’t want to add something extra for free? Our data also includes Twitter handles (mostly but not entirely from the same public sources Politwoops were using). So that meant they could take our CSVs and match each line—all that extra data!—with the Twitter handle.
Better, for free
So last year, they augmented their data with ours for one very simple win: they get to know party affiliation for the politician associated with each of those Twitter handles. Well, actually they get to know lots of other things besides party (gender, date of birth, or… well, all our other data, if they wanted it). But just party? That’s also fine.
This all means that Politwoops now shows the party of each tweet’s deleter, just because they merged our CSV with theirs. Lovely!
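For anyone curious what that kind of merge looks like, here is a minimal Python sketch that joins a list of tracked Twitter handles to party names from a CSV. It is not Politwoops’s code: the column names `twitter` and `group` are assumptions for this example, the sample rows are invented, and the function names are hypothetical.

```python
import csv
import io


def normalise(handle):
    """Compare handles without a leading @ and ignoring case."""
    return handle.strip().lstrip("@").lower()


def party_by_handle(csv_text):
    """Build a lookup from Twitter handle to party from CSV text."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        handle = normalise(row.get("twitter") or "")
        if handle:
            # Assumed column: "group" holds the party name.
            lookup[handle] = row.get("group") or ""
    return lookup


def add_party(tracked_handles, lookup):
    """Pair each tracked handle with its party, or '' when there is no match."""
    return [(h, lookup.get(normalise(h), "")) for h in tracked_handles]


# Invented sample data standing in for one term's CSV.
SAMPLE = (
    "name,twitter,group\n"
    "A. Example,A_Example,Example Party\n"
    "B. Example,,Other Party\n"
)

merged = add_party(["@a_example", "unknown_mp"], party_by_handle(SAMPLE))
```

Normalising the handles before matching is a design choice in this sketch: handles tend to be written with and without the @ and in mixed case, so an exact string match would miss some rows.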
Although party affiliation was the detail Politwoops went for, it turns out the other data from EveryPolitician was a little too tempting for them to ignore… So recently they’ve been doing some playful analysis on their statistics using the gender breakdown that EveryPolitician data makes possible too. You can see more on the Politwoops website.
You can too
To be clear: Politwoops did this, not us. We’re committed to doing the groundwork of finding, collecting and collating the data, and making it available (and, additionally, endlessly checking for updates… if you’re interested in how this all works, you can read our bot’s own blog). We do this so people who want to get on with using the data can do just that. As did, in this case, Breyten and his team at Politwoops.
EveryPolitician’s data is available as plain CSVs for this kind of thing, but we also provide a richer JSON version if that’s more useful to you. All the files are downloadable from the website. If you’re a coder who wants to dive in, there are libraries to make it even easier for you (the EveryPolitician team works in Ruby, so we wrote the everypolitician gem, but there are also ports to Python and PHP).
For more information see docs.everypolitician.org.
The EveryPolitician bot wrote its own version of this blog post, which goes into a little more detail of the process.
Even official records aren’t as safe as you might think they are. The archive of a country’s political history might be wiped out in a single conflagration.
Take the example of Burkina Faso, a beautiful West African country that is, sadly, perhaps best known to the rest of the world for its troubled political past.
The uprising in Burkina Faso in 2014 led to a fire in the National Assembly building and archives office. Nearly 90% of the documents were lost. Now the National Assembly is working to reconstruct the list of its parliament’s members before 1992.
This means that the data EveryPolitician has on Burkina Faso has nothing from terms before 1992. We’ve got some data for six of the seven most recent terms from the National Assembly so far, of which five are live on the site. Even though that data is not very rich (there’s little more than names in many cases; and the 6th term was transitional so data on that one’s membership might remain elusive) it’s a beginning.
We know from experience that data-gathering often proceeds piecemeal, and names are always a good place to start.
As Tinto finds new data, whether that’s more information about the politicians already collected or membership lists of the missing terms before 1992, we’ll be adding that to EveryPolitician too.
A vast collection
When people ask what EveryPolitician is, we often say, ‘The clue’s in the name’. EveryPolitician aims to provide data about, well … every politician. In the world.
(We’ve limited our scope — for the time being — to politicians in national-level legislatures).
The project is growing. Since our launch last year, we’ve gathered data for legislatures in 233 countries. The data we’ve collected currently comprises well over three million items. The number of politicians in our datafiles is now in excess of 70,000.
Seventy thousand is an awful lot of politicians.
In fact, if you think that might be more politicians than the world needs right now, you’re right: as the Burkina Faso example shows, EveryPolitician collects historic data too.
So as well as the people serving in today’s parliaments, our data includes increasing numbers of those from the past. (Obviously, if you have such data for your country’s legislature, we’d love to hear from you!)
More than just today’s data
The Burkina Faso fire is an illustration of the value of collecting and preserving this historic data.
Of course, we’re fully aware of the usefulness of current data, because we believe that by providing it we can seed many other projects — including, but in no way limited to, parliamentary monitoring sites around the world (sites like our own TheyWorkForYou in the UK, or Mzalendo in Kenya, for example).
Nonetheless, we never intended to limit ourselves to the present. By sharing and collating historic records too, we hope to enable researchers, journalists, historians and who-knows-who-else to investigate, model, or reveal connections and trends over time that we haven’t even begun to imagine. We know this data has value; we look forward to discovering just how much value.
But it turns out we’re providing a simpler potential benefit too. EveryPolitician’s core datafiles are an excellent distributed archive.
What Burkina Faso’s misfortune goes to show is that, as historians know only too well, data sources can be surprisingly fragile.
In this case the specific situation involves paper records being destroyed by fire. That is a simple analogue warning to the digital world. Websites and their underlying databases are considerably more volatile than the most flammable of paper archives.
Database-backed sites are often poor catalogues of their pasts. Links, servers and domain registrations all expire. Access to data may be revoked; firewalls can appear.
Digital data doesn’t fade; instead it is so transient that it can simply disappear.
Of course, we cannot ourselves guarantee that our servers will be here forever (we’re not planning on going anywhere, but projects like this have to be realistic about the longer view).
There is an intriguing consequence of us using GitHub as our datastore. The fact is, the EveryPolitician data you can download isn’t coming off our servers at all. Instead, we benefit from GitHub’s industrial-scale infrastructure, as well as the distributed nature of the version control system, git, on which it is based. By its nature, every time someone clones the repository (which is easy to do), they’re securing for themselves a complete copy of all the data.
But the point is not necessarily about data persisting far into the next millennium — that’s a bit presumptuous even for us, frankly — so much as its robustness over the shorter cycles of world events. So, should any nation’s data become inaccessible (who knows? for the length of an interregnum or civil war, a natural disaster, or maybe just a work crew accidentally cutting through the wrong cable outside parliament), we want to know the core data will remain publicly available until it’s back.
Naturally there are other aspects to the EveryPolitician project which are more — as modern language would have it — compelling than collecting old data about old politicians. But the usefulness of the EveryPolitician project as a persistent archive of historical data is one that we have not overlooked.
mySociety’s EveryPolitician project aims to make data available on every politician in the world. It’s going well: we’re already sharing data on the politicians from nearly every country on the planet. That’s 68,652 people and 2.9 million individual pieces of data, numbers which will be out of date almost as soon as you’ve read them. Naturally, the breadth and depth of that data varies from country to country, depending on the sources available, but that’s a topic for another blog post.
Today the EveryPolitician team would like to introduce you to its busiest member, who is blogging at EveryPolitician bot. A bot is an automated agent — a robot, no less, albeit one crafted entirely in software.
First, some background on why we need our little bot.
Because there’s so much to do
One of the obvious challenges of such a big mission is keeping on top of it all. We’re constantly adding and updating the data; it’s in no way a static dataset. Here are examples — by no means exhaustive — of circumstances that can lead to data changes:
- Legislatures change en masse, because of elections, etc.
We try to know when countries’ governments are due to change because that’s the kind of thing we’re interested in anyway (remember mySociety helps run websites for parliamentary monitoring organisations, such as Mzalendo in Kenya). But even anticipated changes are rarely straightforward, not least because there’s always a lag between a legislature changing and the data about its new members becoming available, especially from official national sources.
- Legislatures change en masse, unexpectedly
Not all sweeping changes are planned. There are coups and revolutions and other unscheduled or premature ends-of-term.
- Politicians retire
Or die, or change their names or titles, or switch party or faction.
- New parties emerge
Or the existing ones change their names, or form coalitions.
- Areas change
There are good reasons (better representation) and bad reasons (gerrymandering) why the areas in constituency-based systems often change. By way of a timely example, our UK readers probably know that the wards have changed for the forthcoming elections, and that mySociety built a handy tool that tells you what ward you’re in.
- Existing data gets refined
Played Gender Balance recently? Behind that is a dataset that keeps being updated (whenever there are new politicians) but which is itself a source of constantly-updating data for us.
- Someone in Russia updates the Wikipedia page about a politician in Japan
Wikidata is the database underlying projects like Wikipedia, so by monitoring all the politicians we have that are also in there, we get a constant stream of updates. For example, within a few hours of someone adding it, we knew that the Russian transliteration of 安倍晋三’s name was Синдзо Абэ — that’s Shinzo Abe, in case you can’t read kanji or Cyrillic script. (If you’re wondering, whenever our sources conflict, we moderate in favour of local context.)
- New data sources become available
Our data comes from an ever-increasing number of sources, commonly more than one for any given legislature (the politicians’ Twitter handles are often found in a different online place from their dates of birth, for example). We always welcome more contributions: if you think you’ve got new sources for the country you live in, please let us know.
- New old data becomes available
We collect historic data too, not just the politicians in the current term. For some countries we’ve already got data going back decades. Sources for data like this can sometimes be hard to find; slowly but surely new ones keep turning up.
So, with all this sort of thing going on, it’s too much to expect a small team of humans to manage it all. Which is where our bot comes in.
We’ve automated many of our processes, including scraping, collecting, checking changes and submitting them for inclusion, so the humans can concentrate on what they do best (which is understanding things, and making informed decisions). In technical terms, our bot handles most things in an event-driven way. It springs into action when triggered by a notification. Often that will be a webhook (for example, a scraper finishes getting data so it issues a webhook to let the bot know), although the bot also follows a schedule of regular tasks. Computers are great for running repetitive tasks and making quantitative comparisons, and a lot of the work that needs to be done with our ever-changing data fits such a description.
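To give a flavour of what "event-driven" means here, the following is a small illustrative sketch in Python. It is not the bot's real code: the `Bot` class, the `scraper_finished` event name and the task names are all hypothetical, and a real bot would receive its webhooks over HTTP rather than through a direct method call.

```python
class Bot:
    """A toy dispatcher: handlers run on notifications, tasks run on a schedule."""

    def __init__(self):
        self._handlers = {}    # event name -> list of handler functions
        self._scheduled = []   # (interval in hours, task function) pairs
        self.log = []          # record of what the bot has done

    def on(self, event_type):
        """Decorator: register a handler for a named event (e.g. a webhook)."""
        def register(func):
            self._handlers.setdefault(event_type, []).append(func)
            return func
        return register

    def every(self, interval_hours):
        """Decorator: register a task to run every `interval_hours` hours."""
        def register(func):
            self._scheduled.append((interval_hours, func))
            return func
        return register

    def notify(self, event_type, payload):
        """Called when a notification arrives; runs every matching handler."""
        for handler in self._handlers.get(event_type, []):
            self.log.append(handler(payload))

    def tick(self, hours_elapsed):
        """Called once an hour; runs the scheduled tasks that are due."""
        for interval, task in self._scheduled:
            if hours_elapsed % interval == 0:
                self.log.append(task())


bot = Bot()


@bot.on("scraper_finished")
def rebuild(payload):
    return f"rebuild data for {payload['legislature']}"


@bot.every(24)
def daily_check():
    return "check sources for updates"


# A scraper reports that it has finished, then the clock moves on.
bot.notify("scraper_finished", {"legislature": "Burkina Faso National Assembly"})
bot.tick(24)   # the daily task is due
bot.tick(25)   # nothing is due
```

The point of the sketch is the shape rather than the detail: nothing happens until a notification or a scheduled moment arrives, and each kind of trigger maps to a small, separately described task.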
The interconnectedness of all the different tasks the bot performs is complex. We originally thought we’d document that in one go — there’s a beautiful diagram waiting to be drawn, that’s for sure — but it soon became clear this was going to be a big job. Too big. Not only is the bot’s total activity complicated because there are a lot of interdependencies, but it’s always changing: the developers are frequently adding to the variety of tasks the bot is doing for us.
So in the end we realised we should just let the bot speak for itself, and describe task-by-task some of the things it does. Broken down like this it’s easier to follow.
We know not everybody will be interested, which is fine: the EveryPolitician data is useful for all sorts of people — journalists, researchers, parliamentary monitors, activists, parliamentarians themselves, and many more — and if you’re such a person you don’t need to know about how we’re making it happen. But if you’re technically-minded — and especially if you’re a developer who uses GitHub but hasn’t yet used the GitHub API as thoroughly as we’ve needed to, or are looking for ways to manage always-shifting data sets like ours — then we hope you’ll find what the bot says both informative and useful.
The bot is already a few days into blogging — its first post was “I am a busy bot”, but you can see all the others on its own Medium page. You can also follow it on Twitter as @everypolitician. Of course, its true home, where all the real work is done, is the everypoliticianbot account on GitHub.
Images: CC-BY-SA from the EveryPolitician bot’s very own scrapbook.