Excuse us while we just finish hanging this bunting…
Yes, wave the flags and toot those vuvuzelas: it’s National Democracy Week, a new initiative to celebrate the democratic process and encourage democratic participation.
And thanks to some extra-curricular work by one of the mySociety team, we’re now able to celebrate it in a quite exceptional way. Longstanding developer Matthew has used his own free time to import historic House of Commons debates from 1919-1935 into our parliamentary site TheyWorkForYou. With this work, he’s extended the site’s value as an easy-access archive of parliamentary activity even further.
You can check it out now by visiting TheyWorkForYou, searching for any word or phrase, and then sorting the search results by ‘oldest’. Or, pick any MP active during 1919-1935 and search for them to see every speech they made in Parliament.
Please let us know if you find anything of interest! For developers who use TheyWorkForYou data to power their own sites and apps, the extended content will also be available via TheyWorkForYou’s API.
“No one sex can govern alone” – Nancy Astor
This is the first National Democracy Week, and it has taken, as its theme, the anniversary of women’s suffrage: as you’re sure to have heard by now, 2018 is the centenary of (some) women getting the vote* in the UK.
We wanted to celebrate by highlighting some of the big milestones of women’s participation in Parliament, but there was just one problem. TheyWorkForYou only contained House of Commons debates as far back as the 1930s — while, for example, the maiden speech of Nancy Astor, the first woman to speak in Parliament, was in 1920.
So it’s a big deal that Matthew’s imported this early data into TheyWorkForYou, and we’re all the more grateful because he did so on his own time. It’s something we’ve wanted to do, but not had the resource for. You can now browse, search or link to Commons debates right back to 1919, and find not just women’s contributions, but a whole wealth of historic parliamentary content. Result!
What you can enjoy this week
We’re going to take this opportunity to highlight, through a week-long series of posts:
- Today: Milestones in women’s parliamentary participation. A rundown of when and how women became integrated into the UK Parliament. You can see this, our first post of the week, right now.
- Tomorrow, our researcher Alex will be highlighting some of the ways people have used our data and APIs to explore issues of gender and representation, and describing some of our future plans in this area. This also gives us the opportunity to point out where you can access all our lovely, juicy data, should you want to do something similar yourself.
- On Wednesday we’ll delve back into history, this time looking at the changes in law which have had bearing on women’s lives, with more links to richer background detail on TheyWorkForYou.
- On Thursday Alex is back, exploring what we can learn from EveryPolitician data about representation of women in democracies around the world.
- Friday will give us a chance to show how you can use TheyWorkForYou to research when topics were first mentioned in Parliament, and how that can give a snapshot of the zeitgeist.
- Finally, as a weekend bonus, we’ll be blogging on the various organisations which support women within our own sphere of Civic Tech.
We’ll add the links in for each day’s content as it goes live.
Since our sites TheyWorkForYou and WriteToThem in the UK, our activities with the Democratic Commons, and the support we give to partners in other countries are all, at heart, aiming to make democratic participation easier, we are, of course, all over this event. We hope you’ll enjoy the week!
*We can have another celebration in 2028 for the remaining women.
Earlier this month, Mark laid out the concept of a Democratic Commons for the Civic Tech community: shared code, data and resources where anyone can contribute, and anyone can benefit.
He also talked about exploring new models for funding the kind of work that we do in our Democracy practice at mySociety.
For many years, our Better Cities work has been proof of concept for one such model: we provide data and software as a service (FixMyStreet, MapIt, Mapumental) to paid clients, the revenue from which then funds our charitable projects. Could a similar system work to sustain our Democracy practice?
That’s the hope, and we’ve already seen it in action with Facebook, whom we first worked with during the UK General Election in June, providing the data that helped people see and connect with their elected representatives.
This kind of project is positive on multiple levels: it brings us an income, it brings the benefits of democratic engagement to a wider audience than we could reach on our own, and it contributes data back into EveryPolitician and Wikidata, which everyone can use.
The UK election was only the first for which we did this work: we’ve gone on to provide the same service for the French elections and more recently for the rather more eventful Kenyan ones — currently on hold as we await the re-run of the Presidential election next month. And now we’re doing the same for the German elections, where candidate data is being shared this week.
As we’re learning, this is definitely not one-size-fits-all work, and each country has brought its own interesting challenges. We’re learning as we go along — for example, one significant (and perhaps obvious) factor is how much easier it is to work with partners in-country who have a better understanding of the sometimes complex political system and candidates than we can ever hope to pick up. Much as we might enjoy the process, there’s little point in our spending days knee-deep in research, when those who live in-country can find lists of candidates far more quickly, and explain individual levels of government and electoral processes far better.
Then, electoral boundaries are not always easy to find. We’ve used OpenStreetMap where possible, but that still leaves some gaps, especially at the more granular levels where the data is mainly owned and licensed by the government. It’s been an exercise in finding different sources and putting them all together to create boundary data to the level required.
Indeed, that seems to be a general pattern, also replicated across candidate data: at the national level, it’s easy to find and in the public domain. The deeper you go, the less those two descriptors hold true. It was also at this point that we realised how much, here in the UK, we take for granted things like the fact that the spelling of representatives’ names is usually consistent across a variety of sources — not always a given elsewhere, and currently something that only a human can resolve!
What makes all the challenges more worthwhile, though, is that we know it’s not just a one-off push that only benefits a single project. Nor is the data going straight into Facebook, never to be seen again.
Much of what we’re collecting, from consistent name data to deep-level boundaries data, is to be made available to all under open use licenses. For example, where possible we can submit the boundaries back to OpenStreetMap, helping to improve it at a local granular level across whole countries.
The politician data, meanwhile, will go into Wikidata and EveryPolitician so that anyone can use it for their own apps, websites, or research.
There are also important considerations about how this type of data will be used and where and when it is released in the electoral process; finding commercial models for our Democracy work is arguably a more delicate exercise than on some of our other projects. But hopefully it’s now clear exactly how a project like this can both sustain us as a charity, and have wider benefits for everyone — the holy grail for an organisation like us.
At the moment it’s unclear how many such opportunities exist or if this is a one-off. We’re certainly looking for more avenues to extend the scope of this work and keen to hear more ideas on this approach.
Millions of people reached for their phone on June 9, and checked Facebook for the result of the UK General Election.
Now, you may or may not be one of those people yourself, but there’s no disputing that many of us turn to social media as our primary source for big news. Through the night, Facebook was a place where we could express feelings about the results as they came in, share news stories and ask questions: it gives us a rounded view of an event like an election, quite unlike any you’ll receive from traditional media.
And the morning after, those logging in to Facebook may have seen something like this — an invitation to follow your newly-elected or re-elected MP and other elected representatives, from local councillors to MEPs:
We’re glad to say that mySociety has been working alongside Facebook to help make this happen.
Reaching people where they are
mySociety has a mission to make democracy more accessible for everyone, and via our websites TheyWorkForYou and WriteToThem, we serve and inform more than 400,000 UK citizens per month.
That figure has, as we’d expect, spiked in the last few weeks as people rush to check their MPs’ track records, all the better to cast an informed vote; but all the same, we’re well aware that 400,000 users is only a small proportion of the country’s electorate.
What’s more, our research has consistently shown that our services don’t adequately reach the people that need them most: our typical user is male, reasonably affluent, well-educated, older and white — I mean, we’re glad to be there for everyone, but generally speaking this is a demographic that can inform itself quite readily without any extra help.
That’s not a problem Facebook has, though, with their 32 million UK users. 75% of them log in on a daily basis, and almost half are under the age of 30*.
That’s why we were so keen to join forces with the Facebook Civic Engagement team, to help this large online audience see who their representatives are today.
Facebook for engagement
You may not have been aware that Facebook has a dedicated political engagement team — unless you came to TICTeC this year, of course, in which case you’d have seen a walkthrough of the extensive research that’s gone into their election offerings globally — but if you use Facebook at all, and if you’re in a country that has recently had an election, you’ve probably seen some of their work.
Over the last few weeks in the UK, people on Facebook were alerted to each stage of the electoral process. They were invited to check who their candidates were and what they stood for; offered a reminder to vote and provided information on where and how to do so; and finally, encouraged to share the fact that they had voted, tapping into the proven peer encouragement effect.
mySociety behind the scenes
Thanks to our experience running TheyWorkForYou and WriteToThem, plus the support we receive from Commercial Evaluations and their Locator Online service and our involvement with Democracy Club’s WhoCanIVoteFor.co.uk, we have access to accurate and up-to-date data on candidates and representatives at every level, from local councillors up to MEPs, and including MPs — all linked to the relevant constituencies.
In all, this totalled around 23,000 people. What we needed to discover was how many of them were on Facebook, and whether we could accurately link our records to their Facebook pages.
Working together with Facebook, we built an admin tool that displayed likely pages to our team, on the basis of names, locations and the really giveaway information, such as ‘Councillor’, ‘MP’ or the constituency name in the page title. Some representatives didn’t have individual pages, but ran a party page; those counted too (and of course, a fair proportion of representatives have no Facebook presence at all).
While our tool filtered the results reasonably well, it was still necessary to make a manual check of every record to ensure that we were linking to the correct representative, and not, say, someone who happened to have the same name and live in the same town. We needed to link, of course, only to ‘official’ pages; not representatives’ personal pages full of all those things we use Facebook for on a day-to-day basis. Those holiday snaps, Candy Crush results and cat memes won’t help constituents much: what we were looking for was the kind of page where constituents could message their reps, find out about surgery times, and get the latest news from their constituency.
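To make the filtering step a little more concrete, here is a minimal sketch of how candidate pages might be ranked before the manual check. It is not the tool we built with Facebook: the `score_page` and `rank_pages` functions, the weights, the field names and the sample records are all hypothetical, chosen only to illustrate scoring on names, locations and giveaway words in the page title.

```python
# Hypothetical sketch: rank candidate pages for one representative.
# Field names, weights and sample data are illustrative assumptions.

GIVEAWAY_WORDS = ("councillor", "cllr", "mp", "mep")

def score_page(rep, page):
    """Return a rough score for how likely `page` belongs to `rep`."""
    title = page["title"].lower()
    words = title.replace(",", " ").split()
    score = 0
    if rep["name"].lower() in title:
        score += 3  # the representative's name appears in the title
    if rep["constituency"].lower() in title:
        score += 2  # the constituency name is a strong signal
    if any(w in words for w in GIVEAWAY_WORDS):
        score += 2  # role words such as 'MP' or 'Councillor'
    if page.get("location", "").lower() == rep["town"].lower():
        score += 1  # same town: a weak signal on its own
    return score

def rank_pages(rep, pages):
    """Sort candidate pages, best match first, ready for manual review."""
    return sorted(pages, key=lambda p: score_page(rep, p), reverse=True)

# Invented example records.
rep = {"name": "Jane Example", "constituency": "Anytown North", "town": "Anytown"}
pages = [
    {"title": "Jane Example", "location": "Anytown"},
    {"title": "Jane Example MP, Anytown North", "location": "Anytown"},
    {"title": "Anytown Gardening Club", "location": "Anytown"},
]
ranked = rank_pages(rep, pages)
```

Even the top-ranked page in a scheme like this is only a suggestion: a namesake living in the same town would still score well, which is exactly why every record needed a human check.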
Now of course, until the results came in, no-one knew precisely which candidates would be MPs! So a small crack team of mySociety people worked through Thursday night to do the final matching. It was a very long night, but we hope that the result will be an awful lot more people following their representatives, and so quite effortlessly becoming more politically engaged, thanks to a platform which they already visit on a regular basis.
There’s a new piece of data on MapIt, and it wasn’t added by us. It’s tiny but useful, and it’s slightly esoteric, so bear with us and we’ll explain why it’s worth your attention.
Local Authority codes come from the government’s set of canonical registers. They may not look much, but they’re part of a drive to bring consistency across a wide range of data sets. That’s important, and we’ll try to explain why.
One name can refer to more than one thing
If you try to buy a train ticket to Gillingham in the UK, and you are lucky enough to be served by a conscientious member of staff, they will check whether you are going to the Gillingham in Kent (GIL), or the one in Dorset (GLM).
The names of the two towns might be identical, but their three-letter station codes differ, and quite right too — how, otherwise, would the railway systems be able to charge the right fare? And more importantly, how many people would set off confidently to their destination, but end up in the wrong county?
I mention this purely to illustrate the importance of authoritative, consistent data, the principle that is currently driving a government-wide initiative to ensure that there’s a single canonical code for prisons, schools, companies, and all kinds of other categories of places and organisations.
Of particular interest to us at mySociety? Local authorities. That’s because several of our services, from FixMyStreet to WriteToThem, rely on MapIt to connect the user to the correct council, based on their geographical position.
One thing can have more than one name
I live within the boundaries of Brighton and Hove City Council.
That’s its official name, but when talking or writing about my local authority, I’m much more likely to call it ‘Brighton’, ‘Brighton Council’, or at a push, ‘Brighton & Hove Council’. All of which is fine in everyday conversation, but it’s an approach that could cause mayhem for the kind of data that digital systems need (“machine readable” data, which is consistent, structured and in a format which can be ‘understood’ by computer programs).
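Both problems can be sketched in a few lines, assuming a hand-made lookup table. The station codes GIL and GLM are the ones mentioned above; the `BNH` code and the list of aliases for my council are assumptions made for this example, not values copied from any register.

```python
# One name, two things: the name alone is ambiguous, the code is not.
STATIONS = {
    "GIL": {"name": "Gillingham", "county": "Kent"},
    "GLM": {"name": "Gillingham", "county": "Dorset"},
}

def stations_named(name):
    """Return every station code whose name matches `name`."""
    return sorted(code for code, s in STATIONS.items() if s["name"] == name)

# One thing, many names: map everyday aliases to a single code.
# 'BNH' and the aliases are assumptions made for this example.
COUNCIL_ALIASES = {
    "brighton and hove city council": "BNH",
    "brighton & hove council": "BNH",
    "brighton council": "BNH",
    "brighton": "BNH",
}

def council_code(text):
    """Look up a council code from free text, or None if unknown."""
    return COUNCIL_ALIASES.get(text.strip().lower())
```

Asking for "Gillingham" returns two codes, so a program knows it has to ask which one you mean, while every spelling of my council collapses to one code. An alias table like this has to be maintained by hand, which is the work a canonical register is meant to make unnecessary.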
Registers of Open Data
The two examples above go some way towards explaining why the Department for Communities and Local Government, with the Government Digital Service (GDS), is in the process of creating absolute standards, not just for councils but for every outpost of government’s diverse and extensive set of responsibilities, from the Food Standards Agency to the Foreign & Commonwealth Office, the Land Registry and beyond.
Where possible, these registers are published and shared as Open Data that anyone can use, including anyone building apps, websites and systems. It’s all part of GDS’ push towards ‘government as a platform’, and in keeping with the work being done towards providing Open Data throughout the organisation.
And now we come to those Local Authority codes that you can find on MapIt.
Anyone can contribute to Open Source code
Like most mySociety codebases, MapIt is Open Source.
That means that not only can anyone pick up the code and use it for their own purposes, for free, but that they’re also welcome to submit changes or extensions to the existing code.
And that’s just how GDS’ Sym Roe submitted the addition of the register.
What it all means for you
If you’re a developer, the addition of these codes means that you can use MapIt in your app or web service, and be absolutely sure that it will integrate with any other dataset that’s using the same codes. So, no more guessing whether our ‘Plymouth’ is the same as the ‘Plymouth’ in your database; the three-letter code tells you that it is.
Plus, these register codes identify a local authority as an organisation, or a legal entity, as opposed to setting out the boundary, so that’s an extra layer of information which we are glad to be able to include.
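As a rough illustration of why a shared code helps, the sketch below joins two small datasets on a code rather than on a name. Everything in it is hypothetical: the field names are not MapIt’s real output format, the `PLY` code is assumed for the example, and the figures are made up.

```python
# Hypothetical records: the field names, the 'PLY' code and the figures
# are invented to illustrate joining on a code instead of a name.
mapit_areas = [
    {"name": "Plymouth City Council", "register_code": "PLY"},
]
your_records = [
    {"council": "Plymouth", "code": "PLY", "potholes_fixed": 120},
    {"council": "Plymouth", "code": None, "potholes_fixed": 7},  # which Plymouth?
]

def join_on_code(areas, records):
    """Split records into those matched to an area by code and the rest."""
    by_code = {a["register_code"]: a for a in areas}
    matched, unmatched = [], []
    for record in records:
        area = by_code.get(record["code"])
        if area is None:
            unmatched.append(record)
        else:
            matched.append({**record, "official_name": area["name"]})
    return matched, unmatched

matched, unmatched = join_on_code(mapit_areas, your_records)
```

The record that carries a code is matched without any guesswork; the one that only has a name is left for a human to resolve.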
I’m just a few weeks into my position of Research Associate at mySociety and one of the things I’m really enjoying is the really, really interesting datasets I get to play with.
Take FixMyStreet, the site that allows you to report street issues anywhere in the UK. Councils themselves will only hold data for the issues reported within their own boundaries, but FixMyStreet covers all local authorities, so we’ve ended up with probably the most comprehensive database in the country. We have 20,000 reports about dog poop alone.
Now if you’re me, what to do with all that data? Obviously, you’d want to do something with the dog poop data. But you’d try something a bit more worthy first: that way people won’t ask too many questions about your fascination there. Misdirection.
How does it compare?
So, starting with worthy uses for that massive pile of data, I’ve tried to see how the number of reports in an area compares against other statistics we know about the UK. Grouping reports into ONS-defined areas of around 1,500 people, we can match the number of reports within an area each year against other datasets.
To start with I’m just looking at English data (Scotland, Wales and Northern Ireland have slightly different sets of official statistics that can’t be combined) for the years 2011-2015. I used population density information, how many companies registered in the area, if there’s a railway station, OFCOM stats on broadband and mobile-internet speeds, and components from the indices of multiple deprivation (various measures of how ‘deprived’ an area is, such as poor health, poor education prospects, poor air quality, etc) to try and build a model that predicts how many reports an area will get.
The good news: statistically we can definitely say that some of those things have an effect! Some measures of deprivation make reports go up, others make them go down. Broadband and mobile access make them go up! Population density and health deprivation make them go down.
The bad news: my model only explains 10% of the variation in the reports we actually received, and most of that isn’t explained by the social factors above but by aspects of the platform itself. Just telling the model that the platform has become more successful over time, which councils use FixMyStreet for Councils as their official reporting platform (and so gather more reports) and where our most active users are (who submit a disproportionate share of the total reports) accounts for 7-8 percentage points of the 10% the model explains.
What that means is that most of the reasons why people do and don’t make reports are unexplained by those factors. So for the moment this model is useful for building a theory, but is far from a comprehensive account of why people report problems.
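If ‘explains 10%’ sounds abstract: one common way of putting a number on it is the coefficient of determination, R², and the sketch below assumes that is the kind of measure meant here. It is a plain-Python illustration of the general idea, not the model itself, and the report counts are made up.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` accounted for by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

# Made-up report counts for five areas, purely for illustration.
actual = [2, 4, 6, 8, 10]
perfect = [2, 4, 6, 8, 10]    # a model that gets every area right
mean_only = [6, 6, 6, 6, 6]   # a model that always guesses the average

r2_perfect = r_squared(actual, perfect)
r2_mean_only = r_squared(actual, mean_only)
```

A model that predicts every area exactly scores 1, and one that just guesses the average everywhere scores 0. On this scale, 10% would be a score of about 0.1: better than guessing the average, but with most of the variation still unaccounted for.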
Here’s my rough model for understanding what drives areas to submit a significantly higher number of reports to FixMyStreet:
- An area must have a problem
Measures of deprivation like the ‘wider barriers to housing deprivation’ metric (this includes indicators on overcrowding and homelessness) as well as crime are associated with an increase in the number of reports. The more problems there are, the more likely a report is — so deprivation indicators we’d imagine would go alongside other problems are a good proxy for this.
- A citizen must be willing or able to report the problem
Areas with worse levels of health deprivation and adult skills deprivation are correlated with lower levels of reports. These indicators might suggest citizens less able to engage with official structures, hence fewer reports in these areas.
People also need to be aware of a problem. The number of companies in an area, or the presence of a railway station both increase the number of reports. I use these as a proxy for foot-traffic – where more people might encounter a problem and report it.
Population density is correlated with decreased reports which might suggest a “someone else’s problem” effect – a slightly decreased willingness to report in built-up areas where you think someone else might well make a report.
- A citizen must be able to use the website
As an online platform, FixMyStreet requires people to have access to the website before they can make a report. The less friction there is in this experience, the more likely it is that a report will be made.
This is consistent with the fact that a higher number of home broadband connections, whether slow or fast (and fast more than slow), increases reports. It is also consistent with the fact that increased 3G signal in premises is correlated with increased reports.
Reporting problems on mobile will sometimes be easier than turning on the computer, and we’d expect areas where people more habitually use mobiles for internet access to have a higher number of reports than broadband access alone would suggest. If it’s slightly easier, we’d expect slightly more – which is what this weak correlation suggests.
Not all the variables my model includes are significant or fit neatly into this picture. These are likely working as proxy indicators for related factors that are currently unaccounted for.
I struggle, for instance, to come up with a good theory why measures of education deprivation for young people are associated with an increase in reports. I looked to see if there was a connection between an area having a school and having more reports on the basis of foot-traffic and parents feeling protective over an area – but I didn’t find an effect for schools like I did for registered companies.
So at the moment, these results are a mix of “a-hah, that makes sense” and “hmm, that doesn’t”. But given that we started with a dataset of people reporting dog poop, that’s not a terrible ratio at this point. Expanding the analysis into Scotland and Wales, analysing larger areas, or focusing on specific categories of reports might produce models that explain a bit more about what’s going on when people report what’s going wrong.
I’ll let you know how that goes.
It’s been a while since we looked in on Collideoscope, our project for reporting and collating data on cycling collisions and near misses, developed in collaboration with ITP. But what better time than now, when days are short and accidents have unfortunately, as always at this time of year, taken a sharp upturn?
So, let’s have a catch-up, and a reminder that you should use the service. Of course, we hope you won’t experience any problems, but remember that Collideoscope is there if you do.
Previously on Collideoscope…
As you may recall, Collideoscope is a site for reporting cycling incidents, collisions and near misses. Because it’s built on the FixMyStreet platform, it offers all the same functionality for the user: it’ll help you to pinpoint the precise location of the incident you’re reporting, and then send the details off to the relevant authorities.
While FixMyStreet sends reports off to councils, Collideoscope sends reports to local authorities’ highways departments, with the aim of highlighting potential accident blackspots.
The data, after going through an anonymisation process, is also shared with campaign groups.
Finally, the anonymised data is also available for anyone to download via Socrata, to be used for any purpose. One potential project we’d love to see, for example, would be route-planning applications to help cyclists avoid going through areas with a high density of incidents.
The data is also available to researchers, town planners and the police: when cyclists make a report, they’re contributing to an open dataset that improves the quality of the evidence base on cycling incidents.
So, that’s the model. Let’s have a look at how well it has stood up.
Collideoscope launched in October 2014 and users have thus far made a total of 1,195 reports.
In order to provide a more complete dataset with the clearest possible indicators of accident hotspots, we also imported STATS19 data from the annually-updated open police database of accidents, meaning that Collideoscope now contains data points on over 20,000 incidents across the UK.
Here’s what we’ve learned
Steering a project from concept to reality is always a learning process. Here are some of the key lessons that emerged:
- Collideoscope sends each report to authorities as it is submitted. It became clear that a bulk dataset would be easier for highways authorities to handle and to draw conclusions from, and this is now available.
- Originally, we’d believed that it would be useful if Collideoscope could forward reports to local police forces, so that they could be actioned where suitable. However, this proved impractical, because the Road Traffic Act states that collisions must be reported to a police officer in person. Collideoscope’s data would not be sufficient for police to take action on those cases which merited it.
- There was some concern that reports made via Collideoscope would replicate, rather than complement, the police force’s official STATS19 data. Happily, once enough reports had come into Collideoscope, a comparison was run and found that there is very little overlap between the two datasets.
While STATS19 data tends to cover serious incidents, it doesn’t hold much on the near miss or minor incidents that Collideoscope encourages users to also report — and which make up 90% of the Collideoscope database. One of the underlying beliefs behind Collideoscope has always been that near miss data can tell us a lot about accident prevention.
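For anyone wanting to run a similar overlap check on their own data, here is one way it could be sketched. The matching rule (same date, within a set distance) is an assumption made for illustration, as are the field names and the 50-metre threshold; it is not a description of how the comparison above was actually run.

```python
import math

def distance_m(a, b):
    """Approximate distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000 * math.hypot(x, y)

def overlapping(reports, official, max_metres=50):
    """Reports that share a date with, and lie close to, an official record."""
    return [
        r for r in reports
        if any(
            r["date"] == o["date"]
            and distance_m(r["pos"], o["pos"]) <= max_metres
            for o in official
        )
    ]

# Invented records, for illustration only.
reports = [
    {"date": "2015-03-01", "pos": (51.5000, -0.1000)},  # near an official record
    {"date": "2015-03-02", "pos": (51.5000, -0.1000)},  # same place, other day
    {"date": "2015-03-01", "pos": (51.5100, -0.1000)},  # same day, far away
]
official = [{"date": "2015-03-01", "pos": (51.5001, -0.1000)}]
duplicates = overlapping(reports, official)
```

Dividing the number of matches by the total number of reports gives a rough overlap rate; a low figure suggests the two datasets are capturing different incidents.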
ITP have now stepped away from Collideoscope: we’re extremely grateful for their collaboration and support with the development and running of Collideoscope in its first couple of years. This move will mean that we can pursue funding from charitable grant foundations.
As you may recall from prior updates, the site was also supported by the Barts Bespoke campaign, a multi-pronged initiative to reduce accidents for cyclists. This support, and a further research grant from the Department for Transport, came to an end last month. As a result, we’ll no longer be asking people about injuries sustained when they file a Collideoscope report.
Collideoscope will keep on rolling: we’re open to potential partners and have plenty of ideas for further development, including the possibility of a public API, or incident-reporting forms that could be placed on any website.
If you’re from a local government, third sector or private company, and you’re interested in using Collideoscope data to enable better decision making on cycle safety, this’d be a great time to get in touch.
Census data: there’s lots of it. It contains fascinating insights.
But as with many huge datasets, those insights are not always easy to find at first glance — nor is it easy for the untrained observer to see which parts are relevant to their own lives.
Wazimap in South Africa takes the country’s census data and turns it into something the user can explore interactively. Originally conceived as a tool for journalists, it turned out to be so accessible that it’s used by a much wider range of the population, from school children to researchers. It’s a great example of how you can transform dry data into something meaningful online, and it’s all done using free and open source tools.
Our points-to-boundaries mapping software MapIt is part of that mix, putting the data in context and ensuring that visitors can browse the data relevant to specific provinces, municipalities or wards.
We asked Greg Kempe of Code for South Africa to fill us in on a bit more.
What exactly is Wazimap?
Wazimap helps South Africans understand where they live, through the eyes of the data from our 2011 Census. It’s a research and exploration tool that describes who lives in South Africa, from a country level right down to a ward, including demographics such as age and gender, language and citizenship, level of education, access to basic services, household goods, employment and income.
It has helped people understand not just where they work and live, but also that data can be presented in a way that’s accessible and understandable.
Users can explore the profile of a province, city or ward and compare them side-by-side. They can focus on a particular dataset to view just that data for any place in the country, look for outliers and interesting patterns in the distribution of an indicator, or draw an indicator on a map.
Of course Wazimap can’t do everything, so you can also download data into Excel or Google Earth to run your own analysis.
Wazimap is built on the open source software that powers censusreporter.org, which was built under a Knight News Challenge grant, and is a collaboration between Media Monitoring Africa and Code for South Africa.
Due to demand from other groups, we’ve now made Wazimap a standalone project that anyone can re-use to build their own instance: details are here.
How did it all begin?
Media Monitoring Africa approached Code for South Africa to build a tool to help journalists get factual background data on anywhere in South Africa, to help encourage accurate and informed reporting.
Code for South Africa is a nonprofit that promotes informed decision-making for positive social change, so we were very excited about collaborating on the tool.
How exactly does MapIt fit into the project?
MapIt powers all the shape boundaries in Wazimap. When we plot a province, municipality or ward boundary on a map in Wazimap, or provide a boundary in a Google Earth or GeoJSON download, MapIt is giving Wazimap that data.
We had originally built a home-grown solution, but when we met mySociety’s Tony Bowden at a Code Camp in Italy, we learned about MapIt. It turned out to offer better functionality.
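As a rough sketch of what this looks like from the client side, the helpers below build URLs for two kinds of MapIt request: finding the areas that cover a point, and fetching one area’s boundary as GeoJSON. The base URL and the area ID are placeholders and the function names are hypothetical; check the documentation of the MapIt instance you are using for the exact paths and formats it supports.

```python
BASE_URL = "https://mapit.example.org"  # placeholder, not a real instance

def point_url(lon, lat, srid=4326):
    """URL listing the areas that contain a point (x is longitude)."""
    return f"{BASE_URL}/point/{srid}/{lon},{lat}"

def boundary_url(area_id, fmt="geojson"):
    """URL for one area's boundary in the requested format."""
    if fmt not in ("geojson", "kml", "wkt"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE_URL}/area/{area_id}.{fmt}"

# Example: a point in Cape Town, and a made-up area ID.
areas_here = point_url(18.42, -33.92)
ward_shape = boundary_url(1234)
```

A site like Wazimap would fetch the second kind of URL and pass the returned GeoJSON to its mapping library.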
What level of upkeep is involved?
Wazimap requires only intermittent maintenance. We had municipal elections in August 2016, which has meant that a number of municipal boundaries have changed. We’re waiting on Statistics South Africa to provide us with the census data mapped to these new boundaries so that we can update it. Other than that, once the site is up and running it needs very little maintenance.
What’s the impact of Wazimap?
We know that Wazimap is used by a wide range of people, including journalists, high school geography teachers, political party researchers and academics.
Code for South Africa has been approached a number of times, by people asking if they might reuse the Wazimap platform in different contexts with different data. Most recently, youthexplorer.org.za used it to power an interactive web tool providing a range of information on young people, helping policy makers understand youth-critical issues in the Western Cape.
We also know that it’s been used as a research tool for books and numerous news articles.
The success of the South African Wazimap has driven the development of similar projects elsewhere in Africa which will be launching soon, though MapIt won’t be used for those because their geography requirements are simpler.
What does the future hold?
As we’re building out Wazimap for different datasets, we’re seeing a need for taking it beyond just census data. We’re making improvements to how Wazimap works with data to make this possible and make it simpler for others to build on it.
Each new site gives us ideas for improvements to the larger Wazimap product. The great thing is that these improvements roll out and benefit anyone who uses it across every install.
Thanks very much to Greg for talking us through the Wazimap project and its use of MapIt. It’s great to hear how MapIt is contributing to a tool that, in itself, aids so many other users and organisations.
Need to map boundaries? Find out more about MapIt here
Even official records aren’t as safe as you might think they are. The archive of a country’s political history might be wiped out in a single conflagration.
Take the example of Burkina Faso, a beautiful West African country that is, sadly, perhaps best known to the rest of the world for its troubled political past.
The uprising in Burkina Faso in 2014 led to a fire in the National Assembly building and archives office. Nearly 90% of the documents were lost. Now the National Assembly is working to reconstruct the list of its parliament’s members before 1992.
This means that the data EveryPolitician has on Burkina Faso has nothing from terms before 1992. We’ve got some data for six of the seven most recent terms from the National Assembly so far, of which five are live on the site. Even though that data is not very rich (there’s little more than names in many cases; and the 6th term was transitional so data on that one’s membership might remain elusive) it’s a beginning.
We know from experience that data-gathering often proceeds piecemeal, and names are always a good place to start.
As Tinto finds new data, whether that’s more information about the politicians already collected or membership lists of the missing terms before 1992, we’ll be adding that to EveryPolitician too.
A vast collection
When people ask what EveryPolitician is, we often say, ‘The clue’s in the name’. EveryPolitician aims to provide data about, well … every politician. In the world.
(We’ve limited our scope — for the time being — to politicians in national-level legislatures).
The project is growing. Since our launch last year, we’ve gathered data for legislatures in 233 countries. The data we’ve collected currently comprises well over three million items. The number of politicians in our datafiles is now in excess of 70,000.
Seventy thousand is an awful lot of politicians.
In fact, if you think that might be more politicians than the world needs right now, you’re right: as the Burkina Faso example shows, EveryPolitician collects historic data too.
So as well as the people serving in today’s parliaments, our data includes increasing numbers of those from the past. (Obviously, if you have such data for your country’s legislature, we’d love to hear from you!)
More than just today’s data
The Burkina Faso fire is an illustration of the value of collecting and preserving this historic data.
Of course, we’re fully aware of the usefulness of current data, because we believe that by providing it we can seed many other projects — including, but in no way limited to, parliamentary monitoring sites around the world (sites like our own TheyWorkForYou in the UK, or Mzalendo in Kenya, for example).
Nonetheless, we never intended to limit ourselves to the present. By sharing and collating historic records too, we hope to enable researchers, journalists, historians and who-knows-who-else to investigate, model, or reveal connections and trends over time that we haven’t even begun to imagine. We know this data has value; we look forward to discovering just how much value.
But it turns out we’re providing a simpler potential benefit too. EveryPolitician’s core datafiles are an excellent distributed archive.
What Burkina Faso’s misfortune goes to show is that, as historians know only too well, data sources can be surprisingly fragile.
In this case the specific situation involves paper records being destroyed by fire. That is a simple analogue warning to the digital world. Websites and their underlying databases are considerably more volatile than the most flammable of paper archives.
Database-backed sites are often poor catalogues of their pasts. Links, servers and domain registrations all expire. Access to data may be revoked; firewalls can appear.
Digital data doesn’t fade; instead it is so transient that it can simply disappear.
Of course, we cannot ourselves guarantee that our servers will be here forever (we’re not planning on going anywhere, but projects like this have to be realistic about the longer view).
There is an intriguing consequence of us using GitHub as our datastore. The fact is, the EveryPolitician data you can download isn’t coming off our servers at all. Instead, we benefit from GitHub’s industrial-scale infrastructure, as well as the distributed nature of the version control system, git, on which it is based. Because git is distributed, every time someone clones the repository (which is easy to do), they’re securing for themselves a complete copy of all the data.
But the point is not necessarily about data persisting far into the next millennium — that’s a bit presumptuous even for us, frankly — so much as its robustness over the shorter cycles of world events. So, should any nation’s data become inaccessible (who knows? for the length of an interregnum or civil war, a natural disaster, or maybe just a work crew accidentally cutting through the wrong cable outside parliament), we want to know the core data will remain publicly available until it’s back.
Naturally there are other aspects to the EveryPolitician project which are more — as modern language would have it — compelling than collecting old data about old politicians. But the usefulness of the EveryPolitician project as a persistent archive of historical data is one that we have not overlooked.
mySociety’s EveryPolitician project aims to make data available on every politician in the world. It’s going well: we’re already sharing data on the politicians from nearly every country on the planet. That’s 68,652 people and 2.9 million individual pieces of data, numbers which will be out of date almost as soon as you’ve read them. Naturally, the breadth and depth of that data vary from country to country, depending on the sources available, but that’s a topic for another blog post.
Today the EveryPolitician team would like to introduce you to its busiest member, who is blogging at EveryPolitician bot. A bot is an automated agent — a robot, no less, albeit one crafted entirely in software.
First, some background on why we need our little bot.
Because there’s so much to do
One of the obvious challenges of such a big mission is keeping on top of it all. We’re constantly adding and updating the data; it’s in no way a static dataset. Here are examples — by no means exhaustive — of circumstances that can lead to data changes:
- Legislatures change en masse, because of elections, etc.
We try to know when countries’ governments are due to change because that’s the kind of thing we’re interested in anyway (remember mySociety helps run websites for parliamentary monitoring organisations, such as Mzalendo in Kenya). But even anticipated changes are rarely straightforward, not least because there’s always a lag between a legislature changing and the data about its new members becoming available, especially from official national sources.
- Legislatures change en masse, unexpectedly
Not all sweeping changes are planned. There are coups and revolutions and other unscheduled or premature ends-of-term.
- Politicians retire
Or die, or change their names or titles, or switch party or faction.
- New parties emerge
Or the existing ones change their names, or form coalitions.
- Areas change
There are good reasons (better representation) and bad reasons (gerrymandering) why the areas in constituency-based systems often change. By way of a timely example, our UK readers probably know that the wards have changed for the forthcoming elections, and that mySociety built a handy tool that tells you what ward you’re in.
- Existing data gets refined
Played Gender Balance recently? Behind that is a dataset that keeps being updated (whenever there are new politicians) but which is itself a source of constantly-updating data for us.
- Someone in Russia updates the Wikipedia page about a politician in Japan
Wikidata is the database underlying projects like Wikipedia, so by monitoring all the politicians we have that are also in there, we get a constant stream of updates. For example, within a few hours of someone adding it, we knew that the Russian transliteration of 安倍晋三’s name was Синдзо Абэ — that’s Shinzo Abe, in case you can’t read kanji or Cyrillic script. (If you’re wondering, whenever our sources conflict, we moderate in favour of local context.)
- New data sources become available
Our data comes from an ever-increasing number of sources, commonly more than one for any given legislature (the politicians’ Twitter handles are often found in a different online place from their dates of birth, for example). We always welcome more contributions: if you think you’ve got new sources for the country you live in, please let us know.
- New old data becomes available
We collect historic data too, not just the politicians in the current term. For some countries we’ve already got data going back decades. Sources for data like this can sometimes be hard to find; slowly but surely new ones keep turning up.
So, with all this sort of thing going on, it’s too much to expect a small team of humans to manage it all. Which is where our bot comes in.
We’ve automated many of our processes: scraping, collecting, checking changes and submitting them for inclusion. That leaves the humans free to concentrate on what they do best (which is understanding things, and making informed decisions). In technical terms, our bot handles most things in an event-driven way. It springs into action when triggered by a notification. Often that will be a webhook (for example, a scraper finishes getting data so it issues a webhook to let the bot know), although the bot also follows a schedule of regular tasks. Computers are great for running repetitive tasks and making quantitative comparisons, and a lot of the work that needs to be done with our ever-changing data fits such a description.
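For the technically-minded, here is a minimal sketch of what "event-driven" can look like in practice. It is not the bot's actual code: the event names (`scraper_finished`, `daily_schedule`), the handler functions and the payload shape are all invented for illustration. The idea is simply that handlers are registered against named events, and a notification (from a webhook or a scheduler) triggers whichever handlers are listening.

```python
from typing import Callable, Dict, List

# Hypothetical registry: event name -> handlers that react to it.
Handler = Callable[[dict], str]
_handlers: Dict[str, List[Handler]] = {}


def on(event: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for a named event."""
    def register(func: Handler) -> Handler:
        _handlers.setdefault(event, []).append(func)
        return func
    return register


def dispatch(event: str, payload: dict) -> List[str]:
    """Run every handler registered for the event.

    Events nobody has registered for are ignored (empty result).
    """
    return [handler(payload) for handler in _handlers.get(event, [])]


@on("scraper_finished")
def rebuild_country(payload: dict) -> str:
    # Assumed payload shape: {"country": "<name>"}.
    # A real bot would fetch the fresh data and propose a change here.
    return f"rebuild queued for {payload['country']}"


@on("daily_schedule")
def refresh_all(payload: dict) -> str:
    # Triggered by a timer rather than a webhook.
    return "scheduled refresh queued"


if __name__ == "__main__":
    print(dispatch("scraper_finished", {"country": "Kenya"}))
    print(dispatch("daily_schedule", {}))
```

Webhooks and scheduled jobs both end up calling the same `dispatch` function, so adding a new task means registering one more handler rather than rewriting the flow.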
The interconnectedness of all the different tasks the bot performs is complex. We originally thought we’d document that in one go — there’s a beautiful diagram waiting to be drawn, that’s for sure — but it soon became clear this was going to be a big job. Too big. Not only is the bot’s total activity complicated because there are a lot of interdependencies, but it’s always changing: the developers are frequently adding to the variety of tasks the bot is doing for us.
So in the end we realised we should just let the bot speak for itself, and describe task-by-task some of the things it does. Broken down like this it’s easier to follow.
We know not everybody will be interested, which is fine: the EveryPolitician data is useful for all sorts of people — journalists, researchers, parliamentary monitors, activists, parliamentarians themselves, and many more — and if you’re such a person you don’t need to know about how we’re making it happen. But if you’re technically-minded — and especially if you’re a developer who uses GitHub but hasn’t yet used the GitHub API as thoroughly as we’ve needed to, or are looking for ways to manage always-shifting data sets like ours — then we hope you’ll find what the bot says both informative and useful.
The bot is already a few days into blogging — its first post was “I am a busy bot”, but you can see all the others on its own Medium page. You can also follow it on Twitter as @everypolitician. Of course, its true home, where all the real work is done, is the everypoliticianbot account on GitHub.
Images: CC-BY-SA from the EveryPolitician bot’s very own scrapbook.
For verified, reliable information, it’s usually best to go to the official source — but here’s an exception.
Checking parliament.go.ke’s list of MPs against Mzalendo’s, our developers discovered a large number of constituency mismatches. These, explained Jessica Musila from Mzalendo, came about because the official site has not reflected boundary changes made in 2013.
Even more significantly, the official parliament site currently only holds details of 173 of the National Assembly’s 349 MPs.
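For developers curious about how a check like this might work, here is a rough sketch. It is not the code our developers ran: the function name, the data layout (a simple name-to-constituency mapping) and the sample entries are all hypothetical placeholders, not real records. It compares two lists and reports members whose constituencies disagree, plus members absent from the official list.

```python
from typing import Dict, List, Tuple


def compare_members(
    official: Dict[str, str], monitor: Dict[str, str]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Compare two {member name: constituency} mappings.

    Returns:
      mismatches: members in both lists whose constituencies differ,
                  as {name: (official value, monitor value)}
      missing:    members in the monitor's list but not the official one
    """
    shared = official.keys() & monitor.keys()
    mismatches = {
        name: (official[name], monitor[name])
        for name in shared
        if official[name] != monitor[name]
    }
    missing = sorted(monitor.keys() - official.keys())
    return mismatches, missing


if __name__ == "__main__":
    # Placeholder data, invented for illustration only.
    official_list = {"MP A": "Area 1", "MP B": "Area 2"}
    monitor_list = {"MP A": "Area 1", "MP B": "Area 3", "MP C": "Area 4"}
    mismatches, missing = compare_members(official_list, monitor_list)
    print(mismatches)
    print(missing)
```

Matching on names alone is a simplification; real lists usually need some normalisation of spellings and titles before a comparison like this is trustworthy.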
“The gaps in www.parliament.go.ke validate Mzalendo’s very existence,” said Jessica. We agree: it’s a great example of the sometimes unexpected needs filled by parliamentary monitoring websites.
And of course, through EveryPolitician, we’re working to make sure that every parliamentary monitoring website can access a good, reliable source of data.