Census data: there’s lots of it. It contains fascinating insights.
But as with many huge datasets, those insights are not always easy to find at first glance — nor is it easy for the untrained observer to see which parts are relevant to their own lives.
Wazimap in South Africa takes the country’s census data and turns it into something the user can explore interactively. Originally conceived as a tool for journalists, it turned out to be so accessible that it’s used by a much wider range of the population, from school children to researchers. It’s a great example of how you can transform dry data into something meaningful online, and it’s all done using free and open source tools.
Our points-to-boundaries mapping software MapIt is part of that mix, putting the data in context and ensuring that visitors can browse the data relevant to specific provinces, municipalities or wards.
We asked Greg Kempe of Code for South Africa to fill us in on a bit more.
What exactly is Wazimap?
Wazimap helps South Africans understand where they live, through the eyes of the data from our 2011 Census. It’s a research and exploration tool that describes who lives in South Africa, from a country level right down to a ward, including demographics such as age and gender, language and citizenship, level of education, access to basic services, household goods, employment and income.
It has helped people understand not just where they work and live, but also that data can be presented in a way that’s accessible and understandable.
Users can explore the profile of a province, city or ward and compare them side-by-side. They can focus on a particular dataset to view just that data for any place in the country, look for outliers and interesting patterns in the distribution of an indicator, or draw an indicator on a map.
Of course Wazimap can’t do everything, so you can also download data into Excel or Google Earth to run your own analysis.
Wazimap is built on the open source software that powers censusreporter.org, which was built under a Knight News Challenge grant, and is a collaboration between Media Monitoring Africa and Code for South Africa.
Due to demand from other groups, we’ve now made Wazimap a standalone project that anyone can re-use to build their own instance: details are here.
How did it all begin?
Media Monitoring Africa approached Code for South Africa to build a tool to help journalists get factual background data on anywhere in South Africa, to help encourage accurate and informed reporting.
Code for South Africa is a nonprofit that promotes informed decision-making for positive social change, so we were very excited about collaborating on the tool.
How exactly does MapIt fit into the project?
MapIt powers all the shape boundaries in Wazimap. When we plot a province, municipality or ward boundary on a map in Wazimap, or provide a boundary in a Google Earth or GeoJSON download, MapIt is giving Wazimap that data.
We had originally built a home-grown solution, but when we met mySociety’s Tony Bowden at a Code Camp in Italy, we learned about MapIt. It turned out to offer better functionality.
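To give a flavour of the lookups described above, here is a minimal sketch of how a client might ask a MapIt instance for a boundary. The base URL is a placeholder, the function names are invented for illustration, and the `/area/<id>.geojson` and `/point/4326/<lon>,<lat>` paths reflect MapIt’s URL scheme as we understand it, so check them against the instance you actually use.

```python
import json
from urllib.request import urlopen

# Placeholder base URL: substitute the MapIt instance you actually use.
MAPIT_BASE = "https://mapit.example.org"


def area_geojson_url(area_id, base=MAPIT_BASE):
    """Build the URL for one area's boundary in GeoJSON form."""
    return f"{base}/area/{int(area_id)}.geojson"


def point_lookup_url(lon, lat, base=MAPIT_BASE):
    """Build the URL listing the areas that cover a WGS84 point."""
    return f"{base}/point/4326/{lon},{lat}"


def fetch_boundary(area_id, base=MAPIT_BASE):
    """Download and parse one boundary (needs network access)."""
    with urlopen(area_geojson_url(area_id, base)) as response:
        return json.load(response)
```

A site like Wazimap could pass the parsed GeoJSON straight to a web map, or offer it to users as a download.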
What level of upkeep is involved?
Wazimap requires only intermittent maintenance. We had municipal elections in August 2016, which means a number of municipal boundaries have changed. We’re waiting on Statistics South Africa to provide us with the census data mapped to these new boundaries so that we can update it. Other than that, once the site is up and running it needs very little maintenance.
What’s the impact of Wazimap?
We know that Wazimap is used by a wide range of people, including journalists, high school geography teachers, political party researchers and academics.
Code for South Africa has been approached a number of times, by people asking if they might reuse the Wazimap platform in different contexts with different data. Most recently, youthexplorer.org.za used it to power an interactive web tool providing a range of information on young people, helping policy makers understand youth-critical issues in the Western Cape.
We also know that it’s been used as a research tool for books and numerous news articles.
The success of the South African Wazimap has driven the development of similar projects elsewhere in Africa which will be launching soon, though MapIt won’t be used for those because their geography requirements are simpler.
What does the future hold?
As we’re building out Wazimap for different datasets, we’re seeing a need for taking it beyond just census data. We’re making improvements to how Wazimap works with data to make this possible and make it simpler for others to build on it.
Each new site gives us ideas for improvements to the larger Wazimap product. The great thing is that these improvements roll out and benefit anyone who uses it across every install.
Thanks very much to Greg for talking us through the Wazimap project and its use of MapIt. It’s great to hear how MapIt is contributing to a tool that, in itself, aids so many other users and organisations.
Need to map boundaries? Find out more about MapIt here
Even official records aren’t as safe as you might think they are. The archive of a country’s political history might be wiped out in a single conflagration.
Take the example of Burkina Faso, a beautiful West African country that is, sadly, perhaps best known to the rest of the world for its troubled political past.
The uprising in Burkina Faso in 2014 led to a fire in the National Assembly building and archives office. Nearly 90% of the documents were lost. Now the National Assembly is working to reconstruct the list of its parliament’s members before 1992.
This means that the data EveryPolitician has on Burkina Faso has nothing from terms before 1992. We’ve got some data for six of the seven most recent terms from the National Assembly so far, of which five are live on the site. Even though that data is not very rich (there’s little more than names in many cases; and the 6th term was transitional so data on that one’s membership might remain elusive) it’s a beginning.
We know from experience that data-gathering often proceeds piecemeal, and names are always a good place to start.
As Tinto finds new data, whether that’s more information about the politicians already collected or membership lists of the missing terms before 1992, we’ll be adding that to EveryPolitician too.
A vast collection
When people ask what EveryPolitician is, we often say, ‘The clue’s in the name’. EveryPolitician aims to provide data about, well … every politician. In the world.
(We’ve limited our scope — for the time being — to politicians in national-level legislatures).
The project is growing. Since our launch last year, we’ve gathered data for legislatures in 233 countries. The amount of data we’ve collected currently comprises well over three million items. The number of politicians in our datafiles is now in excess of 70,000.
Seventy thousand is an awful lot of politicians.
In fact, if you think that might be more politicians than the world needs right now, you’re right: as the Burkina Faso example shows, EveryPolitician collects historic data too.
So as well as the people serving in today’s parliaments, our data includes increasing numbers of those from the past. (Obviously, if you have such data for your country’s legislature, we’d love to hear from you!)
More than just today’s data
The Burkina Faso fire is an illustration of the value of collecting and preserving this historic data.
Of course, we’re fully aware of the usefulness of current data, because we believe that by providing it we can seed many other projects — including, but in no way limited to, parliamentary monitoring sites around the world (sites like our own TheyWorkForYou in the UK, or Mzalendo in Kenya, for example).
Nonetheless, we never intended to limit ourselves to the present. By sharing and collating historic records too, we hope to enable researchers, journalists, historians and who-knows-who-else to investigate, model, or reveal connections and trends over time that we haven’t even begun to imagine. We know this data has value; we look forward to discovering just how much value.
But it turns out we’re providing a simpler potential benefit too. EveryPolitician’s core datafiles are an excellent distributed archive.
What Burkina Faso’s misfortune goes to show is that, as historians know only too well, data sources can be surprisingly fragile.
In this case the specific situation involves paper records being destroyed by fire. That is a simple analogue warning to the digital world. Websites and their underlying databases are considerably more volatile than the most flammable of paper archives.
Database-backed sites are often poor catalogues of their pasts. Links, servers and domain registrations all expire. Access to data may be revoked, firewalls can appear.
Digital data doesn’t fade; instead it is so transient that it can simply disappear.
Of course, we cannot ourselves guarantee that our servers will be here forever (we’re not planning on going anywhere, but projects like this have to be realistic about the longer view).
There is an intriguing consequence of us using GitHub as our datastore. The fact is, the EveryPolitician data you can download isn’t coming off our servers at all. Instead, we benefit from GitHub’s industrial-scale infrastructure, as well as the distributed nature of the version control system, git, on which it is based. By its nature, every time someone clones the repository (which is easy to do), they’re securing for themselves a complete copy of all the data.
But the point is not necessarily about data persisting far into the next millennium — that’s a bit presumptuous even for us, frankly — so much as its robustness over the shorter cycles of world events. So, should any nation’s data become inaccessible (who knows? for the length of an interregnum or civil war, a natural disaster, or maybe just a work crew accidentally cutting through the wrong cable outside parliament), we want to know the core data will remain publicly available until it’s back.
Naturally there are other aspects to the EveryPolitician project which are more — as modern language would have it — compelling than collecting old data about old politicians. But the usefulness of the EveryPolitician project as a persistent archive of historical data is one that we have not overlooked.
mySociety’s EveryPolitician project aims to make data available on every politician in the world. It’s going well: we’re already sharing data on the politicians from nearly every country on the planet. That’s 68,652 people and 2.9 million individual pieces of data, numbers which will be out of date almost as soon as you’ve read them. Naturally, the width and depth of that data varies from country to country, depending on the sources available, but that’s a topic for another blog post.
Today the EveryPolitician team would like to introduce you to its busiest member, who is blogging at EveryPolitician bot. A bot is an automated agent — a robot, no less, albeit one crafted entirely in software.
First, some background on why we need our little bot.
Because there’s so much to do
One of the obvious challenges of such a big mission is keeping on top of it all. We’re constantly adding and updating the data; it’s in no way a static dataset. Here are examples — by no means exhaustive — of circumstances that can lead to data changes:
- Legislatures change en masse, because of elections, etc.
We try to know when countries’ governments are due to change because that’s the kind of thing we’re interested in anyway (remember mySociety helps run websites for parliamentary monitoring organisations, such as Mzalendo in Kenya). But even anticipated changes are rarely straightforward, not least because there’s always a lag between a legislature changing and the data about its new members becoming available, especially from official national sources.
- Legislatures change en masse, unexpectedly
Not all sweeping changes are planned. There are coups and revolutions and other unscheduled or premature ends-of-term.
- Politicians retire
Or die, or change their names or titles, or switch party or faction.
- New parties emerge
Or the existing ones change their names, or form coalitions.
- Areas change
There are good reasons (better representation) and bad reasons (gerrymandering) why the areas in constituency-based systems often change. By way of a timely example, our UK readers probably know that the wards have changed for the forthcoming elections, and that mySociety built a handy tool that tells you what ward you’re in.
- Existing data gets refined
Played Gender Balance recently? Behind that is a dataset that keeps being updated (whenever there are new politicians) but which is itself a source of constantly-updating data for us.
- Someone in Russia updates the Wikipedia page about a politician in Japan
Wikidata is the database underlying projects like Wikipedia, so by monitoring all the politicians we have that are also in there, we get a constant stream of updates. For example, within a few hours of someone adding it, we knew that the Russian transliteration of 安倍晋三’s name was Синдзо Абэ — that’s Shinzo Abe, in case you can’t read kanji or Cyrillic script. (If you’re wondering, whenever our sources conflict, we moderate in favour of local context.)
- New data sources become available
Our data comes from an ever-increasing number of sources, commonly more than one for any given legislature (the politicians’ Twitter handles are often found in a different online place from their dates of birth, for example). We always welcome more contributions: if you think you’ve got new sources for the country you live in, please let us know.
- New old data becomes available
We collect historic data too, not just the politicians in the current term. For some countries we’ve already got data going back decades. Sources for data like this can sometimes be hard to find; slowly but surely new ones keep turning up.
So, with all this sort of thing going on, it’s too much to expect a small team of humans to manage it all. Which is where our bot comes in.
We’ve automated many of our processes (scraping, collecting, checking changes, submitting them for inclusion) so the humans can concentrate on what they do best, which is understanding things and making informed decisions. In technical terms, our bot handles most things in an event-driven way. It springs into action when triggered by a notification. Often that will be a webhook (for example, a scraper finishes getting data so it issues a webhook to let the bot know), although the bot also follows a schedule of regular tasks. Computers are great for running repetitive tasks and making quantitative comparisons, and a lot of the work that needs to be done with our ever-changing data fits such a description.
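For the technically-minded, the event-driven pattern described above can be sketched in a few lines. This is an illustration only, not the bot’s actual code: the event names, payload shape and handler functions are all hypothetical.

```python
from collections import defaultdict

# Registry mapping an event name to the functions that handle it.
_handlers = defaultdict(list)


def on(event_name):
    """Decorator that registers a handler for one kind of notification."""
    def register(func):
        _handlers[event_name].append(func)
        return func
    return register


def dispatch(event_name, payload):
    """Run every handler registered for the event and collect the results."""
    return [handler(payload) for handler in _handlers.get(event_name, [])]


# Hypothetical event: a scraper announces (e.g. via webhook) that it is done.
@on("scraper_finished")
def check_for_changes(payload):
    return f"compare new data for {payload['legislature']}"


# Scheduled work can reuse the same path by dispatching its own event.
@on("daily_tick")
def refresh_sources(payload):
    return "refresh slow-moving sources"
```

Whatever receives the webhook, or the scheduler, then only needs to call something like `dispatch("scraper_finished", {"legislature": "Kenya/Assembly"})`; adding a new task means registering one more handler rather than touching the dispatch logic.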
The interconnectedness of all the different tasks the bot performs is complex. We originally thought we’d document that in one go — there’s a beautiful diagram waiting to be drawn, that’s for sure — but it soon became clear this was going to be a big job. Too big. Not only is the bot’s total activity complicated because there are a lot of interdependencies, but it’s always changing: the developers are frequently adding to the variety of tasks the bot is doing for us.
So in the end we realised we should just let the bot speak for itself, and describe task-by-task some of the things it does. Broken down like this it’s easier to follow.
We know not everybody will be interested, which is fine: the EveryPolitician data is useful for all sorts of people — journalists, researchers, parliamentary monitors, activists, parliamentarians themselves, and many more — and if you’re such a person you don’t need to know about how we’re making it happen. But if you’re technically-minded — and especially if you’re a developer who uses GitHub but hasn’t yet used the GitHub API as thoroughly as we’ve needed to, or are looking for ways to manage always-shifting data sets like ours — then we hope you’ll find what the bot says both informative and useful.
The bot is already a few days into blogging: its first post was “I am a busy bot”, and you can see all the others on its own Medium page. You can also follow it on Twitter as @everypolitbot. Of course, its true home, where all the real work is done, is the everypoliticianbot account on GitHub.
Images: CC-BY-SA from the EveryPolitician bot’s very own scrapbook.
For verified, reliable information, it’s usually best to go to the official source — but here’s an exception.
Checking parliament.go.ke’s list of MPs against Mzalendo’s, our developers discovered a large number of constituency mismatches. These, explained Jessica Musila from Mzalendo, came about because the official site has not reflected boundary changes made in 2013.
Even more significantly, the official parliament site currently only holds details of 173 of the National Assembly’s 349 MPs.
“The gaps in www.parliament.go.ke validate Mzalendo’s very existence,” said Jessica. We agree: it’s a great example of the sometimes unexpected needs filled by parliamentary monitoring websites.
And of course, through EveryPolitician, we’re working to make sure that every parliamentary monitoring website can access a good, reliable source of data.
A few weeks ago, we highlighted one major difference between the Ghanaian parliament and our own: in Ghana, they register MPs’ attendance.
This week, we received news of another of our partners who are holding their representatives to account on the matter of attendance: People’s Assembly, whose website runs on our Pombola platform. The new page was contributed by Code4SA, who have been doing some really valuable work on the site lately.
According to South Africa’s Daily Maverick, in some cases MPs’ attendance is abysmally low. There’s also a history of those who “arrive, sign the register and leave a short while later”, a practice that may soon be on the decline thanks to People’s Assembly’s inclusion of data on late arrivals and early departures.
With 57 representatives — or about 15% — floundering at a zero rate of attendance, it seems that this simple but powerful display is a much-needed resource for the citizens of South Africa. See it in action here.
It’s around this time of year that we normally publish our responsiveness statistics on WriteToThem. However, if you’ve been looking forward to seeing your MP’s ranking, we’re afraid you’ll have to wait a little longer.
Two weeks after you use WriteToThem to contact a representative, we send you an automated email to check whether or not you received a response. The data gathered by these questionnaires gives us a snapshot of how well the site is working for its users; it also allows us to highlight which MPs, which parties, and which parliamentary bodies do the best and worst at responding to constituents’ messages.
We’ve habitually analysed a calendar year of responses, January to December. Last year, though, was an election year, meaning that several MPs were active only up until May, and then several new MPs took their seats in the new Parliament. So this time we’re looking at May 2015 to May 2016, then allowing a four-week period to ensure we’ve received all the questionnaires, which means running the data in June.
Now, in theory, it shouldn’t matter too much, because we rank MPs by the percentage of mail sent through WriteToThem that they respond to (or more accurately, that our users tell us they have responded to). An MP may have responded to 100% of all their mail and then been voted out; their successor may then respond to 10% of their mail: both MPs would be ranked accordingly.
In fact, that’s how we did it for 2005, the first year for which we published WriteToThem rankings, and also an election year*.
But shifting the date like this means that the data will be less confusing. It’ll let us see how every current MP has performed, in terms of responsiveness, across a full year.
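The ranking itself is simple arithmetic: the share of surveyed messages that users tell us were answered, sorted from highest to lowest. A minimal sketch, with invented names and figures and hypothetical function names (not the code WriteToThem actually runs):

```python
def response_rate(responded, surveyed):
    """Percentage of surveyed messages that users say were answered."""
    if surveyed == 0:
        return None  # no questionnaires returned, so no meaningful rate
    return 100.0 * responded / surveyed


def rank_by_responsiveness(tallies):
    """Sort (name, responded, surveyed) tuples, most responsive first."""
    rated = [
        (name, response_rate(responded, surveyed))
        for name, responded, surveyed in tallies
        if surveyed > 0
    ]
    return sorted(rated, key=lambda item: item[1], reverse=True)


# Invented figures: an outgoing MP and a successor, each ranked on their own mail.
tallies = [("Successor MP", 2, 20), ("Former MP", 30, 30)]
ranking = rank_by_responsiveness(tallies)
```

Because each MP is scored only on their own mail, a change of MP partway through the period doesn’t distort the percentages; it just means some MPs are ranked on fewer messages.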
Of course, one side effect of this is that if you’re an MP and you want to be top of the pops, you have an extra five months in which to boost your score… so, on your marks, time to get writing!
*2010 fell within a four-year period during which we didn’t publish rankings.
How is the data explosion transforming our world?
That’s the question that inspires the Big Bang Data exhibition, running from today until February 28 at Somerset House in London.
Alongside all kinds of data displays, data-inspired artwork and data-based innovations, the exhibition features our very own FixMyStreet and TheyWorkForYou as examples of websites that are using data for the common good.
The exhibits range from fun to thought-provoking to visually rather beautiful: we enjoyed Nicholas Felton’s annual reports about himself, the Dear Data project, and innovative devices such as the fitness tracker for dogs. Most of all, of course, we enjoyed seeing our very own websites put into context and available for everyone to have a go with. 🙂
We’re delighted to have been included in this event, and we recommend a visit if you’re in the area. There’s plenty to keep you interested and informed for a good hour or two.
Scottish Parliamentary proceedings are now back on TheyWorkForYou.
Back in August 2014, the Scottish Parliament changed the way it published the Official Report of its debates.
TheyWorkForYou works by fetching data from various parliamentary sources—and in this case, unfortunately, the change at the Scottish Parliament end meant that our code no longer worked. We replaced our ‘debates’ section with an apologetic note.
Well, thanks to the Scottish Parliament kindly republishing the data in almost the format we used to use, we’ve managed to make some small tweaks and restore that content—including debates from the previously missing period. If you’re subscribed to alerts, you should have received an email digest with links to the backdated content (always supposing there was any that matched your chosen keywords).
And if you’re not subscribed to alerts? Now is a great time to rectify that. We’ll send you an email every time your chosen word or phrase is mentioned in Parliament, or every time your chosen representative speaks.
While we were doing this work, we also modified TheyWorkForYou so that it now pulls in ministerial data from the Scottish Parliament API. This is a welcome time-saver for us: previously we were creating a list manually from the official PDFs, while we can now automatically fetch it and reformat it into Popolo JSON, meaning it’s consistent with all our other data.
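To illustrate the reformatting step, here is a minimal sketch of turning simple minister records into Popolo-style JSON. The input field names (`id`, `name`, `role`, `start`, `end`) and the sample record are invented for the example and are not the Scottish Parliament API’s actual fields; the output keys (`person_id`, `organization_id`, `role`, `start_date`, `end_date`) follow the Popolo membership model as we understand it.

```python
import json


def to_popolo(ministers, organization_id):
    """Convert simple minister records into Popolo-style persons and memberships."""
    persons, memberships = [], []
    for record in ministers:
        # "id", "name", "role", "start" and "end" are hypothetical input fields.
        persons.append({"id": record["id"], "name": record["name"]})
        membership = {
            "person_id": record["id"],
            "organization_id": organization_id,
            "role": record["role"],
            "start_date": record["start"],
        }
        # Only include an end date when the post has actually ended.
        if record.get("end"):
            membership["end_date"] = record["end"]
        memberships.append(membership)
    return {"persons": persons, "memberships": memberships}


# Invented sample record, for illustration only.
sample = [{"id": "p1", "name": "A. Minister", "role": "Minister for Examples",
           "start": "2016-05-18"}]
print(json.dumps(to_popolo(sample, "scottish-government"), indent=2))
```

A fuller version would also de-duplicate people who hold more than one post, but the shape of the output is the point here: once everything is in one format, the rest of the site can treat it like any other source.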
Thanks for your patience; we know that many people were awaiting this repair, and for longer than we would have liked. Enjoy!
Ever feel sorry for the less popular kids at school?
Excellent, then you’re just the sort of person we need: you may empathise with some of the countries on Gender Balance that aren’t getting quite as much attention as the rest.
Thanks to our recent data drive, Gender Balance now contains many more countries, all waiting for you to play.
But we’ve noticed that some countries aren’t getting quite as much attention as others. Gender Balance’s ultimate aim is to provide data for researchers, and we’d hate to feel that we had patchier data for those studying the less popular places.
So, to encourage take-up, we’ve now added a ‘featured country’ spot. Accept the invitation to play the highlighted place, and you’ll receive double points, propelling you all the faster towards a coveted place on the Gender Balance leaderboard. Time to get playing!
Yesterday we told you how the data on EveryPolitician had expanded wildly in the last week. One side effect is that there are 64 new countries to play on Gender Balance.
Our gender classification game (read more about it here) runs on politician data from EveryPolitician, so by adding a whole bunch of countries, we also expanded Gender Balance’s range.
It also means that, as those countries get played, we’ll be gathering even more informative and useful data about the proportions of women to men in the world’s legislatures.
That’s all we have to say, except, 3, 2, 1… get playing!