Amazing—we did it!
When we decided to mark Global Legislative Openness Week with a drive to get the data for 200 countries up on EveryPolitician, in all honesty, we weren’t entirely sure it could be done.
And without the help of many people we wouldn’t have got there. But last night, we put live the data for North Korea and Sweden, making us one country over the target.
The result? There is now consistently-structured, reusable data representing the politicians in 201 countries, ready for anyone to pick up and work with. We hope you will.
That’s not to say that our job is over… far from it! There’s still plenty more to be done, as we’ll explain below.
Here’s how it happened
Getting the data for each country was a multi-step process, aided by many people. First, a suitable online source had to be located. Then, a scraper would be written: a piece of code that could visit that source and pull out the information we needed—names, districts, political parties, dates of office, etc—and put it all in the right format.
Because each country’s data had its own idiosyncrasies and formatting, we needed a different scraper for every country.
Once written, we added each scraper to EveryPolitician’s list. Crucially, scrapers aren’t just a one-off deal: ideally they’ll continue to work over time as legislatures and politicians change.
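To make the idea of a scraper a little more concrete, here is a minimal sketch in Python. It is not the code used for any real country: the HTML sample, the table layout and the field names (`name`, `area`, `group`) are all assumptions made for illustration, and a real scraper would fetch the page from the source rather than use an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample of a members page; real sources differ for every country.
SAMPLE_HTML = """
<table id="members">
  <tr><th>Name</th><th>District</th><th>Party</th></tr>
  <tr><td>Ada Example</td><td>North</td><td>Green</td></tr>
  <tr><td>Ben Sample</td><td>South</td><td>Blue</td></tr>
</table>
"""


class MemberTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # the header row has only <th> cells, so it stays empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_members(html):
    """Turn the table rows into dictionaries with consistent field names."""
    parser = MemberTableParser()
    parser.feed(html)
    fields = ["name", "area", "group"]  # assumed output columns
    return [dict(zip(fields, row)) for row in parser.rows]


def to_csv(members):
    """Write the scraped members out in one consistent CSV layout."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "area", "group"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(members)
    return out.getvalue()


members = scrape_members(SAMPLE_HTML)
print(to_csv(members))
```

The parsing half is the part that has to be rewritten for every country; the output half can stay the same, which is what makes the resulting data consistent.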
The map above shows our progress during GLOW week, from 134 countries, where we began, up to today’s count of 201.
mySociety’s Tony, Lead on the EveryPolitician project, worked non-stop this week to get as many countries as possible online. But this week we’ve seen EveryPolitician reach some kind of momentum, as it takes off as a community project. It’s an ambitious idea, and it can only succeed with the help of this kind of community effort. Thanks to everyone who helped, including (in no particular order):
Duncan Walker for writing the scraper for Uganda; Joshua Tauberer for helping with the USA data; Struan Donald for handling Ecuador, Japan, Hong Kong, Serbia and the Netherlands; Dave Whiteland, with ThaiNetizen helpfully finding the data source for Thailand; Team Popong for South Korean data; Jenna Howe for her work on El Salvador; Rubeena Mahato, Chris Maddock, Kätlin Traks, François Briatte, @confirmordeny, and @foimonkey for lots of help on finding data; Henare Degan and OpenAustralia who made the scraper for Ukraine; Matthew Somerville for covering the Falkland Islands and Sweden; Liz Conlan for lots of help with Peru and American Samoa; Jaroslav Semančík who provided data for, and assistance with, Slovakia; Mathias Huter who supplied current data for Austria while Steven Hirschorn wrote a scraper for the historic data; Andy Lulham who wrote a scraper for Gibraltar; Abigail Rumsey who wrote a scraper for Sri Lanka; everyone who tweeted encouragement or retweeted our requests for help.
But there’s more
There are still 40 or so countries for which we have no data at all: you can see them here. This week has provided an enormous boost to our data, but the site’s real target is, just like the name says, to cover every politician in the world.
And once we’ve done that, there’s still the matter of both historic data, and more in-depth data for the politicians we do have. Thus far, for most countries with two houses we have only the lower one, and for many countries we have only the current politicians. In future we need to include much richer data on all politicians, including voting records and more.
Meanwhile, our first target, to have a list of the current members of every national legislature in the world, is starting to look like it’s not so very far away. If you’d like to help us reach it, here’s how you still can.
If you need data on the people who make up your parliament, another country’s parliament, or indeed all parliaments, you may be in luck.
What’s more, it’s all provided as Open Data to anyone who would like to use it to power a civic tech project. We’re thinking parliamentary monitoring organisations, journalists, groups who run access-to-democracy sites like our own WriteToThem, and especially researchers who want to do analysis across multiple countries.
But isn’t that data already available?
Yes and no. There’s no doubt that you can find details of most parliaments online, either on official government websites, on Wikipedia, or in a variety of other places.
But, as you might expect from data that’s coming from hundreds of different sources, it’s in a multitude of different formats. That makes it very hard to work with in any kind of consistent fashion.
EveryPolitician standardises all of its data into the Popolo standard and then provides it in two simple downloadable formats:
- CSV, which contains basic data that’s easy to work with in spreadsheets
- JSON, which contains richer data on each person, and is ideal for developers
This standardisation means that it should now be a lot easier to work on projects across multiple countries, or to compare one country’s data with another. It also means that data works well with other Poplus Components.
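For developers wondering what working with the JSON might look like, here is a small sketch in Python. The sample document is invented and heavily trimmed; it assumes the Popolo-style layout of top-level `persons`, `organizations` and `memberships` lists, with each membership pointing at a person via `person_id` and at a party via `on_behalf_of_id`. Real files carry many more fields, so treat this as an illustration of the shape rather than a description of any particular country's file.

```python
import json

# Hypothetical, heavily trimmed Popolo-style document (names are invented).
SAMPLE_POPOLO = """
{
  "persons": [
    {"id": "p1", "name": "Ada Example"},
    {"id": "p2", "name": "Ben Sample"}
  ],
  "organizations": [
    {"id": "leg", "name": "National Assembly", "classification": "legislature"},
    {"id": "green", "name": "Green Party", "classification": "party"}
  ],
  "memberships": [
    {"person_id": "p1", "organization_id": "leg", "on_behalf_of_id": "green"},
    {"person_id": "p2", "organization_id": "leg"}
  ]
}
"""


def members_with_parties(popolo):
    """Pair each membership's person with the party they sit for, if known."""
    people = {p["id"]: p["name"] for p in popolo["persons"]}
    orgs = {o["id"]: o["name"] for o in popolo["organizations"]}
    result = []
    for membership in popolo["memberships"]:
        # A membership without a party reference is reported as "unknown".
        party = orgs.get(membership.get("on_behalf_of_id"), "unknown")
        result.append((people[membership["person_id"]], party))
    return result


data = json.loads(SAMPLE_POPOLO)
pairs = members_with_parties(data)
for name, party in pairs:
    print(f"{name}: {party}")
```

Because every country's file follows the same structure, a function like this should not need to change from one country to the next.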
What can I do with it?
Need a specific example? Yesterday, we introduced Gender Balance, the game that gathers data about women in politics.
As you’ll know if you’ve already given it a try, Gender Balance works by displaying politicians that make up one of the world’s legislatures, one by one.
That data all comes from EveryPolitician, and it’s meant that the developers have been able to concentrate on making a smooth and functional interface, knowing that the data side of things has already been taken care of.
That’s just one way to use EveryPolitician data, though. If you’d like to use it in your own site or app, you can find out more here.
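As a rough illustration of the sort of question the CSV makes easy to ask, the sketch below counts politicians by recorded gender, including those with no gender recorded yet. The sample rows are invented, and the column names (in particular `gender`) are an assumption about the file layout, so check them against the file you actually download.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV extract; the column names are assumptions for illustration.
SAMPLE_CSV = """id,name,group,gender
p1,Ada Example,Green Party,female
p2,Ben Sample,Blue Party,male
p3,Cy Placeholder,Blue Party,
"""


def gender_counts(csv_text):
    """Count rows per gender value, treating an empty cell as 'unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["gender"] or "unknown") for row in reader)


counts = gender_counts(SAMPLE_CSV)
for gender, total in sorted(counts.items()):
    print(gender, total)
```

The "unknown" bucket is the interesting one here: those are the gaps that a crowd-sourcing game could help to fill.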
We still need more data
As you may have noticed, there are more than 100 parliaments in the world. In fact, despite having reached what feels like a fairly substantial milestone, we’re still barely half way to getting some data for every parliament.
So we could use your help in finding data for the parliaments we don’t yet cover, and historic information for the ones we do. Read more about how you can help out.
As you’ll know if you’re a regular reader of this blog, YourNextMP crowd-sourced details of every candidate who stood in the UK general election.
But just because our own election is over doesn’t mean we’ll be letting YourNextMP gather dust. On the contrary: we want to see it being re-used wherever there are elections being held, and citizens needing information! We’re already seeing the first re-use case, and we’d love to see more.
Opening up data
YourNextMP’s main purpose was to provide a free, open database of candidates, so that anyone who wanted to could build their own tools on top of it, and it was very successful with that aim.
The traditional source of candidate data for such projects has been through expensive private providers, not least because the official candidate lists are published just a few days before the election.
Thanks to YourNextMP’s wonderful crowd-sourcing and triple-checking volunteers, we reckon that we had the most complete, most accurate data, the earliest. And it was free.
Directly informing over a million citizens
YourNextMP also came into its own as a direct source of information for the UK’s electorate. This hadn’t been the priority when the project was launched, but it was helped greatly by the fact that constituency and candidate pages ranked very highly in search engines from early on, so anyone searching for their local candidates found the site easily.
Once they did so, they found a list of everyone standing in their constituency, together with contact details, links to their online profiles such as web pages, social media and party websites, and feeds from spin-off projects (themselves built on YourNextMP data) such as electionleaflets.org and electionmentions.com.
YourNextMP had more than a million unique users. In the weeks just prior to the UK general election, it was attracting approximately 20,000 visitors per day, and on the day before the election, May 6th, there was suddenly a massive surge: that day the site was visited by nearly 160,000 people.
So, in a nutshell: YourNextMP has not only enabled a bunch of projects which helped people become more informed before our election—it also directly informed over a million citizens.
A reusable codebase
And, in the spirit of Poplus, the codebase is open for anyone to re-use in any country.
It’s already being pressed into use for the upcoming elections in Argentina, and we hope that developers in many other countries will use it to inform citizens, and inspire great web tools for the electorate, when their own elections come around.
If that’s something that interests you, please come and talk, ask questions and find out what’s involved, over on the Democracy Club mailing list.
Note (June 2016): This post is now slightly out of date. FixMyTransport is no longer running, though all of the other APIs and tools listed are still available.
There is also one significant addition which developers should find useful: EveryPolitician, which provides data on all current politicians around the world (and, in the future, we hope, all past ones too). See more here.
Much of what we do here at mySociety relies on Open Data, so naturally we support Open Data Day. In case you haven’t come across this event before, here’s the low-down:
Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption of open data policies by the world’s local, regional and national governments.
If you’re planning on being a part of Open Data Day, you may find some of mySociety’s feeds, tools and APIs useful. This post attempts to put them all in one place.
Until about two years ago I was quite actively involved in the Open Data movement. I sat in on the 2007 gathering in California where the first Open Data Principles were drafted, and later sat on the Transparency Board at the UK government.
I stopped being involved in early 2012 because I saw a couple of things happening. First, the Open Data baton had been picked up by dedicated, focused advocates like the Open Data Institute and the Open Knowledge Foundation, who could give 100% to fighting this fight (I always had to fit it around managing a growing organisation with other goals). And second I felt that the surge of relatively meaningful data releases in the country I live in (the UK) had pretty much come to an end. The real policy action and innovation will now happen in more rapidly-changing countries where transparency is a more visceral issue.
Still, despite walking away, I remained optimistic. It seemed more or less impossible to imagine that in twenty years’ time there wouldn’t be quite a bit more Open Data around, especially in rich countries. But given the virtually-zero political gain to be had from this agenda in countries like the UK, where is said data actually going to come from?
Learning from Microsoft (really)
The more I thought about it, the more I realised that we’d already seen the answer in the form of Microsoft. Throughout the 1990s the .doc and .xls standards rose and took over governments around the world, even though there was never anything like a clear policy process that drove that decision.
There was certainly no high profile ‘Microsoft Government Partnership’ with international conferences and presidential speeches. Instead there was a safe, ‘no brainer’ product that governments bought to solve their problems, and these data standards came with it. The pressure on governments to do anything at all probably came from the fact that the private sector had widely adopted Office first.
I think that a recurrence of this phenomenon – change-through-replacing-old-computers – is where Open Data at real scale is going to come from. I think it’s going to come from old government computers being thrown away at their end-of-life and replaced with new computers that have software on them that produces Open Data more or less by default.
The big but
However, there’s a big BUT here. What if the new computers don’t come with tools that produce Open Data? This is where SayIt comes in, as an example of a relatively low-cost approach to making sure that the next generation of government IT systems do produce Open Data.
SayIt is a newly launched open source tool for publishing transcripts of trials, debates, interviews and so on. It publishes them online in a way that matches modern expectations about how stuff should work on the web – responsive, searchable and so on. It’s being built as a Poplus Component, which means it’s part of an international network of groups collaborating on shared technologies. Here’s JK Rowling being interviewed, published via SayIt.
But how does this little tool relate to the business of getting governments to release more Open Data? Well, SayIt isn’t just about publishing data, it’s about making it too – in a few months we’ll be sharing an authoring interface for making new transcripts from whatever source a user has access to.
We hope that having iterated and improved this authoring interface, SayIt can become the tool of choice for public sector transcribers, replacing whatever tool they use today (almost certainly Word). Then, if they use SayIt to make a transcript, instead of Word, then it will produce new, instantly-online Open Data every time they use it.
The true Open Data challenge is building brilliant products
But we can’t expect the public sector to use a tool like SayIt to make new Open Data unless it is cheaper, better and less burdensome than whatever they’re using now. We can’t – quite simply – expect to sell government procurement officers a new product mainly on the virtues of Open Data. This means the tough task of persuading government employees that there is a new tool that is head-and-shoulders better than Excel or Word for certain purposes: formidable, familiar products that are much better than their critics like to let on.
So in order for SayIt to replace the current tools used by any current transcriber, it’s going to have to be really, really good. And really trustworthy. And it’s going to have to be well marketed. And that’s why we’ve chosen to build SayIt as an international, open source collaboration – as a Poplus Component. Because we think that without the billions of dollars it takes to compete with Microsoft, our best hope is to develop very narrow tools that do 0.01% of what Word does, but which do that one thing really really well. And our key strategic advantage, other than the trust that comes with Open Source and Open Standards, is the energy of the global civic hacking and government IT reform sector. SayIt is far more likely to succeed if it has ideas and inputs from contributors from around the world.
Regardless of whether or not SayIt ever succeeds in penetrating inside governments, this post is about an idea that such an approach represents. The idea is that people can advance the Open Data agenda not just by lobbying, but also by building and popularising tools that mean that data is born open in the first place. I hope this post will encourage more people to work on such tools, either on your own, or via collaborations like Poplus.
Photo by Troy Morris (CC)
FixMyStreet, our site for reporting things like potholes and broken street lights, has had something of a major redesign, kindly supported in part by Kasabi. With the help of Supercool, we have overhauled the look of the site, bringing it up to date and making the most of some lovely maps. And as with any mySociety project, we’d really appreciate your feedback on how we can make it ever more usable.
The biggest change to the new FixMyStreet is the use of responsive design, where the web site adapts to fit within the environment in which it’s being viewed. The main difference on FixMyStreet, besides the obvious navigation changes, is that in a small screen environment, the reporting process changes to have a full screen map and confirmation step, which we thought would be preferable on small touchscreens and other mobiles. There are some technical details at the end of this post.
Along with the design, we’ve made a number of other improvements along the way. For example, we now auto-rotate photos on upload where we can, something that’s been requested for a long time, and we store whatever is provided rather than only a shrunken version. It’s interesting that most photos include correct orientation information, but some clearly do not (e.g. the Blackberry 9800).
We have many things we’d still like to do, as a couple of items from our GitHub repository show. Firstly, it would be good if the FixMyStreet alert page could have something similar to what we’ve done on Barnet’s planning alerts service, providing a configurable circle for the potential alert area. We are also going to add faceted search to the area pages, allowing you to see only reports in a particular category, or within a certain time period.
Regarding native phone apps – whilst the new design does hopefully work well on mobile phones, we understand that native apps are still useful for a number of reasons (not least, the fact that photo upload is still not possible from a mobile web app on an iPhone). We have not had the time to update our apps, but will be doing so in the near future to bring them more in line with the redesign and hopefully improve them generally as well.
The redesign is not the only news about FixMyStreet today
As part of our new DIY mySociety project, we are today publishing an easy-to-read guide for people interested in using the FixMyStreet software to run versions of FixMyStreet outside of Britain. We are calling the newly upgraded, more re-usable open source code the FixMyStreet Platform.
This is the first milestone in a major effort to upgrade the FixMyStreet Platform code to make it easier and more flexible to run in other countries. This effort started last year, and today we are formally encouraging people to join our new mailing list at the new FixMyStreet Platform homepage.
Coming soon: a major upgrade to FixMyStreet for Councils
As part of our redesign work, we’ve spoken to a load of different councils about what they might want or need, too. We’re now taking that knowledge, combining it with this redesign, and preparing to relaunch a substantially upgraded FixMyStreet for Councils product. If you’re interested in that, drop us a line.
Kasabi: Our Data is now in the Datastore
Finally, we are also now pushing details of reports entered on FixMyStreet to Kasabi’s data store as open linked data; you can find details of this dataset on their site. Let us know if it’s useful to you, or if we can do anything differently to help you.
On a mobile, you can see that the site navigation is at the end of the document, with a skip to navigation link at the top. On a desktop browser, you’ll note that visually the navigation is now at the top. In both cases, the HTML is the same, with the navigation placed after the main content, so that the main content hopefully loads and appears first. We are using display: table-caption and caption-side: top in the desktop stylesheet in order to rearrange the content visually (as explained by Jeremy Keith), a simple yet powerful technique.
If you have any technical questions about the design, please do ask in the comments and I’ll do my best to answer.
If you haven’t got a penny,
A ha’penny will do,
If you haven’t got a ha’penny,
Then God bless you.
We wish you all a merry and prosperous Christmas – and for those of you who are already feeling quite prosperous enough, may we point you in the direction of our charitable donations page?
mySociety’s work is made possible by donations of all sizes and from all sorts of people. Those donations help fund all the online projects we create; projects that give easy access to your civic and democratic rights. If that’s important to you, show your appreciation, and we promise we’ll make the best use of every penny.
Thank you for sticking with us through this month-long post. We hope you’ve found it interesting and we wish you the very merriest of Christmases.
What’s behind the door? A letter to Santa.
If you can fit them down the chimney, here’s what we’re dreaming of:
More publicly available data
Of course, we were delighted to hear in Mr Osborne’s autumn statement that all sorts of previously-inaccessible data will be opened up.
We’re wondering whether this new era will also answer any of our FixMyStreet geodata wishes. Santa, if you could allocate an elf to this one, we’d be ever so pleased.
Globalisation
…in the nicest possible way, of course. This year has seen us work in places previously untouched by the hand of mySociety, including Kenya and the Philippines. And we continue to give help to those who wish to replicate our projects in their own countries, from FixMyStreet in Norway to WhatDoTheyKnow in Germany.
Santa, please could you fix it for us to continue working with dedicated and motivated people all around the world?
A mySociety Masters degree
We’re lucky enough to have a team of talented and knowledgeable developers, and we hope we will be recruiting more in the coming year. It’s not always an easy task to find the kind of people we need – after all, mySociety is not your average workplace – so we’ve come to the conclusion that it’s probably easiest to make our own.
Back in February, Tom started thinking about a Masters in Public Technology. It’s still something we’re very much hoping for. Santa, is it true you have friends in academic circles?
FixMyTransport buy-in – from everyone!
Regular users of FixMyTransport will have noticed that there are different kinds of response from the transport operators: lovely, fulsome, helpful ones, and formulaic ones. Or, worse still, complete refusal to engage.
Santa, if you get the chance, please could you tell the operators a little secret? Just tell them what those savvier ones already know – that FixMyTransport represents a chance to show off some fantastic customer service. And with 25,000 visitors to the site every week, that message is soon spread far and wide.
I was just talking to someone in a local council about the fact that they’d opened up the location of 27,000 streetlights in their council area. They wanted to know if FixMyStreet could incorporate them so that problem reports could be more accurately attached.
This conversation reminded me that we’ve had an informal wish list of geodata for FixMyStreet for some time. What we need is more data that lets us send problems to the correct entity when the problem is not actually a council responsibility.
I’m just posting these up to see if anyone knows a guy who knows a girl who knows a dog who knows how to get hold of any of these datasets. In some vector data format, if possible, please!
- Canals and responsible authorities
- Supermarkets (esp car parks) and responsible companies
- Network Rail’s land
- Council owned land
- Land and roads controlled by the Highways Agency
- Shopping malls
- National parks
- BT phone boxes (the original problem which inspired FixMyStreet)
So, do you know someone who might know someone who can help us improve FixMyStreet? And guess what, if we do add this to our web services, you’ll probably be able to query them too.
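To show why vector data in particular would help, here is a minimal sketch in Python of how a report's location could be matched against landowner boundaries. It is not how FixMyStreet actually routes reports: the polygons, owner names and the `responsible_body` helper are hypothetical, and the coordinates are simple planar numbers rather than real map references.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical boundaries, each a list of (x, y) corners with a named owner.
AREAS = [
    ("Example Supermarket Ltd", [(0, 0), (4, 0), (4, 3), (0, 3)]),
    ("Example Canal Authority", [(5, 0), (9, 0), (9, 1), (5, 1)]),
]


def responsible_body(point, areas, default="local council"):
    """Return the first owner whose land contains the point, else the default."""
    for owner, polygon in areas:
        if point_in_polygon(point, polygon):
            return owner
    return default


print(responsible_body((2, 1), AREAS))
print(responsible_body((20, 20), AREAS))
```

With boundaries like these to hand, a pothole reported in a supermarket car park could go to the company that owns it, with the council as the fallback for everywhere else.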
Note: This post is a work in progress; I need your help to improve it, especially with knowledge of non-English sites.
I was recently in Washington DC catching up with mySociety’s soul-mates at the Sunlight Foundation. As we talked about what was going on in the field of internet-enabled transparency, it became clear to me that there are now more identifiable categories of transparency website than there used to be.
Identifying and categorising these types of site turns out to be surprisingly useful. First, it can help people ask “Why don’t we have anyone doing that in our country?” Second, it can help mySociety to make sure that when we’re planning ahead we don’t fail to consider certain options that may currently be off our radar. Also, it gives me an excuse to tell you about some sites that you may not have seen before.
Anyway, enough preamble. Here they are as I see them – please give me more suggestions as you find them. As you can see there’s a lot more activity in some fields than others.
1. Transparency blogs & newspapers – At the technically simplest, but most manual labour-intensive, end of the scale are sites, commercial and volunteer driven, whose owners use transparency to help them to write stories. Given almost every political blog does this a bit, it can be hard to name specific examples, but I will note that Heather Brooke is the UK’s pre-eminent FOI-toting journalist/blogger, and we’ve just opened a blog for our awesome volunteers on WhatDoTheyKnow to show their FOI skills to an as-yet unsuspecting public.
2. What Politicians do in their parliaments – These sites primarily include lists of politicians, and information about their primary activities in their assemblies, such as voting or speaking. This encompasses mySociety’s TheyWorkForYou.com, Rob McKinnon’s one man labour of love TheyWorkForYou in NZ, Italy’s uber-deep OpenPolis.it (6 layers of government, anyone?), Germany’s almost-un-typable Abgeordnetenwatch, Romania’s writ-wielding IPP.ro, Josh Tauberer’s GovTrack.us, plus the bonny bouncing babies OpenAustralia and Kildare Street (Ireland). Of special note here are Mzalendo (Kenya), who, unlike everyone else, can’t rely on access to a parliamentary website to scrape raw data from, and Julian Todd’s UNDemocracy (International), which has to fight incredible technical barriers to get the information out.
3. Databases of questions and answers posed to politicians – These sites let people put questions to politicians, and then publish the questions and answers. The Germans running Abgeordnetenwatch (Parliament Watch) seem to have had considerable success here, with newspapers citing what politicians say on their site. Yoosk has some politicians in the UK on it, too.
4. Money in politics – This comes in two forms, money given to candidates (MAPlight), and money bunged by politicians to their favourite causes (Earmark watch). In the UK, as far as I know, the Electoral Commission’s database remains currently unscraped, perhaps because the data is so ungranular.
6. Websites containing bills going through parliament, or the law as voted on – This includes the increasingly substantial OpenCongress in the US which saw major traffic during the Health Care debates, and the UK government’s own Acts database and Statute Law Database. Much of the legal database field, however, remains essentially private.
7. Services that create transparency as a side effect of delivering services – Our own sites lead the way here: FixMyStreet’s public problem reports and WhatDoTheyKnow’s FOI archive are both created by people who aren’t primarily using the site to enrich it – they’re using it to get some other service.
8. Election websites – These come in many forms, but what they have in common is their desire to shed light on the positions and histories of candidates, whether incumbents or newcomers. The biggest beast here is Stemwijzer (Netherlands), probably in relative terms the most used transparency or democracy site ever. However, these sites are popular in several places: the big but highly labour-intensive VoteSmart (US), Smartvote.ch (Switzerland), plus others. mySociety is shortly to start recruiting constituency volunteers to help with our take on this problem; keep an eye on this blog if you want to know more.
9. Political document archives – This is a new category, now occupied by Sunlight’s Partytime archive for invitations to political events, and TheStraightChoice, Julian Todd and Richard Pope’s wonderful new initiative for archiving election leaflets and other paper propaganda.
10. Bulk data – Online transparency pioneer Carl Malamud doesn’t do sites, he does data. Big globs zipped up and made publicly available for coders and researchers to download and process. The US government has now stepped into this field itself with Data.gov, doubtless soon to be followed by data.gov.uk.
Please don’t shoot me if I’ve missed anything here, the world is a big place. But I thought that was a useful and interesting exercise, and I hope you’ll both find it useful, and help me improve it too. Comment away.
Matthew’s just updated ScenicOrNot, the little game that we built to provide a ‘Scenicness’ dataset for Mapumental, to include a data dump of the raw data. The dump will update automatically on a weekly basis, but currently it contains averaged scores for 181,188 1×1 km grid squares, representing 83% of the Geograph dataset we were using, or 74% of all the grid squares in Great Britain. It is, in other words, really pretty good, and, I think, unprecedented in coverage as a piece of crowd-sourced geodata about a whole country.
It’s available under the Creative Commons Attribution Noncommercial 3 Licence, and we greatly look forward to seeing what people do with it.
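For anyone curious about what "averaged scores" means in practice, here is a small sketch in Python of turning individual ratings into one score per grid square, plus the simple ratio behind a coverage figure. The square identifiers and ratings are invented, and the real dump's layout may well differ, so this only illustrates the calculation.

```python
from collections import defaultdict

# Hypothetical raw votes: (grid square identifier, rating given by one player).
VOTES = [("SQ1", 7), ("SQ1", 5), ("SQ2", 2), ("SQ1", 6), ("SQ2", 4)]


def average_scores(votes):
    """Average all the ratings given to each grid square."""
    totals = defaultdict(lambda: [0, 0])  # square -> [sum of ratings, number of votes]
    for square, rating in votes:
        totals[square][0] += rating
        totals[square][1] += 1
    return {square: total / count for square, (total, count) in totals.items()}


def coverage(rated_squares, all_squares):
    """Fraction of squares that have at least one rating."""
    return rated_squares / all_squares


scores = average_scores(VOTES)
print(scores)
print(f"{coverage(len(scores), 4):.0%} of a hypothetical 4-square area covered")
```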