Amazing—we did it!
When we decided to mark Global Legislative Openness Week with a drive to get the data for 200 countries up on EveryPolitician, in all honesty, we weren’t entirely sure it could be done.
And without the help of many people we wouldn’t have got there. But last night, we put live the data for North Korea and Sweden, making us one country over the target.
The result? There is now consistently-structured, reusable data representing the politicians in 201 countries, ready for anyone to pick up and work with. We hope you will.
That’s not to say that our job is over… far from it! There’s still plenty more to be done, as we’ll explain below.
Here’s how it happened
Getting the data for each country was a multi-step process, aided by many people. First, a suitable online source had to be located. Then, a scraper would be written: a piece of code that could visit that source and pull out the information we needed—names, districts, political parties, dates of office, etc—and put it all in the right format.
Because each country’s data had its own idiosyncrasies and formatting, we needed a different scraper for every country.
Once written, we added each scraper to EveryPolitician’s list. Crucially, scrapers aren’t just a one-off deal: ideally they’ll continue to work over time as legislatures and politicians change.
The map above shows our progress during GLOW week, from 134 countries, where we began, up to today’s count of 201.
mySociety’s Tony, Lead on the EveryPolitician project, worked non-stop this week to get as many countries as possible online. But this week we’ve seen EveryPolitician reach some kind of momentum, as it takes off as a community project. It’s an ambitious idea, and it can only succeed with the help of this kind of community effort. Thanks to everyone who helped, including (in no particular order):
Duncan Walker for writing the scraper for Uganda; Joshua Tauberer for helping with the USA data; Struan Donald for handling Ecuador, Japan, Hong Kong, Serbia and the Netherlands; Dave Whiteland, with ThaiNetizen helpfully finding the data source for Thailand; Team Popong for South Korean data; Jenna Howe for her work on El Salvador; Rubeena Mahato, Chris Maddock, Kätlin Traks, François Briatte, @confirmordeny, and @foimonkey for lots of help on finding data; Henare Degan and OpenAustralia who made the scraper for Ukraine; Matthew Somerville for covering the Falkland islands and Sweden; Liz Conlan for lots of help with Peru and American Samoa; Jaroslav Semančík who provided data for, and assistance with, Slovakia; Mathias Huter who supplied current data for Austria while Steven Hirschorn wrote a scraper for the historic data; Andy Lulham who wrote a scraper for Gibraltar; Abigail Rumsey who wrote a scraper for Sri Lanka; everyone who tweeted encouragement or retweeted our requests for help.
But there’s more
There are still 40 or so countries for which we have no data at all: you can see them here. This week has provided an enormous boost to our data, but the site’s real target is, just like the name says, to cover every politician in the world.
And once we’ve done that, there’s still the matter of both historic data, and more in-depth data for the politicians we do have. Thus far, we mostly have only the lower houses for most countries which have two — and for many countries we only have the current politicians. Going into the future we need to include much richer data on all politicians, including voting records, et cetera.
Meanwhile, our first target, to have a list of the current members of every national legislature in the world, is starting to look like it’s not so very far away. If you’d like to help us reach it, here’s how you still can.
TheyWorkForYou has, until now, only covered things that have already happened, be that Commons main chamber debates since 1935, Public Bill committees back to 2000, or all debates in the modern Northern Ireland Assembly.
From today, we are taking the UK Parliament’s upcoming business calendar and feeding it into our database and search engine, which means some notable new features. Firstly, and most simply, you can browse what’s on today (or the next day Parliament is sitting), or 16th May. Secondly, you can easily search this data, to e.g. see if there will be something happening regarding Twickenham. And best of all, if you’re signed up for an email alert – see below for instructions – you’ll get an email about any matching future business along with the matching new Hansard data we already send. We currently send about 25,000 alerts a day, with over 65,000 email addresses signed up to over 111,000 alerts.
Mark originally wrote some code to scrape Parliament’s business papers, but this sadly proved too fragile, so we settled on Parliament’s calendar which covers most of the same information and more importantly has (mostly) machine-readable data. Duncan and I worked on this intermittently amidst our other activities, with Duncan concentrating on the importer and updating our search indexer (thanks as ever to Xapian) whilst I got on with adding and integrating the new data into the site.
I’ve also taken the opportunity to rejig the home page (and fix the long-standing bug with popular searches that meant it was nearly always Linda Gilroy MP!) to remove the confusingly dense amount of recent links, bring it more in line with the recently refreshed Scottish Parliament and Northern Ireland Assembly home pages, and provide more information to users who might not have any idea what the site covers.
Signing up for an email alert: If you want to receive an email alert on a particular person (MP, Lord, MLA or MSP), visit their page on TheyWorkForYou and follow the “Email me updates” link. If you would like alerts for a particular word or phrase, or anything else, simply do a search for what you’re after, then follow the email alert or RSS links to the right of the results page.
We are very happy to announce that Duncan Parkes has joined mySociety, bringing our team of full time core developers up to four.
Duncan is the incredibly prolific author of screen scrapers for the lovely PlanningAlerts.com which he runs with Richard Pope.
He also has a PhD in Mathematics, which I expect you’ll want to read all of here, and is an editor of Open Source programming books with APress. During the vetting process he listed one of the passions of his life as being ‘Unit Testing‘, which, combined with his love of postbox crowdsourcing, made picking him more or less a no brainer.
As a new edition has just been released, and I’ve had to tweak the parser to cope with the new highlighting, it’s a good time to write a brief article on TheyWorkForYou’s handling of the House of Commons Register of Members’ Financial Interests (Register of Members’ Interests as was before the current edition). Way back in the day, a scraper/parser was written (by either Julian or Francis) that monitors the Register pages on www.parliament.uk for new editions, and downloads and broadly parses the HTML into machine-readable data. The XML produced can be found at http://ukparse.kforge.net/parldata/scrapedxml/regmem/ – TheyWorkForYou then pulls in this XML into its database, and makes the latest data available on every MP’s page.
However, as it’s been scraping/parsing the Register since 2000, we can do more than that. Each MP’s page contains a link to a page giving the history of their entry in the Register – when things were added, removed, or changed. You can also view the differences between one edition of the Register and the next, or view a particular edition in a prettier form than the official site. There’s a central page containing everything Register-related at http://www.theyworkforyou.com/regmem/
So, I’ve just had a shower and I’m waiting for Matthew and Tom to turn up. As time goes on, mySociety seems to get more geographically disparate, and I look forward to meeting my coworkers. Then for 1pm we’ll be heading to CB2 for the mySociety developers meeting. Feel free to come along any time afternoon or evening, whatever your skills or interest in mySociety.
I haven’t posted on here for ages, since October. I’ve been away on holiday quite a lot, and when I’ve not been away I’ve been busy, partly with systems administration. We’ve set up lots of servers in the last month for the E Petitions site. When you go from 3 servers to 7 servers, there’s another step change in sorting out systems administration tools. For example, I had to change the monitoring script so every server wouldn’t monitor every other. And I had to work out the quirks and bugs in the system we have for storing config files for different classes of server in CVS. Because we only had one class of server before.
I’ve also had to learn lots about server monitoring and load balancing. Things have slowed down a bit now, to maybe 10 hits per second. But a few weeks ago the road pricing petition was often getting 50 hits per second. I’ve never worked on a site with that level of traffic before. You find all the bugs in your code, all the missing indices in PostgreSQL, all the badly tweaked FastCGI parameters. I’ve been sucking knowledge off Chris like a sponge, so tools like strace and vmstat begin to become instinctive. Seemingly nobody offers a book or a course which teaches this stuff well – every server setup is different, everyone knows different ways to tune and profile. But maybe you can tell me different in the comments.
Louise has been busily working away on lots of things. Amongst that is a major change to WriteToThem, to let you write to all the members in a multi-member constituency in one go. The last day or two, she’s been installing the WriteToThem test code on one of our servers, when it has only run on my laptop before. This will be fantastic – hopefully can get Matthew to be bolder about making changes to WriteToThem, if he has a test script he can easily run (getting Matthew to be bold isn’t normally a problem, but he seems mildly less bold when it comes to the WriteToThem queue daemon).
Tom and I have also been busy on a second travel maps report for the DfT. More on that soon. Lots of running screen scraping jobs on TransportDirect which take days. On the subject of Tom, he seems to have got expert at “stacking meetings” next to each other. In one day last week he had 7 meetings!
I seem to keep ending up in the North of England on work this Autumn, which is good as that’s where I’m from.
The Liverpool event on Friday went really well. Over the day, we probably had about 10 people altogether. It ended up more like a rolling seminar series – we all introduced and talked about what we were up to. Of course, Matthew on the TheyWorkForYou API, Tom on what mySociety doing these days. Also Julian (who does lots of the Parliament screen scraping that TheyWorkForYou depends on) talked about using Public Whip for collaborative research on Parliamentary votes. We spent a while talking about Fredom of Information with Steve Wood, who lives in Liverpool.
Ben (noii) told us about Tad Hirsch at MIT, who has done lots of interesting projects, which are inspiring for mySociety style project ideas. I’m sure we can do a lot with automatic routing of voice calls to volunteers, and with text message lists. Another Ben showed us an early version of some great work he is doing extracting information about newspapers and journalists. Thanks very much to Aidan from Blue Fountain who hosted us in the beautiful India building. We’ll be having two more events to promote the TheyWorkForYou API, and talk about other mySociety matters, later in the year. One in London, one somewhere else, suggestions welcome.
Earlier in September, all of us went to work together in the Lake District (as mentioned in my last post). It’s important to meet up for a solid chunk of time in the year when we spend most of it spread out across the Internet. Matthew took lots of photos, one of which is above (Tom, Louise, Matthew, Chris, Francis and Anna from left to right). It might look from the photo like we were brainstorming solutions to thorny problems, clearing back logs of customer support, writing code, and creating todo lists as long as your arm. But actually we just climbed mountains, hung out by tarns, drank beer and had a good time.
And next week I’m back off up North, to Betws-y-Coed in Wales this time. To give a talk to Bloc about TheyWorkForYou.
Lots of stuff happening here.
Earlier in the week, I’ve been getting WriteToThem to update more of its data automaticaly. Two volunteers contributed useful screen scrapers. Richard George’s gathers data from the Welsh assembly, and Jonathan Hogg’s screen scrapes the Scottish Parliament. They both spit out CSV files with representatives, constituencies and contact emails/faxes. I’ve now updated the script that can load in those CSV files, and set it all running once a week on cron. Along with another London Assembly scraper Chris wrote earlier in the year, and some code to get MPs from parlpase.
Today I’ve been doing other bits, including improving the link from WriteToThem to HearFromYourMP. When somebody has confirmed a message to be sent with WriteToThem, we know their email address is valid. So, why if they follow the link to HearFromYourMP do they have to confirm again? It’s bad user interface, and is probably reducing our signups to HearFromYourMP a bit.
The fix is to pass a signed email address through from WTT, and check the signature on HFYMP. The magic of hashes and shared secrets does the job.
Things have been quiet here recently, but are now getting busy again. Tom’s back from America, Chris is back from holiday, I’m better after being ill for most of last week.
Earlier in the week we finally managed to load new county boundaries into MaPit. So WriteToThem once again has county councils working. Please try it out with your postcode. Let us know of any problems.
This required lots of work from Chris, because a new version of BoundaryLine (from Ordnance Survey) has not yet been released with the updated boundaries. He’s done it using lists of the district council wards which make up the county electoral divisions.
These lists were taken from the Statutory Instruments. This has covered most postcodes, but there are still some where the boundaries were specificed in text (walk along this river etc.) rather than wards. And we don’t have those.
The last couple of days I’ve been turning on lots of things to automate updating of WriteToThem. A cron job now grabs new data on councillors from GovEval once a day, and merges their changes with any changes we’ve made.
It’s automatically emailing GovEval with user submitted corrections to councillor data (the “Have you spotted a mistake in the above list?” link on WriteToThem). Hopefully this will create a virtuous feedback loop of ever improving data quality goodness. Or at least let us keep up with council by-elections without having to lift a finger.
Finally I’ve made it send a mail once a week to the mailing list where WriteToThem admins (mostly volunteers) hang out. This describes what needs doing – such as missing contact details to gather, or messages in the queue which need human attention.
Next up, wiring up the new screenscrapers Richard and Jonathan contributed last week, so the Welsh and London Assemblies automatically update…
mySociety is pleased to announce that it has contracted accessibility wunderkind Matthew Somerville on a part time basis. Matthew’s most famous for his accessible versions of the National Rail Enquiries and Odeon websites. He’s also been heavily involved behind the scenes with TheyWorkForYou. Matthew’s first task has been to start working on PledgeBank. We’ll have a sneak preview for you all soon