Last Saturday (August 19th) at Newspeak House in London, mySociety and Wikimedia UK held the “Wikifying Westminster” workshop, a day-long event to encourage people to get involved with Wikidata, but also to give a taste of what people can build with the data that is already there.
The vision: one day, complex investigations which currently take researchers a lot of time, such as “how many MPs are descended from people who were also MPs” or “how many people named X were MPs in year Y”, will be answerable with data from Wikidata using a single SPARQL query…
…but we’re not quite there yet. Currently, some data is scattered all over separate databases (which sometimes get shut down or disappear); some is just plain missing; and most frustrating of all, some is in place but there’s no apparent way to get it out of the database.
In order to make this vision a reality, we need to experiment with the data, find ways to check how complete it is, and explore what questions we can currently answer with it. Events like Wikifying Westminster are the perfect opportunity to do just that.
After a brief introduction to Wikidata and the EveryPolitician project, we split into two groups: one focused on learning how to use Wikidata, while the other focused on working on mini-projects.
Here’s a taste of what happened…
The learning track began by introducing new users to the basic Wikidata editing principles (or “getting data into Wikidata”). Participants were able to put their new skills into action immediately, by adding missing data on British MPs, who were mostly lacking dates and places of birth.
By the end of the first session, good progress had been made, particularly on obtaining dates of birth for current British parliamentarians. For some reason, though, it proved much harder to find these for women than for men: we can only speculate as to why that might be (do some still adhere to the idea that a woman shouldn’t reveal her age?!).
We were also given an introduction to SPARQL, a language used to query information in databases (or “getting data out of Wikidata”). Lucas Werkmeister introduced the Wikidata Query Service and explained a few tricks to help with using it. Participants were later able to put this to the test by running progressively more difficult test queries such as “All current UK MPs” or “Who is the youngest current MP?”
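To give a flavour of what such a test query might look like, here is a minimal Python sketch that builds a “who is the youngest current holder of a position?” query and the URL to send it to the Wikidata Query Service. It is an illustration rather than the query used on the day. The property IDs are real (P39 “position held”, P582 “end time”, P569 “date of birth”), but treating “current” as “no end time recorded” is an assumption, the function names are made up for this example, and the position item ID has to be supplied by you.

```python
import re
from string import Template
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# P39 = "position held", P582 = "end time", P569 = "date of birth".
# Assumption: a position is "current" when its statement has no end time.
YOUNGEST_HOLDER = Template("""
SELECT ?person ?personLabel ?dob WHERE {
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:$position .
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }
  ?person wdt:P569 ?dob .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?dob)
LIMIT 1
""")


def youngest_holder_query(position_qid):
    """Return SPARQL text for the youngest current holder of a position."""
    if not re.fullmatch(r"Q\d+", position_qid):
        raise ValueError("expected a Wikidata item ID such as 'Q123'")
    return YOUNGEST_HOLDER.substitute(position=position_qid).strip()


def query_url(sparql):
    """Build a GET URL for the query service, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # "Q123" is a placeholder, not the real item for a UK MP position.
    print(query_url(youngest_holder_query("Q123")))
```

Dropping the ORDER BY and LIMIT lines would turn the same pattern into a list of all current holders, which is the shape of the “All current UK MPs” query.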
Also, Navino Evans showed us the potential of reusing data, talking about Histropedia, which he co-created with Sean McBirnie. Histropedia is an awesome tool that lets you visualise thousands of topics on interactive timelines: you can browse through existing ones or create a new one from scratch.
The mini-projects group both worked on improving data and looked at how well we could answer some simple “stepping stone” queries (i.e. small questions to which we already knew some of the answers) as a heuristic for how good the data in Wikidata already is. You can see and contribute questions to the list of test queries here.
Some more details:
Improving data. The focus here was on the Northern Ireland Assembly, for which Wikidata now has full membership history back to the foundation of the Assembly, and on adding academic degrees of cabinet ministers. Starting from an excellent spreadsheet of the undergraduate universities and subjects of UK politicians and ministers (going back to John Major’s cabinets), we tried to upload that data to the relevant items, adding the qualifier “academic major” (P812) to the property “educated at” (P69). In this case, the key problem we found was that we weren’t sure how to model joint subjects, like “Maths and Politics”, which convinced us to concentrate on the more obvious subjects first.
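For readers new to qualifiers, the shape of an “educated at” statement carrying an “academic major” qualifier can be sketched in a few lines of Python. This is a simplified model, not Wikidata’s actual JSON format: the `Statement` class and the item IDs are placeholders invented for the example. It also shows one possible way of handling a joint subject, namely one qualifier value per subject. That is only an option to consider, not something the workshop settled on.

```python
from dataclasses import dataclass, field
from typing import Dict, List

EDUCATED_AT = "P69"      # real property: "educated at"
ACADEMIC_MAJOR = "P812"  # real property: "academic major"


@dataclass
class Statement:
    """Simplified stand-in for a Wikidata statement (hypothetical class)."""
    property_id: str
    value: str
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)

    def add_qualifier(self, property_id, value):
        # A statement may carry several values for the same qualifier.
        self.qualifiers.setdefault(property_id, []).append(value)


def educated_at(university, subjects):
    """Build an 'educated at' statement with one P812 value per subject."""
    statement = Statement(EDUCATED_AT, university)
    for subject in subjects:
        statement.add_qualifier(ACADEMIC_MAJOR, subject)
    return statement


if __name__ == "__main__":
    # Placeholder IDs: substitute the real items for the university and subjects.
    joint = educated_at("Q_UNIVERSITY", ["Q_MATHS", "Q_POLITICS"])
    print(joint)
```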
Answering some unusual and/or intriguing questions. Inspired by a prior finding that there are more FTSE 100 CEOs named John than there are female ones, and that John is historically the most common name of UK parliamentarians, we thought we’d find out when exactly the John-to-female balance was toppled amongst the UK’s MPs (hint: not until 1992).
Going back further in history, we queried the first time each given name was recorded in Parliament. This was inspired by a recent news article about an MP who claimed he was the first “Darren” in the Commons.
Some ideas were also born that we weren’t able to see through, for various reasons. For example, could we discover which, if any, MPs are descended from people listed in UCL’s ‘Legacies of British Slave-ownership’ database? An interesting question, but at the moment, the answer is ‘no’, partly because child-parent relationships are currently inconsistently modelled in Wikidata, and partly because of the nature of Wikidata and ancestry: if there is someone who doesn’t exist in Wikidata (e.g. Grandad Bob, the painter) in the family chain, Wikidata can’t bridge the gap between a present-day MP and the slave owner who might be their ancestor.
This is just the beginning
Work, of course, is still ongoing: all pre-1997 UK data is still to be inserted or improved on Wikidata, and so much more is missing – family connections, academic degrees, links to other databases, and all sorts of “unusual stuff” that can be used for interesting queries.
This data is crucial if we want to be able to answer the really big questions which Wikidata should one day be capable of helping us explore, about what politicians do.
We can do that together!
We hope that events like this give people an easy way in to Wikidata and also show them what’s already possible to achieve with the data. Over the coming months, we are hoping to support more events of this type around the world. If you are interested in getting involved, here’s how:
- Want to improve your country’s data? Events like this can be a great way to help kickstart activities and find other people who share your goals. We are happy to help out and support people in other countries to do so.
- Are you already organising or planning to organise a similar workshop around Wikidata? Make sure it is listed on the Wikidata Event page!
- Do you want to attend future workshops? Follow us on Twitter to stay updated about events that we are running, and ones that other people are running too!
We’re always looking for feedback and suggestions on workshop and event formats that might also work. Have you already run similar workshops? Let us know your impressions and suggestions on firstname.lastname@example.org!
Feature image credits: Mark Longair
As ever with Mozilla’s annual, hands-on festival, there was a lot going on in London’s Ravensbourne, a venue that’s especially conducive to mixing and meeting.
MozFest attracts an active and positive crowd of digital people, ranging from junior-school coder kids right through to hoary old digital campaigners. So we were delighted to meet up with old friends and make new ones, especially as some of them had travelled from afar to be there. London was fortunate once again to be hosting the event, since Mozilla is of course an international organisation. And as our main focus at this year’s event was EveryPolitician (“data about every national legislature in the world, freely available for you to use”), that international aspect was especially welcome.
As a result of our being there, we hope that lots more people know about EveryPolitician’s data, and that some of them are going to build or do amazing things with it. We’re still adding to our data: we already have at least the current term of the top-level legislatures of most of the countries in the world, but we’d love your help with finding good sources for the remaining few, as well as with our ongoing task of going wider (adding more details about the politicians we do have) and deeper (adding historic data from previous terms).
If, in the spirit of digital do-ism that infuses MozFest, you do make something useful or funky with EveryPolitician’s data, do please let us know. We make sure all this lovely data is available to you in a consistent way (that not only means the delivery formats of CSV or JSON Popolo, but also that we adopt reliable conventions about the way we use them). This maximises the likelihood that, when you share that thing you’ve built using the data for your country, people in other places will be able to easily adapt it to work with the data for theirs. And that’s why, if you’ve made something amazing, we’d like to know, so we can shout about it.
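Here is a small Python sketch of why consistent formats matter: the same few lines can summarise the data for any country, as long as the column names follow the same conventions. The column names (`name`, `group`, `gender`) and the sample rows below are invented for illustration, so check the real CSV headers before relying on them.

```python
import csv
import io
from collections import Counter


def tally(csv_text, column):
    """Count how often each value appears in one column of a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Made-up sample data for two imaginary countries, sharing one layout.
COUNTRY_ONE = """name,group,gender
Person A,Party X,female
Person B,Party Y,male
Person C,Party X,female
"""

COUNTRY_TWO = """name,group,gender
Person D,Party Z,male
Person E,Party Z,female
"""

if __name__ == "__main__":
    for label, data in [("one", COUNTRY_ONE), ("two", COUNTRY_TWO)]:
        # Identical code for both files, because the columns are consistent.
        print(label, dict(tally(data, "group")), dict(tally(data, "gender")))
```

Because nothing in `tally` is specific to one country, a tool built on it for one legislature should work unchanged on another.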
Finally: thanks to the people who made MozFest run so smoothly this year, and the spirit of the open web. See you next year!
Image: Mozilla Festival CC BY 2.0
Amazing—we did it!
When we decided to mark Global Legislative Openness Week with a drive to get the data for 200 countries up on EveryPolitician, in all honesty, we weren’t entirely sure it could be done.
And without the help of many people we wouldn’t have got there. But last night, we put live the data for North Korea and Sweden, making us one country over the target.
The result? There is now consistently-structured, reusable data representing the politicians in 201 countries, ready for anyone to pick up and work with. We hope you will.
That’s not to say that our job is over… far from it! There’s still plenty more to be done, as we’ll explain below.
Here’s how it happened
Getting the data for each country was a multi-step process, aided by many people. First, a suitable online source had to be located. Then, a scraper would be written: a piece of code that could visit that source and pull out the information we needed—names, districts, political parties, dates of office, etc—and put it all in the right format.
Because each country’s data had its own idiosyncrasies and formatting, we needed a different scraper for every country.
Once written, each scraper was added to EveryPolitician’s list. Crucially, scrapers aren’t just a one-off deal: ideally they’ll continue to work over time as legislatures and politicians change.
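For anyone wondering what a scraper looks like in practice, here is a deliberately tiny Python sketch. It is not one of the project’s actual scrapers: the HTML table, the field names and the helper names are all invented, and a real scraper would fetch the page from the legislature’s website and cope with far messier markup. It only shows the basic idea of pulling names, districts and parties out of a page and writing them in one fixed format.

```python
import csv
import io
from html.parser import HTMLParser

FIELDS = ["name", "district", "party"]  # hypothetical output columns

# Invented sample page standing in for a legislature's member list.
SAMPLE_HTML = """
<table>
<tr><th>Name</th><th>District</th><th>Party</th></tr>
<tr><td>Person A</td><td>North District</td><td>Party X</td></tr>
<tr><td>Person B</td><td> South District </td><td>Party Y</td></tr>
</table>
"""


class MemberTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain no <td>, so they are skipped
                self.rows.append(self._row)
            self._row = None


def scrape(html):
    """Turn the member table into a list of dicts keyed by FIELDS."""
    parser = MemberTableParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(FIELDS, row)) for row in parser.rows]


def to_csv(members):
    """Serialise the members in one fixed column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(members)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(scrape(SAMPLE_HTML)))
```

Only the parsing step is tied to one website’s layout, which is why each country needs its own scraper while the output format stays the same.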
The map above shows our progress during GLOW week, from 134 countries, where we began, up to today’s count of 201.
mySociety’s Tony, lead on the EveryPolitician project, worked non-stop this week to get as many countries as possible online. But this week we’ve also seen EveryPolitician gather real momentum as it takes off as a community project. It’s an ambitious idea, and it can only succeed with the help of this kind of community effort. Thanks to everyone who helped, including (in no particular order):
- Duncan Walker for writing the scraper for Uganda
- Joshua Tauberer for helping with the USA data
- Struan Donald for handling Ecuador, Japan, Hong Kong, Serbia and the Netherlands
- Dave Whiteland, with ThaiNetizen helpfully finding the data source for Thailand
- Team Popong for South Korean data
- Jenna Howe for her work on El Salvador
- Rubeena Mahato, Chris Maddock, Kätlin Traks, François Briatte, @confirmordeny, and @foimonkey for lots of help on finding data
- Henare Degan and OpenAustralia who made the scraper for Ukraine
- Matthew Somerville for covering the Falkland Islands and Sweden
- Liz Conlan for lots of help with Peru and American Samoa
- Jaroslav Semančík who provided data for, and assistance with, Slovakia
- Mathias Huter who supplied current data for Austria while Steven Hirschorn wrote a scraper for the historic data
- Andy Lulham who wrote a scraper for Gibraltar
- Abigail Rumsey who wrote a scraper for Sri Lanka
- everyone who tweeted encouragement or retweeted our requests for help.
But there’s more
There are still 40 or so countries for which we have no data at all: you can see them here. This week has provided an enormous boost to our data, but the site’s real target is, just like the name says, to cover every politician in the world.
And once we’ve done that, there’s still the matter of both historic data and more in-depth data for the politicians we do have. Thus far, for most countries with two houses we have only the lower one, and for many countries we have only the current politicians. In future we need to include much richer data on all politicians, including voting records, et cetera.
Meanwhile, our first target, to have a list of the current members of every national legislature in the world, is starting to look like it’s not so very far away. If you’d like to help us reach it, here’s how you still can.
Just how quickly can we hit the 200 countries mark on EveryPolitician? That’s what we’ll be finding out this week, and one thing’s for sure — we’ll get a lot further with your help.
This week is GLOW, the Global Legislative Openness Week, and we’re marking it with a concerted drive for more data.
Tony, the project lead, has consistently added one new country every day since EveryPolitician launched four months ago, and now it’s time to put a rocket behind our efforts.
The site currently contains data for 134 countries. We’ll be going flat out to see how quickly we can reach 200, and as the excitement ramps up, we hope you will help spread the word and get involved, too. Tony will carry on working as hard as he can to fill in the gaps, but we need your help to get further, faster.
What is EveryPolitician?
How can I help?
- Help us find data for more countries! We don’t currently know where to find the politician data for many countries. Here’s a list of the ones we need and here’s a page about how to contribute. If you get stuck, give us a shout.
- Write a scraper! If you have the know-how, you can help us enormously by helping scrape the data from the places we do know about. See this page for guidance on how to go about writing a scraper. You’ll find lots of examples here.
- You can also help by spreading the word – tell your friends, tweet, blog, get up on a platform and talk, and just generally share this post. Thank you!
Why do we need this data?
Politician data is readily available for most countries, but it comes in a massive variety of inconsistent formats. Most of those formats aren’t ‘machine readable’, that is to say, the data can’t readily be extracted and re-used by programmers, and pretty much every country differs on what information it provides about each politician.
That being the case, anyone who wants to build an online tool that deals with politicians from more than one country, or who would like their tool to be available to people in other places, or would like to adapt an existing tool to be used elsewhere, would first have to adapt their tool to cope with the data.
EveryPolitician saves them the trouble, and the structured format also means that the tools they build will be compatible with any other tools that use it.
What kind of tools?
EveryPolitician data will be useful for all kinds of projects.
It’ll be much easier to build a website that shows people how to contact a politician. Or one that holds a government to account and educates people about what politicians are doing. Or one that helps voters make choices by displaying facts about what their politicians believe.
It can go further than that, though — with these building blocks in place, developers can really use their imagination to put together all kinds of projects, many of which we haven’t even begun to imagine. And don’t forget that, if a tool has been built to use the standardised data, it’ll also be easy for others to redeploy elsewhere.
If you’d like to see a concrete way in which the data’s already being used, check out Gender Balance.
How can I keep up to date?
We’ll be putting out regular updates via Twitter as the number of countries covered increases — plus you can watch the map turn green on http://everypolitician.org/countries.html as we progress.