Last Saturday (August 19th) at Newspeak House in London, mySociety and Wikimedia UK held the “Wikifying Westminster” workshop, a day-long event to encourage people to get involved with Wikidata and to give them a taste of what they can build with the data that is already there.
The vision: one day, complex investigations which currently take researchers a lot of time, such as “how many MPs are descended from people who were also MPs” or “how many people named X were MPs in year Y”, will be answerable with data from Wikidata using a single SPARQL query…
…but we’re not quite there yet. Currently, some data is scattered all over separate databases (which sometimes get shut down or disappear); some is just plain missing; and most frustrating of all, some is in place but there’s no apparent way to get it out of the database.
In order to make this vision a reality, we need to experiment with the data, find ways to check how complete it is, and explore what questions we can currently answer with it. Events like Wikifying Westminster are the perfect opportunity to do just that.
After a brief introduction to Wikidata and the EveryPolitician project, we split into two groups: one focused on learning how to use Wikidata, the other on working on mini-projects.
Here’s a taste of what happened…
The learning track began by introducing new users to the basic principles of editing Wikidata (or “getting data into Wikidata”). Participants were able to put their new skills into action immediately by adding missing data on British MPs, whose entries were mostly lacking dates and places of birth.
By the end of the first session, good progress had been made, particularly on obtaining dates of birth for current British parliamentarians. For some reason, though, it proved much harder to find these for women than for men: we can only speculate as to why that might be (do some still adhere to the idea that a woman shouldn’t reveal her age?!).
We were also given an introduction to SPARQL, a language for querying databases such as Wikidata (or “getting data out of Wikidata”). Lucas Werkmeister introduced the Wikidata Query Service and explained a few tricks to help with using it. Participants then put this to the test by running progressively more difficult queries, such as “All current UK MPs” or “Who is the youngest current MP?”
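For readers who want to try the second of those queries themselves, here is a minimal sketch of how “Who is the youngest current MP?” might be written in SPARQL and sent to the Wikidata Query Service from Python. It assumes that Q16707842 is the Wikidata item for a member of the UK Parliament, and it approximates “current” as a “position held” (P39) statement with no “end time” (P582) qualifier. Both are assumptions to check, not the query used at the workshop.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: Q16707842 is the Wikidata item for a member of the UK Parliament.
UK_MP_POSITION = "Q16707842"


def youngest_current_mp_query(position: str = UK_MP_POSITION, limit: int = 1) -> str:
    """Build a SPARQL query for the youngest current holders of `position`.

    "Current" is approximated as a position held (P39) statement with no
    end time (P582) qualifier; people without a date of birth (P569) drop out.
    """
    return f"""SELECT ?mp ?mpLabel ?dob WHERE {{
  ?mp p:P39 ?held .
  ?held ps:P39 wd:{position} .
  FILTER NOT EXISTS {{ ?held pq:P582 ?end . }}
  ?mp wdt:P569 ?dob .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?dob)
LIMIT {limit}"""


def run_query(query: str) -> list:
    """Send a query to WDQS and return the JSON result bindings (needs network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves; this agent name is illustrative.
            "User-Agent": "wikifying-westminster-example/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(youngest_current_mp_query())` needs network access. Dropping the date-of-birth pattern, `ORDER BY` and `LIMIT` lines gives a rough version of “All current UK MPs”. The no-end-time test is only an approximation: a statement that is simply missing its end date will look current.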
Navino Evans also showed us the potential of reusing data, talking about Histropedia, which he co-created with Sean McBirnie. Histropedia is an awesome tool that lets you visualise thousands of topics on interactive timelines: you can browse through existing ones or create a new one from scratch.
The mini-projects group both worked on improving data and looked at how well we could answer some simple “stepping stone” queries (i.e. small questions to which we already knew some of the answers), as a heuristic for how good the data in Wikidata already is. You can see the list of test queries, and contribute your own, here.
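The “stepping stone” idea comes down to comparing what a query returns with answers we already trust. A minimal Python sketch of that comparison, assuming both sides have already been reduced to sets of item IDs; the function name `stepping_stone_report` is hypothetical.

```python
def stepping_stone_report(known_answers: set, query_results: set) -> dict:
    """Compare a query's results with answers we already know to be correct.

    `coverage` is the share of known answers the query found: a rough
    heuristic of completeness, not proof that the data is complete.
    """
    found = known_answers & query_results
    return {
        "coverage": len(found) / len(known_answers) if known_answers else 1.0,
        # Known answers the query missed: likely gaps to fill in Wikidata.
        "missing": sorted(known_answers - query_results),
        # Results we did not expect: either new knowledge or data errors, so check by hand.
        "unexpected": sorted(query_results - known_answers),
    }
```

A coverage well below 1.0 on a question whose answer is well known suggests the underlying data is still patchy, before anyone relies on a harder query built on top of it.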
Some more details:
Improving data. The focus here was on the Northern Ireland Assembly, for which Wikidata now has the full membership history back to the Assembly’s foundation, and on adding the academic degrees of cabinet ministers. Starting from an excellent spreadsheet of the undergraduate universities and subjects of UK politicians and ministers (going back to John Major’s cabinets), we tried to upload that data to the relevant items, adding the qualifier “academic major” (P812) to “educated at” (P69) statements. The key problem we found was that we weren’t sure how to model joint subjects such as “Maths and Politics”, so we concentrated on the more straightforward subjects first.
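To make that data model concrete: each degree becomes an “educated at” (P69) statement on the politician’s item, pointing at the university, with “academic major” (P812) attached as a qualifier. The Python sketch below shows that shape and one way spreadsheet rows might be split into rows ready to upload and joint-subject rows set aside for later. The `Statement` class, the “ and ” test for joint subjects, and the placeholder IDs are hypothetical illustrations, not the tooling used on the day.

```python
from dataclasses import dataclass, field

EDUCATED_AT = "P69"      # "educated at"
ACADEMIC_MAJOR = "P812"  # "academic major", used here as a qualifier


@dataclass
class Statement:
    """A simplified Wikidata statement: subject, property, value, qualifiers."""
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def split_rows(rows):
    """Turn (person, university, subject_item, subject_label) rows into statements.

    Hypothetical heuristic: labels containing " and " (e.g. "Maths and Politics")
    are treated as joint subjects and deferred, since how to model them is unresolved.
    """
    ready, deferred = [], []
    for person, university, subject_item, subject_label in rows:
        if " and " in subject_label.lower():
            deferred.append((person, university, subject_item, subject_label))
            continue
        ready.append(
            Statement(person, EDUCATED_AT, university, {ACADEMIC_MAJOR: [subject_item]})
        )
    return ready, deferred
```

Keeping the qualifier values as a list leaves room for a joint subject to carry two “academic major” values later, if that turns out to be the agreed model.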
Answering some unusual and/or intriguing questions. Inspired by a prior finding that there are more FTSE 100 CEOs named John than there are female ones, and that John is historically the most common name among UK parliamentarians, we set out to find exactly when female MPs first outnumbered MPs named John (hint: not until 1992).
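Once membership data is in place, the underlying calculation is simple: for each year, count the sitting MPs whose given name (P735) is John and those whose sex or gender (P21) is female, then report the first year the second count is larger. A minimal Python sketch, assuming the membership records have already been fetched into simple tuples; the `Member` type and function name are hypothetical, and the test data is made up.

```python
from typing import Iterable, NamedTuple, Optional


class Member(NamedTuple):
    given_name: str
    female: bool
    start_year: int
    end_year: int  # inclusive; use the current year for sitting MPs


def first_year_women_outnumber_johns(members: list, years: Iterable[int]) -> Optional[int]:
    """Return the first year in `years` with more female MPs than MPs named John."""
    for year in years:
        sitting = [m for m in members if m.start_year <= year <= m.end_year]
        johns = sum(1 for m in sitting if m.given_name == "John")
        women = sum(1 for m in sitting if m.female)
        if women > johns:
            return year
    return None  # the balance never tipped in the years checked
```

The answer is only as good as the P735 and P21 data behind it: an MP with no recorded given name or gender silently drops out of both counts.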
Going back further in history, we queried the first time each given name was recorded in Parliament; this was inspired by a recent news article about an MP who claimed to be the first “Darren” in the Commons.
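The “first Darren” question is a grouped minimum: for each given name, take the earliest year anyone with that name entered Parliament. In SPARQL this would be a `GROUP BY` on the given name with `MIN` over the start dates; the same logic in plain Python, over hypothetical (given name, start year) pairs, looks like this.

```python
def first_appearance_by_name(records):
    """Map each given name to the earliest year it appears in `records`.

    `records` is an iterable of (given_name, start_year) pairs.
    """
    first = {}
    for name, year in records:
        # Keep the smallest year seen so far for this name.
        if name not in first or year < first[name]:
            first[name] = year
    return first
```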
Some ideas were also born that we weren’t able to see through, for various reasons. For example, could we discover which MPs, if any, are descended from people listed in UCL’s ‘Legacies of British Slave-ownership’ database? An interesting question, but at the moment the answer is ‘no’, for two reasons. First, child-parent relationships are modelled inconsistently in Wikidata. Second, an ancestry chain only works if every link is present: if someone in the family chain doesn’t exist in Wikidata (e.g. Grandad Bob, the painter), Wikidata can’t bridge the gap between a present-day MP and the slave owner who might be their ancestor.
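The gap problem is easy to see if ancestry is treated as a graph search. The sketch below walks up parent links breadth-first. It assumes the links have already been collected into a child-to-parents mapping (in Wikidata they might come from “father” (P22), “mother” (P25) or “child” (P40) statements, which is part of the inconsistency described above), and the function name and people are made up. Remove one person from the middle of the chain and the search can no longer connect the two ends.

```python
from collections import deque


def is_descendant(person: str, ancestor: str, parents: dict) -> bool:
    """Breadth-first search up the parent links recorded in `parents`.

    `parents` maps each person to a list of their recorded parents; anyone
    missing from the mapping is treated as having no known parents.
    """
    seen = {person}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent == ancestor:
                return True
            if parent not in seen:  # guard against loops in messy data
                seen.add(parent)
                queue.append(parent)
    return False
```

With a complete chain such as MP, Mum, Grandad Bob, slave owner, the search succeeds; if Grandad Bob has no item, Mum has no recorded parent and the search stops there, which is exactly the gap Wikidata cannot bridge.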
This is just the beginning
Work is, of course, still ongoing. All pre-1997 UK data still needs to be added to Wikidata or improved, and much more is missing: family connections, academic degrees, links to other databases, and all sorts of “unusual stuff” that can be used for interesting queries.
This data is crucial if Wikidata is one day to help us explore the really big questions about what politicians do.
We can do that together!
We hope that events like this give people an easy way in to Wikidata and also show them what’s already possible to achieve with the data. Over the coming months, we are hoping to support more events of this type around the world. If you are interested in getting involved, here’s how:
- Want to improve your country’s data? Events like this can be a great way to kickstart activities and find other people who share your goals. We are happy to help out and support people in other countries who want to organise one.
- Are you already organising or planning to organise a similar workshop around Wikidata? Make sure it is listed on the Wikidata Event page!
- Do you want to attend future workshops? Follow us on Twitter to stay updated about events that we are running, and ones that other people are too!
Feature image credits: Mark Longair