Earlier this year, we were fortunate enough to be contacted by Brian Keegan, Assistant Professor in Information Science at the University of Colorado Boulder, who specialises in the field of network analysis.
Brian and his team were planning to mine the official biographies of every legislator published by the Library of Congress – going back to the first Congress in 1789 – and add the information as structured data to Wikidata. Having heard of our involvement with WikiProject Every Politician, they wanted to understand more about contributing.
The research team, which included professors from the Libraries, Political Science and Information Science departments, planned to combine this biographical data with the voting and co-sponsorship data more commonly used in political science, so that interesting questions could be asked, such as “Do Ivy League graduates form cliques?” or “Are medical doctors more likely to break with their party on votes concerning public health?” Their hypothesis was that the biographical backgrounds of legislators could play an important role in legislative behaviours.
However, the first big step before questions could be asked (or SPARQL queries made) was supporting undergraduate students to enter biographical data for every member of Congress (going right back to the first) on Wikidata. Biographical data of this kind has not generally made it into the datasets that political scientists use to study legislative behaviour, and as students began to enter data about these historical figures, it quickly became apparent why: non-existent nations, renamed cities, and archaic professions all needed to be resolved and mapped to Wikidata’s contemporary names and standardised formats.
Nine months on, the team and ten undergraduates have revised over 1,500 Wikidata items about members of Congress, from the 104th to the 115th Congresses (1995–2018) and the 80th–81st Congresses (1947–1951), which is 15% of the way through all members dating back to the first Congress in 1789!
They started running SPARQL queries this summer.
Joe Zamadics, a political science PhD student who worked on the project, explained the potential of combining these data: “One example we tried was looking at House member ideology by occupation. The graph below shows the ideology of three occupations: athletes, farmers, and teachers (in all, roughly 130 members). The x-axis shows common ideology (liberal to conservative) and the y-axis shows members’ ideology on non-left/right issues such as civil rights and foreign policy. The graph shows that teachers split the ideological divide while farmers and athletes are more likely to be conservative.”
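For readers curious what the Wikidata side of such an analysis might look like, here is a minimal Python sketch. It is not the team's actual code. It assumes the Wikidata identifiers P39 ("position held"), P106 ("occupation") and Q13218630 ("United States representative"), which should be checked before use, and the function names are hypothetical. The ideology scores in Joe's example would have to be joined in from a separate voting dataset, which is not shown.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_occupation_query(position_qid="Q13218630", limit=500):
    """Build a SPARQL query listing office holders and their occupations.

    Assumed identifiers (verify on Wikidata before relying on them):
    P39 = "position held", P106 = "occupation",
    Q13218630 = "United States representative".
    """
    return f"""
SELECT ?member ?memberLabel ?occupationLabel WHERE {{
  ?member wdt:P39 wd:{position_qid} ;
          wdt:P106 ?occupation .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
"""


def run_query(query):
    """Send the query to the endpoint and return the JSON result rows."""
    url = WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "occupation-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def count_by_occupation(bindings):
    """Tally result rows by occupation label, skipping rows without one."""
    return Counter(
        row["occupationLabel"]["value"]
        for row in bindings
        if "occupationLabel" in row
    )
```

Calling `count_by_occupation(run_query(build_occupation_query()))` would give a rough tally of recorded occupations. A member with several occupations is counted once for each.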
The team are keen to highlight the potential that semantic web technology such as Wikidata offers to social scientists.
For the full Q&A with Brian and Joe, see the mySociety Medium post.