We’re very excited to announce that our Parliamentary website TheyWorkForYou.com now includes video of debates in the House of Commons – but we need your help to match up each speech with the video footage.
It’s really easy to help out. We’ve built a simple, rather addictive system that lets anyone with a few spare minutes match up a randomly selected speech from Hansard with the correct snippet of video. You just listen out for a certain speech, and when you hear it you hit the big red ‘now’ button. Your clip will then immediately go live on TheyWorkForYou next to the relevant speech, improving the site for everyone. Yay!
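For readers curious about the mechanics: one ‘now’ press per speech is, in principle, enough to cut a day’s footage into clips, if you assume each speech runs until the next timestamped speech begins. The Python below is a minimal sketch under that assumption. It is not TheyWorkForYou’s actual code, and the function names, speech IDs and offsets are made up for illustration. The same sorted-timestamp lookup could also tell you which speech is playing at a given moment, which is the kind of check needed to highlight the current speech in the transcript.

```python
from bisect import bisect_right

# Hypothetical sketch: `timestamps` maps a speech ID to its start offset
# (seconds into the day's footage), as recorded when a volunteer presses "now".


def clip_bounds(timestamps, speech_id, footage_end):
    """Return (start, end) in seconds for one speech's clip.

    Assumes a speech runs until the next timestamped speech begins,
    or until the end of the footage if it is the last one.
    """
    start = timestamps[speech_id]
    later = [t for t in timestamps.values() if t > start]
    end = min(later) if later else footage_end
    return start, end


def speech_at(timestamps, t):
    """Return the ID of the speech playing at offset t, or None before the first timestamp."""
    ordered = sorted(timestamps.items(), key=lambda kv: kv[1])
    starts = [start for _, start in ordered]
    # bisect_right finds the first start strictly after t; the speech
    # playing at t is the one immediately before that position.
    i = bisect_right(starts, t) - 1
    return ordered[i][0] if i >= 0 else None


if __name__ == "__main__":
    # Made-up example data, not real Hansard speech IDs.
    timestamps = {"speech-a": 0, "speech-b": 95, "speech-c": 240}
    print(clip_bounds(timestamps, "speech-b", footage_end=600))  # (95, 240)
    print(speech_at(timestamps, 100))  # speech-b
```

Keeping only one start time per speech, rather than a start and an end, means each volunteer needs to make a single press.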
You can start matching up speeches with video snippets right away, but if you take 30 seconds to register a username then we’ll log every speech that you match up and recognise your contribution on our “top timestampers” league table. We’ll send out mySociety hoodies to the top timestampers – they’re reserved exclusively for our volunteers as a badge of honour.
We think that this really easy approach to crowd-sourcing data about online video could come in useful in many different situations – not just for politics – and we hope that it gets used all over the place. It might even be a world first; we’re not sure. If you’d like us to create something similar for your local legislature, sports team, Am Dram group or anything else that can be audio or video recorded, then please get in touch. We’d also really appreciate your feedback on the current beta system – please send your comments to email@example.com.
Note to MPs, researchers, office staff, campaigners and bloggers – we know that you may want to concentrate on matching up the speeches of a particular MP, or of a particular debate. If this sounds like you, please send an email to firstname.lastname@example.org saying what you want, and we’ll help you do it.
This project was initially commissioned and funded by the BBC, who asked mySociety to create a searchable, online video archive of debates based on footage from BBC Parliament. We were thrilled to help out, because we think that it will enhance the public understanding of – and respect for – the work of Parliament. The initial goal of this project was to use the BBC’s captions to help chop up the video into different speeches. Tom Loosemore arranged for access to the BBC’s internal captions data, Etienne Pollard was commissioned to build an open source recording/transcoding/web-serving system (and then donated some of his wages back to pay for enough hard drive space for the video!), and Stef Magdalinski donated a network storage array to hold the disks. However, after lots of hard work trying to get our computers to slice up the video automatically according to the BBC’s captions, we concluded that this on its own wasn’t accurate enough to reliably match up every speech in Hansard with the appropriate snippet of our video footage.
Adversity, however, is a great source of innovation. Matthew Somerville, working on a spec first sketched out by Tom Steinberg, customised the Flash interface substantially so that users could watch video and help add correct timestamps. Now that’s built, what remains is for you to do your part! What’s more, once we get a significant number of speeches timestamped we can start providing web feeds and APIs so that MPs can embed video footage directly on their own websites, and we can show video of your MP’s most recent speeches on their MP page on TheyWorkForYou.
There are some conflicting views about whether all this online video of Parliament is a good idea – for instance, this video snippet (created using the new system) shows that the Deputy Leader isn’t so keen on the idea of Parliamentary footage appearing on sites like YouTube. Or perhaps she’s just been misunderstood – now you can judge for yourself what she was saying, based on her appearance and intonation. On the other hand, the BBC seem to understand the benefit of putting video content online (and they’re a fully paid up member of ParBol, the Parliamentary Broadcasting group), and Parliament themselves have an alternative set of online video streams. Unfortunately the official Parliamentary video service can’t be integrated with Hansard, is only available in Windows Media format, and only has enough storage to keep the most recent 28 days of footage in its archive. (We originally said that it doesn’t even attempt to break up the video into individual speeches, but apparently you can search for speeches after all, although this capability isn’t actively advertised.) It perhaps goes without saying that mySociety considers it an important public service for citizens to be able to find footage of their MPs doing their work, and we will resist attempts to deny this service to citizens.
One final thing – we’re currently trying to persuade the clerks in Parliament to tweak their internal processes a bit, and make it easier for people to see how laws are made. It’s called the Free Our Bills campaign, and we need as many people as possible to join the campaign, so that we can bring law-making into the 21st century. Please sign up now!
There are already over 1000 timestamps, and we’ve not even gone for any media coverage yet. Well done all!
Update 11.00AM on Thursday 5 June 2008
6769 speeches have now been timestamped, which is just over 20% of the current total of 33838 speeches. Thanks for all your efforts, and keep up the good work!
Could we submit the audio of the results, along with the Hansard data to VoxForge or similar, so that it can be used for training open source speech recognition algorithms?
Robert: that would be an imaginative and unexpected use of the data created by this crowd-sourcing project. Unfortunately, it wouldn’t work at all 🙁 because Hansard isn’t a verbatim record of what is said in debates, but a cleaned up version, with hesitation and repetition removed, some re-phrasing, etc.
As a written record of debates, it’s much more readable and more valuable like that, but the result is unfortunately completely useless for training a speech recognition system.
The art of cleaning up the speech of some of our less eloquent parliamentarians could be useful training for a more advanced speech recognition system. IMHO, most speech recognition systems fail because they don’t deal well with the fact that we speak quite differently to how we type. Dictating to a human doesn’t present this problem. The solution could be to use Hansard as a model to teach a far more advanced dictation engine… perhaps we might one day have a speech recognition system that John Prescott could use!
A rhyme from an anonymous somewhere in Whitehall:
And so while the great ones depart to their dinner,
The secretary stays, growing thinner and thinner
Racking his brain to record and report
What he thinks that they think that they ought to have thought.
Just a quick question: will you add this to the getDebates/getHansard API? It would be fantastic to drop a video in after an excerpt if one is available.
An excellent idea, but without subtitles it is useless for deaf people, who, as usual, are excluded yet again.
You’re right that we don’t have subtitles on the video. This would be a great feature to have, but right now we don’t have the resources to implement such a feature.
It’s possible that subtitles are available on the live TV version of the BBC Parliament output, although to be honest I don’t know whether they exist. As far as I know they don’t exist on the web streaming version of BBC Parliament or on the official Parliamentary video archive.
However, we do have the next best thing. If you’re watching the video alongside the Hansard transcript on TheyWorkForYou.com, then when a given person’s speech starts, the transcript of that speech is highlighted with a yellow background. It’s not ideal, but we think that it’s a great improvement on what was previously available, especially for deaf people using the site.