mySociety blog » Developers

Job Advert: Developers – Deadline 21st Feb

Wednesday, December 8th, 2010 by Tom Steinberg

How would you like to be a coder in an organisation that is as determined to make a difference in the world as it is to be a truly high quality, engineer-led software team?

mySociety is that organisation. We’re a project of a registered charity, currently running award-winning civic and democratic websites like TheyWorkForYou.com and FixMyStreet.com, and we’re looking to grow our already-celebrated development team by several new members over the next six months.

We’re looking for people with at least two years’ experience (professional or keen amateur) in at least one of Python, Ruby, Perl, PHP, C++, JavaScript or Adobe Flex, and who have ambitions to learn more languages in the future.

We’re looking for developers willing to commit to full or mostly-full time positions (no freelancers, sorry) and who are up for a career change that will see them stay with us for a little while. You’ll get to work with volunteers, mix commercial and charitable projects, and travel far and wide. Plus, you can work from wherever you live (in the UK), and we pay salaries from £28k to £50k depending on skills.

Most of all, we’re looking for coders who look at the services we have built so far and think “I wish I’d been on that project”. Projects you’ll likely be working on over the next few months include (but are not limited to):

  • A/B testing and conversion tracking of our charitable sites
  • Commercial spinoffs from FixMyStreet
  • Mapumental
  • Enhancements to TheyWorkForYou and WhatDoTheyKnow
  • Commercial development for clients

We’re looking to speak with possible candidates continually over the next few months, with a view to hiring two developers now and two more later in the year. Please send us your CV if you’re interested – the address is hello@mysociety.org and the subject line needs to be msjob6. The first round closing date is 10am on Monday 21st February, but CVs received after this deadline will still be considered for the next round of hiring.

And if you’ve any questions, please post them in the comments below so we can share the answers.

New features on MaPit

Wednesday, October 6th, 2010 by Matthew Somerville

We’ve added a variety of new features to our postcode and point administrative area database, MaPit, in the past month – new data (Super Output Areas and Crown dependency postcodes), new functionality (more geographic functions, council shortcuts, and JSONP callback), and most interestingly for most people, a way of browsing all the data on the site.

  • Firstly, we have some new geographic functions to join touches – overlaps, covered, covers, and coverlaps. These do as you would expect, enabling you to see the areas that overlap, cover, or are covered by a particular area, optionally restricted to particular types of area. ‘coverlaps’ returns the areas either overlapped or covered by a chosen area – this might be useful for questions such as “Tell me all the Parliamentary constituencies fully or partly within the boundary of Manchester City Council” (three of those are entirely covered by the council, and two overlap another council, Salford or Trafford).
  • As you can see from that link, nearly everything on MaPit now has an HTML representation – just stick “.html” on the end of a JSON URI to see it. This makes it very easy to explore the data contained within MaPit, linking areas together and letting you view any area on Google Maps (e.g. Rutland Council on a map). It also means every postcode has a page.
  • From a discussion on our mailing list started by Paul Waring, we discovered that the NSPD – already used by us for Northern Ireland postcodes – also contains Crown dependency postcodes (the Channel Islands and the Isle of Man) – no location information is included, but it does mean that given something that looks like a Crown dependency postcode, we can now at least tell you if it’s a valid postcode or not for those areas.
  • Next, we now have all Lower and Middle Super Output Areas in the system; thanks go to our volunteer Anna for getting the CD and writing the import script. These are provided by ONS for small area statistics after the 2001 census, and it’s great that you can now trivially look up the SOA for a postcode, or see what SOAs are within a particular ward. Two areas are in MaPit for each LSOA and MSOA – one has a less accurate boundary than the other for quicker plotting, and we thought we might as well just load it all in. The licences on the CD (Conditions of supply of SOA boundaries and Ordnance Survey Output Area Licence) talk about a click-use licence, and a not very straightforward OS licence covering only those SOAs that might share part of a boundary with Boundary-Line (whichever ones those are), but ONS now use the Open Government Licence, Boundary-Line is included in OS OpenData, various councils have published their SOAs as open data (e.g. Warwickshire), and these areas should be publicly available under the same licences.
  • As the UK has a variety of different types of council, depending on where exactly you are, the postcode lookup now includes a shortcuts dictionary in its result, with two keys, “council” and “ward”. In one-tier areas, the values will simply be the IDs of that postcode’s council and ward (whether it’s a Metropolitan district, Unitary authority, London borough, or whatever); in two-tier areas, the values will again be dictionaries with keys “district” and “county”, pointing at the respective IDs. This should hopefully make lookups of councils easier.
  • Lastly, to enable use directly on other sites with JavaScript, MaPit now sends out an “Access-Control-Allow-Origin: *” header, and allows you to specify a JSON callback with a callback parameter (e.g. put “?callback=foo” at the end of your query to have the JSON results wrapped in a call to the foo() function). JSONP calls will always return a 200 response, to enable the JavaScript to access the contents – look for the “error” key to see if something went wrong.
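As a rough illustration of the shortcuts dictionary and the “error” key described above, here is a minimal Python sketch. It works on already-parsed JSON rather than calling MaPit, and the sample results (IDs, tier names, error text) are hand-written for illustration only; the names “shortcuts”, “council”, “ward” and “error” are the only ones taken from the description.

```python
def council_ids(result):
    """Return the list of council IDs from a parsed postcode lookup result."""
    # JSONP calls always come back as HTTP 200, so failures have to be
    # detected by looking for the "error" key in the body.
    if "error" in result:
        raise LookupError(result["error"])
    council = result["shortcuts"]["council"]
    if isinstance(council, dict):
        # Two-tier area: one ID per tier of local government.
        return sorted(council.values())
    # One-tier area: the value is simply the council's ID.
    return [council]


# Hypothetical, hand-written results (not real MaPit output).
one_tier = {"shortcuts": {"council": 101, "ward": 202}}
two_tier = {"shortcuts": {"council": {"county": 301, "district": 302},
                          "ward": {"county": 401, "district": 402}}}
failed = {"error": "Postcode not found"}

print(council_ids(one_tier))
print(council_ids(two_tier))
```

Because the function only looks at the values of the two-tier dictionary, it does not depend on what the tier keys are called.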

Phew! I hope you find this a useful resource for getting at administrative geographic data; please do let us know of any uses you make of the site.

Outlook attachments now viewable in WhatDoTheyKnow

Monday, March 15th, 2010 by Francis Irving

When a bit of government forwards or attaches emails using Outlook, they get sent using a special, strange Microsoft email format. Up until now, WhatDoTheyKnow couldn’t decode it. You’d just see a weird attachment on the response to your Freedom of Information request, and probably not be able to do anything with it.

Peter Collingbourne got fed up with this, and luckily for us, he can code too. He forked our source code repository, and made a nice patch in his own copy of it.

He then told us about it, and I merged his changes into the main WhatDoTheyKnow code, tested them out on my laptop, then made them live. It all worked perfectly first time. Peter even added the new dependency on vpim to WhatDoTheyKnow conf/packages.

Now if you go to an Outlook attachment on WhatDoTheyKnow, such as this one, you’ll just see the files, and be able to download them, and view them as HTML as normal. They’ll also get indexed by the search (although I need to do a rebuild for that for it to work with old requests).

Thanks Peter!

If you want to have a go making an improvement to a mySociety site, you can get the code for most of them from our github repositories. For some sites, there’s an INSTALL.txt file explaining how to get a development environment set up. Let us know if you do anything – even incremental improvements to installation instructions are really useful. And new, useful, features like Peter’s are even more so.

What are the two sorts of Cloud infrastructure called?

Tuesday, September 22nd, 2009 by Francis Irving

I’ve been doing lots of research around “cloud computing” recently, so we can change how Mapumental works and take it out of private beta.

One thing that’s struck me is that there doesn’t seem to be a proper, industry standard name to distinguish what to me are two fundamentally different sorts of “cloud computing”. I’m focusing here entirely on cloud services for programmers (let’s leave what it means to end users or businesses for another day).

Here are my own names and descriptions of them:

1) Cloud hardware server provision (Cloud HSP)
Low level APIs for making and destroying (virtual) servers, and loading machine images onto them. e.g. Amazon Elastic Compute Cloud, Rackspace Cloud Servers, Eucalyptus’s EC2 bits. Basically, what Eucalyptus v 1.5 can do and what libcloud should do. (By analogy, this is the assembly language of cloud computing)

2) Cloud developer service provision (Cloud DSP)
A service that a developer accesses with one name and a simple API, and behind the scenes it scales for him, automatically. e.g. Amazon Queue Service, Rackspace Cloud Files. (By analogy, this layer is the C programming language of cloud computing)

[as an aside, Google AppEngine is an interesting one. It is definitely in the Cloud DSP category, but I think it is larger than that - it is a whole set of APIs all in that category. Something like Google DataStore is a single Cloud DSP, albeit one apparently only accessible within AppEngine apps]

It’s possible to use a Cloud HSP (assembly language), along with a bunch of your own software or open source software, to build new Cloud DSPs (C code). Right now this is pretty hard – even quite well known open source distributed databases like CouchDB still need scripting to even make them replicate. The code that makes and destroys servers and gives the service one name needs manually stringing together with quite new bits of wire (things like scalr and Wackamole).

For this reason, I’m reluctant for mySociety to get into the “making our own Cloud DSP out of Cloud HSP” game. It feels to me like a suck of time, and like we wouldn’t be able to guarantee without lots of careful and expensive testing that it would scale. I’m more tempted to use the commercial Cloud DSP services where possible, even though they are proprietary. But use them via our own abstraction layer, so we can change as we need to. Of course, we have some C++ code (the public transport route finder), so will have to use the Cloud HSP API to get that going, perhaps with Amazon’s Auto Scaling. But it can jolly well use AQS and S3 to talk to other services.
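To make the “own abstraction layer” idea a little more concrete, here is a minimal Python sketch of what such a layer over a queue service could look like. Everything in it is hypothetical: the class names, the method names and the message format are invented for illustration, and the only backend shown is an in-memory one rather than any real commercial service.

```python
from abc import ABC, abstractmethod
from collections import deque


class MessageQueue(ABC):
    """The only queue interface the application code is allowed to use."""

    @abstractmethod
    def send(self, message):
        """Add a message to the queue."""

    @abstractmethod
    def receive(self):
        """Return the oldest message, or None if the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Stand-in backend for tests; a hosted queue would get its own subclass."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        self._items.append(message)

    def receive(self):
        return self._items.popleft() if self._items else None


def enqueue_route_job(queue, postcode):
    # Application code depends on MessageQueue only, so the provider
    # behind it can be swapped without touching this function.
    queue.send("route:" + postcode)


queue = InMemoryQueue()
enqueue_route_job(queue, "CB1 1AA")
print(queue.receive())
```

The point of the design is that switching provider means writing one new subclass, not editing every caller.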

So, what do you think about the names Cloud HSP/DSP? Are there already existing names for the distinction that I’m making? Is it a useful distinction for you? Can you think of better names?

WhatDoTheyKnow growing pains (and Ruby memory leaks)

Thursday, September 17th, 2009 by Francis Irving

WhatDoTheyKnow keeps growing and growing, sucking people in from Google as its archive of maybe 8.5% of Freedom of Information requests gets more and more detailed.

(Graph of number of FOI requests made using WhatDoTheyKnow over time)

There’s round about 8Gb of unfettered Government data in the core database, plus a whole bunch more for indexing and caching. For comparison, TheyWorkForYou (which now goes back to 1935) has 12Gb. And it’s catching up on traffic also – WhatDoTheyKnow has about half the number of visitors as TheyWorkForYou.

Unfortunately, this new found traffic has led to performance problems. You might have seen errors when using WhatDoTheyKnow in the last week or two. This post is firstly an apology for that. Thank you for your patience. Hopefully it is fixed now – do let us know if you get problems still. And secondly it is some techy stuff about debugging such problems in Ruby on Rails…

When WhatDoTheyKnow started failing, we did the obvious things to start with – moving the database to a separate server, and moving some other services off the same server, to give WDTK more room to breathe. It still kept breaking.

None of my server monitoring tools shed any very clear light as to the problem. I upgraded to the latest version of Passenger, the best Rails deployment tool I’ve seen yet. It’s pretty good, but still not mature enough for my liking. I was still getting the same problems with it, but reporting tools like passenger-memory-stats were really helpful.

Eventually I worked out that it was to do with memory use of the Rails processes. Individual ones would leap up to 1Gb, and never drop back down. If several did, the server (with 4Gb of RAM) would start swapping and grind to a halt. The world of Ruby and Rails memory monitoring software is patchwork at best, and in the end I found the simplest tools the most useful. Here’s some:

  • I found some Rails processes were getting jammed, and not dying even when I restarted Apache. I think in the end this was due to the Passenger spawning method, and our use of the Xapian Ruby module. Running Passenger in RailsSpawnMethod conservative mode made things much more robust.
  • Monit, which in a previous life had a job holding up vital structural pillars of buildings with duct tape, makes you feel dirty. Actually it is really useful. Given I couldn’t quickly fix the problem, Monit let me at least reduce the suffering for people trying to use the site meanwhile. Here’s the rule I used, which gives Apache a kick every time server memory use is too high. It was firing every 5 or 10 minutes…
    check system localhost
        if memory > 3500 MB then exec "/usr/sbin/apache2ctl graceful"
  • I found memory_profiler on a blog. It helps you find the kind of memory leak where you unintentionally continue to reference an object you don’t use any more. With a specialist subject of string objects. This led to a fix to do with declaring static arrays in classes vs. modules, which I still don’t really understand. But it wasn’t the cause of the big 1Gb memory munching, there were no large enough leaks of this sort.
  • The record_memory function in WDTK’s application controller came from another blog. It’s handy as it shows you how much each request increases the memory use of the Ruby process. With caveats, this was the best way for me to identify the most damaging requests (search results, and certain public body pages). It also focused attention on the actual problem – the peak memory use during a request. That’s really important, because Ruby’s memory manager never returns memory to the operating system… The Gb leaps in memory use were because of temporary memory used during certain requests, which the Ruby memory manager then never frees later.
  • I made a bunch of functions culminating in allocated_string_size_around_gc. This was really useful in use with the “just add lots of print statements and fiddle” school of debugging. Not everyone’s favourite school, but if your test code can’t catch it, one I often end up using (it gets really involved rarely enough that it doesn’t seem worth setting up an interactive debugger). It led me to various peak memory savings, such as calling “text.gsub!” rather than “text = text.gsub” while removing email addresses and private information from FOI request responses, which helps quite a bit when dealing with multi-megabyte attachments.
  • Finally, I used the overlooked debugging tool, and the one you should never rely on, being common sense. That is, common sense informed by days of careful use of all the other tools. In order to quickly show text extracts when searching, WDTK stores the extracted attachment text in the database. A few of these attachments are quite large, and led to 50Mb fields, often several of which were being loaded and processed in one page request. That this would cause a high peak of memory use only became obvious to me some time yesterday. I checked that that was the case, and this morning, I changed it to use the full text for indexing, but to at most keep 1Mb for use in snippets. So sometimes now you won’t get a good search extract for queries, but it is rare, and it will at least still return the right result.
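The distinction between memory that leaks and memory that merely peaks during a request can be shown with a small sketch. This one is in Python rather than Ruby, uses the standard tracemalloc module instead of the tools named above, and invents its own stand-in for request processing, so treat it as an illustration of the measurement idea and of the 1Mb cap, not as WDTK’s code.

```python
import tracemalloc

SNIPPET_LIMIT = 1_000_000  # characters; stands in for the "at most 1Mb" cap


def peak_bytes(func, *args):
    """Run func(*args) and return (result, peak bytes allocated meanwhile)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def render_extracts(texts):
    """Hypothetical per-request work that copies every text it is given."""
    return [text.lower() for text in texts]


# Fake extracted attachment text: three large fields loaded in one request.
attachments = ["A" * 5_000_000 for _ in range(3)]
trimmed = [text[:SNIPPET_LIMIT] for text in attachments]

_, peak_full = peak_bytes(render_extracts, attachments)
_, peak_trimmed = peak_bytes(render_extracts, trimmed)
print(peak_full, peak_trimmed)
```

Nothing here leaks, yet the untrimmed run still needs several times more memory at its peak, which is the number that matters when the runtime never hands memory back.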

I’ve more work to do, I think there are quite a few other quick wins, all of which are making the site faster too. I’m quite happy that WhatDoTheyKnow also has a bunch more test code as a result of all this.

On the other hand, what a disappointing disaster for open source languages beginning with P/R (as opposed to J). Yes, the help and tools were just about there to work it out, but would seem primitive if you’d used say Java’s Memory Analyzer. Indeed somebody over on StackOverflow suggested running your site in JRuby and using exactly that tool…

How Mapumental works

Tuesday, August 18th, 2009 by Francis Irving

Here is a diagram of how the backend of Mapumental works. Take it in the spirit that Chris Lightfoot set when he made a similar diagram for the No. 10 petitions site – although many such diagrams are useless, hopefully this one contains useful information.

If you haven’t seen Mapumental yet, first take a look at the video, and sign up for the private beta.

(Diagram of the early Mapumental backend architecture)

Below, I’ve explained what the main components are, and some interesting things about them.

Everything can, at least in theory, run on lots of servers. Currently we are only actually using one server for web requests, because of problems with HAProxy. We’re running isodaemons on two different servers.

Basic web application – it started out as raw Python, but the more Matthew hacks on it the more Django libraries he pulls in. Soon it’ll be indistinguishable from a Django app. When someone enters a new postcode, it adds it to the work queue in the PostgreSQL database, then refreshes waiting for the job to be finished. Then it displays the flash application (made by Stamen), set up to load the appropriate tile layers.

Tile server and cache – This uses the Python-based TileCache, calling Geospatial Data Abstraction Library (GDAL) to help render the tiles from points. It was originally written by Stamen, and expanded by mySociety. GDAL isn’t perfect, it doesn’t have fancy enough algorithms for my liking. e.g. Using a median rather than a weighted mean.

Isodaemons – These are controlled by a Python script, but the bulk of the code is custom written in C++. Slightly crazily, this can find the quickest route by public transport for each of 300,000 journeys from every station in the UK to a particular station, arriving at a particular time, in 10 to 30 seconds.

I had no idea how to do this, but luckily I live in Cambridge, UK. It’s a city fit to bursting with computer scientists. Many of the jobs are dull, and need little computing, never mind science – like writing interface layers for SQL server. So if you have a real interesting problem it’s easy to get help!

The universal advice was to use Dijkstra’s algorithm, which needed a bit of adaptation to work efficiently over space-time, rather than just space. Normally it is used for planning routes round a map, but public transport isn’t like that, you have to arrive in time for each particular train, so time affects what journeys you can take.
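One way to picture Dijkstra over space-time is the sketch below. It is my own simplified formulation in Python, not the C++ isodaemon: it searches backwards from the destination, labels each station with the latest time you could leave it and still arrive on time, and ignores interchange times and walking. The station names and timetable are made up, and times are minutes after midnight.

```python
import heapq
from collections import defaultdict


def latest_departures(connections, target, arrive_by):
    """Latest departure time from each station that still reaches `target`
    by `arrive_by`. Each connection is (origin, destination, depart, arrive).
    """
    incoming = defaultdict(list)
    for origin, dest, depart, arrive in connections:
        incoming[dest].append((origin, depart, arrive))

    latest = {target: arrive_by}
    heap = [(-arrive_by, target)]  # max-heap on time via negation
    while heap:
        neg_time, station = heapq.heappop(heap)
        time_here = -neg_time
        if time_here < latest[station]:
            continue  # stale entry; a later departure was already found
        for origin, depart, arrive in incoming[station]:
            # A train is only usable if it arrives in time to continue.
            if arrive <= time_here and depart > latest.get(origin, -1):
                latest[origin] = depart
                heapq.heappush(heap, (-depart, origin))
    return latest


# Hypothetical timetable.
timetable = [
    ("A", "B", 480, 510),
    ("A", "B", 500, 530),
    ("B", "C", 520, 550),
    ("B", "C", 540, 570),
]
result = latest_departures(timetable, "C", 560)
print({station: 560 - t for station, t in result.items()})  # travel times
```

The only change from the textbook algorithm is the `arrive <= time_here` test: an edge exists only if its train gets you to the next station in time for the onward journey.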

I originally wrote it in Python, which was not only too slow, but used up far far too much RAM. It could never have loaded the whole dataset in. However, the old Python code is still run by the test script, to double check the C++ code against. It is also still used to make the binary timetable files, see below.

Travel times, 1 binary file / postcode – I briefly attempted to insert 300,000 rows into PostgreSQL for each postcode looked up, but it was obvious it wasn’t going to scale. Going back to basics, it now just saves the time taken to travel to each station in a simple binary file – two bytes for each station, 600k in total. The tile server then does random access lookups into that file, as it renders each tile. It only needs to look up the values for the stations it knows are on/near the tile.
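The file layout described above is simple enough to sketch in a few lines of Python. The two-bytes-per-station record and the random access lookup come from the description; the byte order, the unit (minutes), the “unreachable” sentinel and the function names are assumptions made for the example.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<H")  # two bytes per station; byte order is assumed
UNREACHABLE = 0xFFFF          # hypothetical sentinel value


def write_travel_times(path, minutes_by_station):
    """Write one fixed-width record per station, in station-index order."""
    with open(path, "wb") as f:
        for minutes in minutes_by_station:
            f.write(RECORD.pack(minutes))


def read_travel_time(path, station_index):
    """Random access: seek straight to the station's record and read it."""
    with open(path, "rb") as f:
        f.seek(station_index * RECORD.size)
        (minutes,) = RECORD.unpack(f.read(RECORD.size))
    return minutes


times = [0, 12, 45, UNREACHABLE, 90]
path = os.path.join(tempfile.mkdtemp(), "travel_times.bin")
write_travel_times(path, times)
print(read_travel_time(path, 2))   # 45
print(os.path.getsize(path))       # 10 bytes for 5 stations
```

With fixed-width records the offset is just index times two, so a tile renderer can fetch the handful of stations it needs without reading the rest of the file; 300,000 stations at two bytes each gives the 600k quoted above.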

There’s various other bits:

Thanks to everyone who helped make Mapumental, we couldn’t have done it without lots of clever people.

I realise the above is a sketchy overview, so please ask questions in the comments, and I’ll do my best to answer them.

RIP Angie Martin 1974-2009

Monday, July 20th, 2009 by Tom Steinberg

It is with overwhelming sadness that I write to tell our community that Angie Martin, mySociety’s fourth core developer, has died. She was taken from us by the cancer that she had been fighting since soon after we hired her less than two years ago.

Possessed of an almost unbelievably upbeat personality, Angie brought not only her formidable Perl skills, but her blazing warmth of character to our team. In remission during our yearly retreat in January this year, she combined laughter with a typically tough line of questioning on ideas she thought insufficiently robust. With typical disregard for cool, her CV noted that she was “known to enjoy wrangling regular expressions on a Sunday Morning”. She didn’t see any contradiction between being a successful woman and a geek, throwing herself wholeheartedly into the Mac-toting, perlmonger ethos. She even brought her husband Tommy with her, who became a significant volunteer.

Given her habit of plain speaking, it is pointless to pretend that Angie was able to make the contribution to mySociety’s users or codebase that she wanted to. What she achieved in terms of difficult coding during recovery from chemotherapy was incredible, breathtaking – but she wanted to change the world. It now falls to the rest of us, and our supporters, to live up to the expectations she embodied, to continue to push every day, using skills like those that she had to help people with everyday problems. We now have to ask ‘What would Angie do?’, as well as ‘What would Chris do?’. It is a lot to live up to.

She was a mySociety core developer: I hope that meant as much to her as it meant for me to have her as one of my coders.  Remember and Respect.

Updated: Angie changed her surname upon getting married, a couple of months ago. I have just read she wanted to be remembered as Angie Martin, and so I have made that change. Read this tribute on the Lasso list.

Updated 21 7 2009: Tommy has just told me that those wishing to make memorial donations should send them to Hospice at Home.

“Politicians are using the internet to harness your bright ideas”

Sunday, September 7th, 2008 by Francis Irving

“People often say they could run Britain better than the political parties. A web-based revolution may give them the chance”

Nice article in the Sunday Times today mentioning lots of our sites and others.

Annotations just in today…

Monday, September 1st, 2008 by Francis Irving

It’s the first full working day for the new facility to annotate Freedom of Information (FOI) requests on WhatDoTheyKnow, and people have been hard at it.

Mr Ormerod points out that private information isn’t necessarily so private if someone has died, so perhaps the exemption the MOD used shouldn’t apply.

Trevor R Nunn has posted three annotations (e.g. this one) to show that his three FOI requests are being treated as one. The annotations facility is great for handling edge cases like this, which don’t happen often enough to be worth explicitly adding to the code, but need some mention.

And finally Edward Betts has processed the list of post boxes retrieved by FOI into a more structured data format, and posted up a link to it. Exactly the kind of collaboration I love to see!

And that’s just this morning!

acts_as_xapian

Thursday, July 17th, 2008 by Francis Irving

One of the special pieces of magic in TheyWorkForYou is its email alerts, sending you mail whenever an MP says a word you care about in Parliament. Lots of sites these days have RSS, and lots have search, but surprisingly few offer search based email alerts. My Mum trades shares on the Internet, setting it to automatically buy and sell at threshold values. But she doesn’t have an RSS reader. So, it’s important to have email alerts.

So naturally, when we made WhatDoTheyKnow, search and search based email alerts were pretty high up the list, to help people find new, interesting Freedom of Information requests. To implement this, I started out using acts_as_solr, which is a Ruby on Rails plugin for Solr, which is a REST based layer on top of the search engine Lucene.

I found acts_as_solr all just that bit too complicated. Particularly, when a feature (such as spelling correction) was missing, there were too many layers and too much XML for me to work out how to fix it. And I had lots of nasty code to make indexing offline – something I needed, as I want to safely store emails when they arrive, but then do the risky indexing of PDFs and Word documents later.

The last straw was when I found that acts_as_solr didn’t have collapsing (analogous to GROUP BY in SQL). So I decided to bite the bullet and implement my own acts_as_xapian. Luckily there were already Xapian Ruby bindings, and also the fabulous Xapian email list to help me out, and it only took a day or two to write it and deploy it on the live site.

If you’re using Rails and need full text search, I recommend you have a look at acts_as_xapian. It’s easy to use, and has a diverse set of features. You can watch a video of me talking about WhatDoTheyKnow and acts_as_xapian at the London Ruby User Group, last Monday.

Internal links, and search engine crawlers

Thursday, July 17th, 2008 by Matthew Somerville

TheyWorkForYou now finds whenever an old version of Hansard is referenced (such references are by date and column number, e.g. Official Report, 29 February 2008, column 1425) and turns the citation into a link to a search for the speeches in that column on that date. This only really became feasible when we moved server, upgraded Xapian, and added date and column number metadata (among others), allowing much more advanced and focussed searching – the advanced search form gives some ideas. Perhaps in future we’ll be able to add some crowd-sourcing game to match the reference to the exact speech, much like our video matching (nearly 80% of our archive done!). :)
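Spotting such citations is a pattern-matching job, and a minimal Python sketch of that step might look like the following. The regular expression is my own guess based only on the single example format quoted above, so it will miss variants (column suffixes, ranges, abbreviated months), and it stops at extracting the date and column rather than building any search link.

```python
import re
from datetime import datetime

# Matches e.g. "Official Report, 29 February 2008, column 1425".
CITATION = re.compile(
    r"Official Report,\s+(\d{1,2} [A-Z][a-z]+ \d{4}),\s+column\s+(\d+)"
)


def find_citations(text):
    """Return a list of (date, column) pairs for each citation in `text`."""
    found = []
    for match in CITATION.finditer(text):
        try:
            day = datetime.strptime(match.group(1), "%d %B %Y").date()
        except ValueError:
            continue  # looked like a date but was not a real one
        found.append((day, int(match.group(2))))
    return found


speech = ("As I told the House previously "
          "(Official Report, 29 February 2008, column 1425), ...")
print(find_citations(speech))
```

The date and column pair is exactly the metadata the search needs, which is why the feature had to wait until both were indexed.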

Kudos to Google and Yahoo! for spotting this change within a couple of days, as they’re now so busy crawling everything for changes that they’re slowing the whole website down… ;-)

Postcodes on TheyWorkForYou

Tuesday, July 8th, 2008 by Matthew Somerville

If you enter your postcode on TheyWorkForYou and it’s Scottish or Northern Irish, you’re now presented with your MSPs and MLAs as well as your MP, which makes sense given the site covers their Parliament and Assembly respectively. :-) You also get an extra tab in the navigation linking through to Your MSPs or MLAs. In order to do this, I needed a quick way of determining if a postcode was Northern Irish or Scottish. Northern Ireland was easy, as all postcodes there begin with BT. I assumed Scotland was also easy, which turned out to be true apart from the TD postcode area that straddled the border like a mail-sorting Niagara Falls. After some very dull investigation, I eventually worked out that e.g. most of TD15 is in England, but (amongst others) TD15 1X* is in Scotland, except for TD15 1XX which is apparently back in England. The final result was the postcode_is_scottish() function in postcode.inc, which (hopefully) correctly determines if a given postcode is Scottish or not – perhaps someone else will find it useful.

Highlighting the current speech

Friday, June 13th, 2008 by Matthew Somerville

Debate pages that have at least one timestamped speech (such as the previously mentioned last week’s Prime Minister’s Questions) have a video fixed to the bottom right hand corner (if your browser is recent enough) showing that debate. While playing the video, the currently playing speech is highlighted with a yellow background, and you can start watching from any timestamped speech by clicking the “Watch this” link by any such speech. So how does all that work?

I’m very proud of this feature, I wasn’t sure it would be possible, and it’s very exciting. :-)

Flash has an ExternalInterface API, where JavaScript can call functions in the Flash, and vice-versa. When the video player loads, it requests an XML list from the server of all speech GIDs and timestamps for the current debate (here’s the file for the above debate). So when someone clicks a “Watch this”, it calls a moveVideo function in main.mxml with the GID of the speech, which loops through all the speeches and moves to the correct point if possible.

The highlighting works the other way – as the video is playing, it checks to see which speech we’re currently in, and if there’s been a change, it calls the updateSpeech function in TheyWorkForYou’s JavaScript, which finds the right row in the HTML and changes the class in order to highlight it. Quite straightforward, really, but it does make following the debate very simple and highlights the linking between the video and the text, all done by our excellent volunteers (join in! :) ).
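The “which speech are we in, and has it changed?” check can be sketched independently of Flash. The Python below is an illustration of that logic only: the function and class names are invented, the GIDs are placeholders, and the real player does this in ActionScript before calling updateSpeech.

```python
from bisect import bisect_right


def current_speech(speeches, position):
    """Return the GID of the speech playing at `position` seconds.

    `speeches` is a list of (start_seconds, gid) sorted by start time.
    """
    starts = [start for start, _gid in speeches]
    index = bisect_right(starts, position) - 1
    return speeches[index][1] if index >= 0 else None


class Highlighter:
    """Remembers the last speech so a callback fires only on a change."""

    def __init__(self, speeches):
        self.speeches = speeches
        self.last_gid = None

    def tick(self, position):
        gid = current_speech(self.speeches, position)
        if gid is not None and gid != self.last_gid:
            self.last_gid = gid
            return gid  # caller would highlight this row
        return None     # no change since the previous tick


# Placeholder GIDs and timestamps.
debate = [(0, "speech-a"), (42, "speech-b"), (97, "speech-c")]
highlighter = Highlighter(debate)
print([highlighter.tick(t) for t in (1, 20, 45, 50, 100)])
```

Returning None between changes keeps the chatter across the Flash and JavaScript boundary down to one call per speech.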

Talking of our busy timestampers, I’ve also been busy making improvements (and fixing bugs) to the timestamping interface to make things easier for them. As well as warnings when it looks like two people are timestamping the same debate at the same time, various invisible things have been changed, such as using other people’s timestamps to make the start point for future timestamps on the same day more accurate. I also added a totaliser, using the Google Chart API, for which you simply have to provide image size and percentage complete.

Approaching 45% of our entire archive of video timestamped, with the totaliser approaching the chartreuse :-)

Previous articles

  1. The Flash player
  2. Seeking
  3. Highlighting the current speech

TheyWorkForYou video – seeking

Friday, June 13th, 2008 by Matthew Somerville

Our video is streamed via progressive HTTP, using lighttpd and mod_flv_streaming. This works by having keyframe metadata at the start of the FLV (Flash video) file (we add ours using yamdi as that doesn’t load the whole file into memory first), which maps times within the video to byte positions within the file. When someone drags the position slider, or presses a skip button, the player actually changes the source of the video to something like file.flv?start=<byte position> which starts a new download from that point in the video. This means you can seek to parts of the video not yet downloaded, which is definitely a required feature.
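The seek step boils down to a lookup in the keyframe table followed by a new request. Here is a minimal Python sketch of that lookup; the keyframe values and the function name are made up, and the only detail taken from the description above is the “?start=<byte position>” form of the URL.

```python
from bisect import bisect_right


def seek_url(base_url, keyframes, seconds):
    """Build the URL that restarts the download at the nearest keyframe.

    `keyframes` is a list of (time_seconds, byte_position) sorted by time,
    as read from the metadata at the start of the file.
    """
    times = [time for time, _position in keyframes]
    # Last keyframe at or before the requested time (clamped to the first).
    index = max(bisect_right(times, seconds) - 1, 0)
    return f"{base_url}?start={keyframes[index][1]}"


# Hypothetical keyframe table.
keyframes = [(0.0, 13), (2.0, 48213), (4.0, 95120)]
print(seek_url("file.flv", keyframes, 3.1))
```

Seeking can only land on a keyframe, which is why the lookup rounds down to the previous one rather than interpolating a byte offset.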

The video is split up into programme chunks, according to BBC Parliament’s schedule, so each Oral Questions will (approximately) be its own video chunk, and the main debates will be a couple of chunks. By default, the video player will show a screengrab from the start of the video, as that’s all that’s available when it first loads (you have to load the start of the FLV file to fetch the keyframe metadata in order to move anywhere else :) ). I wanted the player to show a relevant screengrab before you hit Play, so came up with the slightly messy workaround of setting the volume to 0, seeking and playing the video for under a second in order to start it from the new point and show the video, then stopping it and resetting the volume. It works most of the time :-)

Some of our video chunks have jumps in them, due to problems in downloading the original WMV stream. The timestamping interface has a link for people to let us know of such problems, so that we can mark the relevant speeches as missing video and not have them be offered to future timestampers. One valiant volunteer, Tim, let us know about two such videos, but with the added oddity that if you let them play, they would happily carry on past their “end” point, but this made timestamping those speeches quite difficult.

I started investigating, firstly noting that both videos should have been 6 hours long, but were both listed as 1:20:24, which I thought was a bit of an odd coincidence. After reading the FLV file specification, it turned out that 32-bit millisecond timestamps in FLV are split into two – first the low 24 bits, then the high 8 bits. 2^24 = 16,777,216, which in milliseconds is 4 hours, 39 minutes, 37 seconds, which is pretty much exactly what the two videos’ durations were short by! All the timestamps in our FLV files were not setting the high byte, so after 4:39:37, they were wrapping round to 0 (and thus 6 hours became 1:20:24ish).
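The wrap-around is easy to reproduce. The Python sketch below is an illustration with invented helper names, not code from ffmpeg, mencoder or yamdi. It packs a millisecond timestamp in the order described above (low 24 bits first, then the high 8 bits) and shows what happens when a reader or writer ignores the high byte.

```python
def pack_timestamp(ms):
    """Pack a 32-bit millisecond timestamp: low 24 bits, then the high 8 bits."""
    if not 0 <= ms < (1 << 32):
        raise ValueError("timestamp must fit in 32 bits")
    low = ms & 0xFFFFFF
    high = (ms >> 24) & 0xFF
    return low.to_bytes(3, "big") + bytes([high])


def unpack_timestamp(field, ignore_high_byte=False):
    """Read the 4-byte field back; optionally drop the high byte to mimic the bug."""
    low = int.from_bytes(field[:3], "big")
    high = 0 if ignore_high_byte else field[3]
    return (high << 24) | low


def as_hms(ms):
    """Format milliseconds as h:mm:ss, discarding the fractional second."""
    seconds = ms // 1000
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


six_hours = 6 * 60 * 60 * 1000
field = pack_timestamp(six_hours)

print(as_hms(1 << 24))                                          # 4:39:37
print(as_hms(unpack_timestamp(field)))                          # 6:00:00
print(as_hms(unpack_timestamp(field, ignore_high_byte=True)))   # 1:20:22
```

An exact six hours wraps to 1:20:22, so durations slightly over six hours land on the 1:20:24 that the two videos reported.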

Our video processing consists of four major steps – the downloading script uses ffmpeg to convert each 75 minute chunk from WMV to MPEG; then nightly processing uses ffmpeg again to convert the right bits of these MPEG files to FLV, mencoder to join the relevant FLV files into one FLV chunk, then yamdi to add the metadata. My first try at a solution was to alter yamdi to increment the high byte itself, which fixed the duration display and let you seek to high times, but when you tried to go to e.g. 5 hours, the video started playing from the right point but the player thought it was playing from 20 minutes in. This would obviously confuse timestamping!

As the FLV files produced by ffmpeg were all under 75 minutes long, they couldn’t have the problem. It turned out we were running an old version of mencoder, and updating that and converting all our long video files fixed the problem. Phew :-)

Join us later today for my third short technical talk on TheyWorkForYou video, where I’ll explain how our Flash application talks to the HTML and vice-versa to enable the “Watch this” and highlighting of speeches.

  1. The Flash player
  2. Seeking
  3. Highlighting the current speech

TheyWorkForYou video – the Flash player

Thursday, June 12th, 2008 by Matthew Somerville

TheyWorkForYou video timestamping has been launched, over 40% of available speeches have already been timestamped, and (hopefully) all major bugs have been fixed, so I can now take a short breather and write this short series of more technical posts, looking at how the front end bits I wrote work and hang together.

Let’s start with the most obvious feature of video timestamping – the video player itself. :) mySociety is an open-source shop, so it was great to discover that (nearly all of) Adobe Flex is available under the Mozilla Public Licence. This meant I could simply download the compiler and libraries, write some code and compile it into a working SWF Flash file without any worries (and you can do the same!).

Writing a Flex program is split into three main areas – MXML that lays out your application, defines any web services you’re using and so on; CSS to define the style of the various components; and ActionScript to deal with things like events, or talking to the JavaScript in the parent HTML. My code is probably quite shoddy in a number of places – it’s my first application in Flex :-) – but it’s all available to view if you want to take a peek, and it’s obviously running on the live TheyWorkForYou site.

To put a video component in the player is no harder than including an <mx:VideoDisplay> element – set the source of that, and you have yourself a video player, no worrying about stream type, bandwidth detection, or anything else. :) You can then use a very useful feature called data binding to make lots of things trivial – for example, I simply set the value of a horizontal slider to be the current playing time of the video, and the slider is then automatically in the right place at all times. On the downside, VideoDisplay does appear to have a number of minor bugs (the most obvious one being where seeking can cause the video to become unresponsive and you have to refresh the page; it’s more than possible it’s a bug in my code, of course, but there are a couple of related bugs in Adobe’s bug tracker).

As well as the buttons, sliders and the video itself, the current MXML contains two fades (one to fade in the hover controls, one to fade them out), one time formatter (to format the display of the running time and duration), and three web services (to submit a timestamp result, delete a mistaken timestamp, and fetch an array of all existing timestamps for the current debate). These are all called from various places within the ActionScript when certain events happen (e.g. the Now button or the Oops button is clicked).

Compiling is a simple matter of running mxmlc on the mxml file, and out pops a SWF file. It’s all straightforward, although a bit awkward at first working again with a strongly-typed, compiled language after a long time with less strict ones :-) The documentation is good, but it can be hard to find – googling for [flex3 VideoDisplay] and the like has been quite common over the past few weeks.

Tomorrow I will talk about moving around within the videos and some bugs thrown up there, and then how the front end communicates with the video in order to highlight the currently playing speech – for example, have a look at last week’s Prime Minister’s Questions.

  1. The Flash player
  2. Seeking
  3. Highlighting the current speech

Excellent new consultations site from Harry Metcalfe

Friday, February 22nd, 2008 by Tom Steinberg

I only met Harry Metcalfe a few weeks ago, when he was volunteering for the Open Rights Group.

Since then he’s dazzled us with his completely single-handed production of TellThemWhatYouThink, a site which draws most central government consultations into one place. The main benefit for citizens is that you can get email or RSS alerts when a department decides to hold a consultation on an issue that you care about. As per usual it’s also starting with horrid nonstandard data in a zillion formats and turning it into nice structured data for everyone else to play with too.

It’s not a mySociety project, he’s just using our friends and family server, but he’s made it look more like a mySociety site than even we could manage. Kudos Harry!

PS Harry, along with both of the last two new volunteers to do major pieces of coding for mySociety, is in the third year of his PhD. I think I can see a pattern emerging…

He was a man, take him for all in all

Tuesday, February 12th, 2008 by Tom Steinberg

Chris Lightfoot died a year ago today (or yesterday, by a few minutes).

I’m just sitting here reading the very first emails I ever got from him, back in 2003. Within the first few mails he’s invented and hacked up the idea that is now Richard Pope’s PlanningAlerts.com, coded and developed the idea that persuaded YouGov to donate vast amounts of free polling data to form PoliticalSurvey2005.com (a wider understanding of which would greatly help in the US election if the methodology was only applied there) and in this post he’s foreseen the Google maps mashup craze and offered it on a plate to the Ordnance Survey to pioneer, two years before Google started.

The invention and brilliance comes so thick and fast reading these mails that I now realise that I’d persuaded myself over the year that I’d mis-remembered quite how insanely creative he was, trying to correct for rose-tinted lenses. But he was a proper, bona fide, no-holds-barred cantankerous genius. Most days I think about Chris at least once: I try to make sure we live up to his standards (he wouldn’t have tolerated my use of ‘But’ at the start of the last sentence, for example). Reading these mails tonight drives home the scale of what we all lost, amongst our friends, on the Internet and in society at large. It aches to contemplate.

New mySociety Travel Time Maps are Pretty and Powerful

Thursday, January 24th, 2008 by Tom Steinberg

You may remember that back in 2006 mySociety published some maps showing how long it took to commute places via public transport.

We’ve just made some more which have some lovely new features we reckon you’ll probably like a lot.

If you’d like to see more maps like this in your area, please ask your local transport authority to get in touch with us, or nudge these people :)

PS As always, Francis Irving remains a genius.

Interview with Romanian eDemocracy site builder Adrian Moraru

Thursday, October 4th, 2007 by Tom Steinberg

This is the second in a short series of interviews with people building and running some of the most exciting internet and democracy projects in Europe.

Adrian Moraru from the IPP in Romania set the BerlinInAugust unconference abuzz with occasional gasps at the uncompromising relentlessness of their approach, which included suing to obtain the mobile phone numbers of all the politicians with handsets provided on the public purse. Below you can hopefully see why they got people excited…

——-

What is the organisation you work for?

Our organization, the Institute for Public Policy, is an independent think tank based in Bucharest. We have a permanent staff of 12 people plus a pool of external experts and part time collaborators that we work with on a project-by-project basis. This external group may number as many as 50 in a year and ranges from former public officials to politicians, independent experts, journalists, students, young researchers and academics. We work in numerous areas but we specialise in local government, parliament and the ministries.

What is the main purpose of the site(s) that you run?

The main purpose is to give people with a specialised, professional interest in politics an easy way to access facts and statistics about the way MPs are working & voting, as well as providing information for the general public.

Can you tell us about some of the unusual ways you ensure that your vote attendance information is accurate?

Sure, it’s easy. In our parliament the attendance is recorded based on an attendance register at the entrance of the plenary hall. However, it is common for some MPs to sign on behalf of their colleagues and/or friends. So in order to expose the size of this phenomenon we decided to keep track of MPs’ real attendance in a more accurate way.

Some politicians have legitimate exemptions, which we record, but we also wanted an accurate record of how many of them are present when votes happen. So let’s say you have 20 votes in a day. If the name of the MP Mr. X shows up in only 14 of them then he is present only 70% of the time. Furthermore, if, say, only 204 voted out of a possible 322, we deduce from our database the 118 who didn’t show up, and add that to their record.
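The arithmetic in that answer can be written out as a short Python sketch. It is only an illustration of the calculation as described: the function names and the numbered stand-in MPs are invented, and IPP's real database and handling of exemptions are not shown here.

```python
def attendance_rate(votes_cast, votes_held):
    """Percentage of the day's votes in which an MP's name appears."""
    if votes_held <= 0:
        raise ValueError("votes_held must be positive")
    return 100.0 * votes_cast / votes_held


def absentees(all_members, voters):
    """Members who could have voted but do not appear in the vote record."""
    return set(all_members) - set(voters)


# Mr. X appears in 14 of the day's 20 votes.
print(attendance_rate(14, 20))  # 70.0

# 204 of a possible 322 MPs voted; numbered placeholders stand in for names.
members = range(322)
voters = range(204)
print(len(absentees(members, voters)))  # 118
```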

We have used video cameras from time to time in order to combat the practice of multiple voting. This happens because our voting system in Parliament is based on electronic voting stations placed on the benches, where MPs identify themselves with a smart card (aka voting card) before pushing a button corresponding to their voting choice.

Politicians have 10 seconds to do so once the vote is initiated. Some MPs use these 10 seconds to vote once with their own cards and then once with the cards of colleagues who are, for example, out at lunch. This is a widespread practice.

We have the plenary sessions broadcast live and also available recorded on the Parliament website. We suggested that during voting a camera record the activity in the whole chamber. We therefore exposed a few cases of this multiple voting, although not much has happened as a result yet.

You also collect information about politician’s travel. How did you get that? What does it tell your users?

We get it through our freedom of information laws. But it is not that easy to get hold of. Sometimes we even have to go to court to get it, and sometimes even when we do it comes on paper, not in electronic formats, which is obviously harder to re-use.

What it shows is where an MP went, when, why, how much it cost, how long they stayed and so on. From this we can help people establish whether they think it was strictly necessary for an MP to visit French Guiana to see the launching of an Ariane V rocket, and we can provide the most popular destination country for each political party.

Do you ever face claims that the effects you have on politicians aren’t entirely positive? If so, how do you respond?

Well, this is not a consolidated democracy, you know. MPs are not as nice as yours. So yes, large parts of the database hurt a lot of them a great deal. Let’s just say we are not scared. But on the other hand we strive to get the best data and to present it in a non-aggressive, unbiased way using the best algorithms.

You are good at using the law to obtain information. Can you tell us a bit about your approach, and what information you’ve obtained through the courts?

This is a very distinct topic. We always ask for information via our Freedom of Information act, using a special format of letter which cannot be completely ignored. We have lawyers following the flow of requests together with an office manager and we sue every time we do not get an answer, have our request denied or find that information we’ve been provided with is incomplete.

We ask for a lot. A lot! Usually we fight for data that exposes bad practices and most of the things involving expenses or money. It is here where there is a lot to hurt bad politicians by exposing how unwisely some of them are spending the money.

What other projects around the world excite you the most, and why?

Tough one. None. I like opensecrets.org and votesmart.org but that’s it. I do not believe in moving participatory democracy online in our lifetime. Instead I think we should be looking for ways to open up government and make it more transparent using the internet. In my opinion we are not even at 10% of the way to what we can ultimately do, either in Romania or elsewhere. We can also think about real interactivity in the future.

What’s next for you and IPP on the Internet?

Who knows?

That’s it for the moment. Please post any questions for Adrian in the comments below, and I’ll see if I can update this accordingly.

mySociety Disruptive Technology Talks

Monday, September 24th, 2007 by Tom Steinberg

At mySociety we’re always very lucky to meet and spend time with some extremely diverse and impressive people.

We thought it would be great to share a bit of that good fortune by holding some talks from some of our favourite thinkers, and to have an excuse to meet more people in the wider mySociety community face to face.

To that end, we’re holding four talks in London this autumn (location TBD but almost certainly a centralish pub). Each link below goes to an Upcoming page where you can sign up to let us keep track of numbers and how big a venue we need.

4/10/2007 – Stefan Magdalinski, net-political troublemaker extraordinaire

1/11/2007 – Steve Coast, founder of Open Street Map

CANCELLED 29/11/2007 – Jason Kitcat, e-voting expert

12/12/2007 – Peter Wainman, IT-specialist solicitor and blogger

We look forward to seeing you there.



mySociety is a project of UK Citizens Online Democracy (UKCOD). UKCOD is a registered charity in England and Wales, no. 1076346. Its company number is 03277032, and mySociety Ltd's is 05798215.