1. Say what you’re researching on WhatDoTheyKnow!

    Have you used WhatDoTheyKnow to make a Freedom of Information request?

    If so, you can now add your photograph to the site, and some text on your user page about what you’re researching. This can include links to your blog, campaign page or Twitter feed.

    To add this to your profile, first log into WhatDoTheyKnow, and go to your user page by choosing “my requests”.

    There are then links to add a profile photo and/or set some text about you, and what you’re using FOI for.

    I’d go and do it while I remember – it will help you and others find and understand each other, hopefully leading to that little bit more collaborative research!

  2. How to get TheyWorkForYou Into Your Local Paper/Radio Station in 5 minutes

    The two days leading up to election day are a hugely important time for less politically-obsessive voters. The parties know that a lot of people are only starting to seriously think how to vote today and tomorrow, and TheyWorkForYou saw its biggest spike ever the day before the election, way back in 2005.

    This means it’s a super-important time to get trustworthy, non-partisan information in front of as many people as possible. And you can help by doing the following simple things:

    1. Go to your constituency page on the TheyWorkForYou Election Quiz and take a good look at the answers. Is there anything surprising in the answers? Has anyone failed to respond who really shouldn’t? Is there anything funny in the responses? Make a couple of notes about what you think are the most interesting findings.

    2. If you know the name of your local papers or radio stations, try to Google for the email or phone number of the news desk. If you don’t know the names, try sticking the name of your nearest town into a media database like this, to get a phone number or email address.

    3. If possible, you should start your pitch by phoning rather than emailing. If you get a phone number for a news desk, give them a bell and say that you’re a volunteer from “The country’s largest non-partisan election information project”, and ask for the email of a specific person who might be interested in a story about what local candidates are saying.

    4. Once you have an email address of a specific journalist, compose a locally specific email for them, along the following lines:

    “Hi X,

    I’m a resident of Z constituency, and this election I’ve been one of 6000 volunteers helping to build an unprecedented project to get candidates across the country to go on the record, in conjunction with the website TheyWorkForYou.com. It’s a strictly non-partisan project, aimed at giving voters a really clear, spin-free view of what their candidates stand for. I’d really appreciate it if you could give it some coverage before election day.

    In my constituency, N candidates have completed our survey. From this we can see some quite interesting things, namely:

    * Candidate A thinks…

    * Candidate B thinks…

    Would you be so kind as to print a story encouraging people to check out their candidates via TheyWorkForYou.com, and mentioning some of the highlights I’ve included?

    all the best,

    Your name, email, phone”

    5. An hour after you send the email through, give the journalist a call back to see if they need any more help.

    6. If you do this, please leave us a comment on this post so we know who’s had a go!

    Thank you for helping spread some non-partisan information this election time, and enjoy the election…

  3. It’s job interview time for your next MP!

    Thanks to the work of thousands of volunteers across the country, we’ve now launched our survey of candidates to be your MP.

    View the survey results

    It tells you the views of candidates on a range of national and local issues. What’s particularly exciting is that these are individual views – we separately surveyed all the candidates.

    About a third of them have replied. The survey has a tool to let you ask the other candidates in your constituency to respond. Please give it a go, as we’d like the survey to be as complete as possible over the weekend, so that it is of most use to people in the last days leading up to the election.

    Competition! Have you found an inventive way to ask your candidates to respond to the survey? Maybe you doorstepped them, or sent them a cake. Post your ideas and things you’ve done in the comments below.

  4. How did we work out the survey questions?

    As you may know, TheyWorkForYou are conducting a survey of candidates for Parliament.

    Quite a few people have been asking how we worked out the questions. There are two parts to this, one local and one national.

    Local questions

    We used the power of volunteers.

    Thousands of DemocracyClub members were asked to suggest local issues in their area. These were then edited by other volunteers to have consistent grammar and be worded as statements to agree/disagree with, and filtered to remove national issues. The full criteria and examples are available.

    You can view the issues for any constituency on the DemocracyClub site. They are in the “local questions” tab.

    We’ve ended up with local issues for about 85% of constituencies. They’re really interesting and high quality, and quite unique for a national survey.

    Thank you to all the volunteers who helped make this happen!

    National questions

    This was hard, because we felt that asking more than 15 questions would make the survey too long. We also wanted to be sure it was non-partisan.

    We convened a panel of judges, either from mySociety/Democracy Club or with professional experience in policy, and from across the political spectrum. They were:

    • James Crabtree, chair of judges, trustee of mySociety, journalist for Prospect magazine
    • Tim Green, Democracy Club developer, Physics student, Cambridge University.
    • Michael Hallsworth, senior researcher, Institute for Government.
    • Will Davies, sociologist at University of Oxford, has worked for left of centre policy think tanks such as IPPR and Demos.
    • Andrew Tucker, researcher at Birkbeck, worked for Liberal Democrats from 1996-2000.
    • Robert McIlveen, research fellow, Environment and Energy unit at Policy Exchange, did PhD on Conservative party election strategy.

    They met at the offices of the Institute for Government, and had a 3 hour judging session on 29th March 2010. They were asked to think of 8-15 questions, with multiple choice answers, which could usefully be answered both by members of the public and prospective candidates for national office.

    To ensure maximum transparency, the discussions of the judges were recorded. You can download the recordings in two parts: part 1, part 2 (2 hours, 20 mins total).

    Details of the broad framework the judges operated under are given by the chair of judges, James Crabtree, a trustee of mySociety, in the opening to the recordings.

    Please do ask any questions in the comments below.

  5. TheyWorkForYou election survey – A message for people who work for the political parties

    The following is a message that we’d like to see emailed around within political parties of all stripes. If you work for a party, or know anyone who does, please send it along:

    ———-

    Hi there,

    TheyWorkForYou.com has sent online surveys to nearly 3000 candidates across the UK, including most of your party’s candidates. If you don’t know it, TheyWorkForYou is probably the largest politician transparency website in the UK, with about 3m visitors last year.

    The survey we’ve sent is a rigorously neutral attempt to clarify candidates’ positions on many of the biggest issues at the election. It is also a long-term document – the data that comes from candidate responses will be viewed millions of times between now and the general election after this one. It also contains both local and national questions.

    There are 6000+ volunteers now nagging non-responsive candidates. You can help your party improve its responsiveness rating, here, by passing on the word that TheyWorkForYou’s survey is not push-polling, not single issue, not short-termist.

    Please help us by passing on the message that TheyWorkForYou will be one of the main ways that new MPs from all parties (and none) will be scrutinised and neither we nor new MPs want to start our relationship with a “refused to go on the record” badge on their pages.

    If you are a candidate, and you want to do the survey, check your email for TheyWorkForYou (no spaces). If you don’t have it, drop a mail to developers@democracyclub.org.uk and it’ll be sent along shortly.

    many thanks,

    The staff and volunteers at TheyWorkForYou and Democracy Club

  6. TheyWorkForYou’s election survey: Status Update

    In January last year, at our yearly staff and volunteers retreat, we decided that TheyWorkForYou should do something special for the general election. We decided that we wanted to gather information on where every candidate in every seat stood on what most people would think were the biggest issues, not just nationally but locally too.

    Our reasons for setting this ambitious goal were twofold. First, we thought that pinning people down to a survey that didn’t reward rhetorical flourishes would help the electorate cut through the spin that accompanies all elections. But even more important was to increase our ability to hold new MPs to account: we want users of TheyWorkForYou in the future to be able to see how Parliamentary voting records align with campaign statements.

    This meant doing quite a lot of quite difficult things:

    1. Working out who all the candidates are (thousands of them)
    2. Working out how to contact them.
    3. Gathering thousands of local issues from every corner of the country, and quality assuring them.
    4. Developing a balanced set of national issues.
    5. Sending the candidates surveys, and chasing them up.

    The Volunteer Army

    This has turned out to be a massive operation, requiring the creation of the independent Democracy Club, set up by the amazing new volunteers Seb Bacon and Tim Green, and an entire candidate database site, YourNextMP, built by another new volunteer, Edmund von der Burg. Eventually we managed to get at least one local issue in over 80% of constituencies, aided by nearly 6000 new volunteers spread from Land’s End to John O’Groats. There’s at least one volunteer in every constituency in Great Britain, and in all but three in Northern Ireland. Volunteers have done more than just submit issues: they’ve played our duck house game to help gather thousands of email addresses, phone numbers, and postal addresses.

    The Survey

    What we ended up with is a candidate survey that is different for every constituency – 650 different surveys, in short. The survey always contains the same 15 national issues (chosen by a politically balanced panel held at the Institute for Government) and then anything between zero and ten local issues. We’ve seen everything from cockle protection to subsidies for ferries raised – over 3000 local issues were submitted, before being painstakingly moderated, twice, by uber-volunteers checking for spelling, grammar, obvious bias and straightforward interestingness (it isn’t really worth asking candidates if they are in favour of Good Things and against Bad Things).

    In the last couple of days we’ve started to send out the first surveys – we’ve just passed 1000 emails, and there are at least 2000 still to be sent.

    The Output

    We’re aiming to release the data we are gathering on candidates’ positions on 30th April. We’ll build a nice interface to explore it, but we also hope that others will do something with what we are expecting to be quite a valuable dataset.

    The Pressure

    Candidates are busy people, so how do we get their attention? Happily, some candidates are choosing to answer the survey just because TheyWorkForYou has a well-known brand in the political world, but this has limits.

    The answer is that we are going to ask Democracy Club, and its army of volunteers, to help. We’ll shortly roll out a tool that will tell volunteers which of their candidates haven’t taken the opportunity to go on the record, and provide a range of ways for them to push for their candidates to fill it in.

    It would be a lie to say we’re confident we’ll get every last candidate. But we are confident we can make sure that no candidate can claim they didn’t see, or didn’t know it was important to their constituents. And every extra voice we have makes that more likely.

    Join Democracy Club today

  7. Outlook attachments now viewable in WhatDoTheyKnow

    When a bit of government forwards or attaches emails using Outlook, they get sent using a special, strange Microsoft email format. Up until now, WhatDoTheyKnow couldn’t decode it. You’d just see a weird attachment on the response to your Freedom of Information request, and probably not be able to do anything with it.

    Peter Collingbourne got fed up with this, and luckily for us, he can code too. He forked our source code repository, and made a nice patch in his own copy of it.

    He then told us about it, and I merged his changes into the main WhatDoTheyKnow code, tested them out on my laptop, then made them live. It all worked perfectly first time. Peter even added the new dependency on vpim to WhatDoTheyKnow conf/packages.

    Now if you go to an Outlook attachment on WhatDoTheyKnow, such as this one, you’ll just see the files, and be able to download them, and view them as HTML as normal. They’ll also get indexed by the search (although I need to do a rebuild for that to work with old requests).

    Thanks Peter!

    If you want to have a go making an improvement to a mySociety site, you can get the code for most of them from our github repositories. For some sites, there’s an INSTALL.txt file explaining how to get a development environment set up. Let us know if you do anything – even incremental improvements to installation instructions are really useful. And new, useful, features like Peter’s are even more so.

  8. What are the two sorts of Cloud infrastructure called?

    I’ve been doing lots of research around “cloud computing” recently, so we can change how Mapumental works and take it out of private beta.

    One thing that’s struck me is that there doesn’t seem to be a proper, industry standard name to distinguish what to me are two fundamentally different sorts of “cloud computing”. I’m focusing here entirely on cloud services for programmers (let’s leave what it means to end users or businesses for another day).

    Here are my own names and descriptions of them:

    1) Cloud hardware server provision (Cloud HSP)
    Low level APIs for making and destroying (virtual) servers, and loading machine images onto them. e.g. Amazon Elastic Compute Cloud, Rackspace Cloud Servers, Eucalyptus’s EC2 bits. Basically, what Eucalyptus v 1.5 can do and what libcloud should do. (By analogy, this is the assembly language of cloud computing)

    2) Cloud developer service provision (Cloud DSP) A service that a developer accesses with one name and a simple API, and behind the scenes it scales for him, automatically. e.g. Amazon Simple Queue Service (SQS), Rackspace Cloud Files. (By analogy, this layer is the C programming language of cloud computing)

    [as an aside, Google AppEngine is an interesting one. It is definitely in the Cloud DSP category, but I think it is larger than that – it is a whole set of APIs all in that category. Something like Google DataStore is a single Cloud DSP, albeit one apparently only accessible within AppEngine apps]

    It’s possible to use a Cloud HSP (assembly language), along with a bunch of your own software or open source software, to build new Cloud DSPs (C code). Right now this is pretty hard – even quite well known open source distributed databases like CouchDB still need scripting to even make them replicate. The code that makes and destroys servers, and gives the service one name, needs manually stringing together with quite new bits of wire (things like scalr and Wackamole).

    For this reason, I’m reluctant for mySociety to get into the “making our own Cloud DSP out of Cloud HSP” game. It feels to me like a suck of time, and like we wouldn’t be able to guarantee without lots of careful and expensive testing that it would scale. I’m more tempted to use the commercial Cloud DSP services where possible, even though they are proprietary. But use them via our own abstraction layer, so we can change as we need to. Of course, we have some C++ code (the public transport route finder), so will have to use the Cloud HSP API to get that going, perhaps with Amazon’s Auto Scaling. But it can jolly well use SQS and S3 to talk to other services.
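    One way to picture the “own abstraction layer” idea is a small interface that application code talks to, with the hosted service hidden behind it. What follows is only a minimal Python sketch, not mySociety code: the names `MessageQueue`, `InMemoryQueue` and `request_route` are hypothetical, and the in-memory backend stands in for a hosted queue so that the example runs on its own.

```python
from abc import ABC, abstractmethod
from collections import deque


class MessageQueue(ABC):
    """Hypothetical interface that the rest of the application codes against."""

    @abstractmethod
    def send(self, body):
        """Add one message to the queue."""

    @abstractmethod
    def receive(self):
        """Return the oldest message, or None if the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Stand-in backend; a class wrapping a hosted queue would expose the same methods."""

    def __init__(self):
        self._items = deque()

    def send(self, body):
        self._items.append(body)

    def receive(self):
        return self._items.popleft() if self._items else None


def request_route(queue, postcode):
    """Application code only sees MessageQueue, never a provider's own API."""
    queue.send(postcode)
```

    A backend wrapping a commercial queue would implement the same two methods, so switching provider would mean changing one class rather than every caller.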

    So, what do you think about the names Cloud HSP/DSP? Are there already existing names for the distinction that I’m making? Is it a useful distinction for you? Can you think of better names?

  9. WhatDoTheyKnow growing pains (and Ruby memory leaks)

    WhatDoTheyKnow keeps growing and growing, sucking people in from Google as its archive of maybe 8.5% of Freedom of Information requests gets more and more detailed.

    Graph of number of FOI requests made using WhatDoTheyKnow over time

    There’s round about 8Gb of unfettered Government data in the core database, plus a whole bunch more for indexing and caching. For comparison, TheyWorkForYou (which now goes back to 1935) has 12Gb. And it’s catching up on traffic also – WhatDoTheyKnow has about half the number of visitors as TheyWorkForYou.

    Unfortunately, this newfound traffic has led to performance problems. You might have seen errors when using WhatDoTheyKnow in the last week or two. This post is firstly an apology for that. Thank you for your patience. Hopefully it is fixed now – do let us know if you get problems still. And secondly it is some techy stuff about debugging such problems in Ruby on Rails…

    When WhatDoTheyKnow started failing, we did the obvious things to start with – moving the database to a separate server, and moving some other services off the same server, to give WDTK more room to breathe. It still kept breaking.

    None of my server monitoring tools shed any very clear light as to the problem. I upgraded to the latest version of Passenger, the best Rails deployment tool I’ve seen yet. It’s pretty good, but still not mature enough for my liking. I was still getting the same problems with it, but reporting tools like passenger-memory-stats were really helpful.

    Eventually I worked out that it was to do with memory use of the Rails processes. Individual ones would leap up to 1Gb, and never drop back down. If several did, the server (with 4Gb of RAM) would start swapping and grind to a halt. The world of Ruby and Rails memory monitoring software is patchwork at best, and in the end I found the simplest tools the most useful. Here are some:

    • I found some Rails processes were getting jammed, and not dying even when I restarted Apache. I think in the end this was due to the Passenger spawning method, and our use of the Xapian Ruby module. Running Passenger in RailsSpawnMethod conservative mode made things much more robust.
    • Monit, which in a previous life had a job holding up vital structural pillars of buildings with duct tape, makes you feel dirty. Actually it is really useful. Given I couldn’t quickly fix the problem, Monit let me at least reduce the suffering for people trying to use the site meanwhile. Here’s the rule I used, which gives Apache a kick every time server memory use is too high. It was firing every 5 or 10 minutes…
      check system localhost
          if memory > 3500 MB then exec "/usr/sbin/apache2ctl graceful"
    • I found memory_profiler on a blog. It helps you find the kind of memory leak where you unintentionally continue to reference an object you don’t use any more. With a specialist subject of string objects. This led to a fix to do with declaring static arrays in classes vs. modules, which I still don’t really understand. But it wasn’t the cause of the big 1Gb memory munching, there were no large enough leaks of this sort.
    • The record_memory function in WDTK’s application controller came from another blog. It’s handy as it shows you how much of the system memory in the Ruby process each request causes an increase by. With caveats, this was the best way for me to identify the most damaging requests (search results, and certain public body pages). And it also brought focus on the actual problem – the peak memory use during a request. That’s really important, because Ruby’s memory manager never returns memory to the operating system… The Gb leaps in memory use were because of temporary memory used during certain requests, which the Ruby memory manager then never frees later.
    • I made a bunch of functions culminating in allocated_string_size_around_gc. This was really useful in use with the “just add lots of print statements and fiddle” school of debugging. Not everyone’s favourite school, but if your test code can’t catch it, one I often end up using (it gets really involved rarely enough that it doesn’t seem worth setting up an interactive debugger). It led me to various peak memory savings, such as calling “text.gsub!” rather than “text = text.gsub” while removing email addresses and private information from FOI request responses, which helps quite a bit when dealing with multi-megabyte attachments.
    • Finally, I used the overlooked debugging tool, and the one you should never rely on, being common sense. That is, common sense informed by days of careful use of all the other tools. In order to quickly show text extracts when searching, WDTK stores the extracted attachment text in the database. A few of these attachments are quite large, and led to 50Mb fields, often several of which were being loaded and processed in one page request. That this would cause a high peak of memory use suddenly became obvious to me some time yesterday. I checked that that was the case, and this morning, I changed it to use the full text for indexing, but to at most keep 1Mb for use in snippets. So sometimes now you won’t get a good search extract for queries, but it is rare, and it will at least still return the right result.
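    The distinction that mattered in the list above is between memory a request keeps hold of and the peak it touches along the way. As a rough illustration only, here is that measurement in Python using the standard `tracemalloc` module. It is an analogue, not WDTK’s Ruby record_memory function, and the names `measure_peak` and `render` are hypothetical.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, bytes still held afterwards, peak bytes during the call)."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        result = func(*args, **kwargs)
        after, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, after - before, peak - before


def render(size):
    """Hypothetical request handler: builds a large temporary string, returns something small."""
    big = "x" * size
    return len(big)
```

    Calling `measure_peak(render, 5_000_000)` should report a peak of at least five million bytes while almost nothing is retained, which is the same shape of problem as the large attachment fields: little leaks, but the peak is what hurts.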

    I’ve more work to do, I think there are quite a few other quick wins, all of which are making the site faster too. I’m quite happy that WhatDoTheyKnow also has a bunch more test code as a result of all this.

    On the other hand, what a disappointing disaster for open source languages beginning with P/R (as opposed to J). Yes, the help and tools were just about there to work it out, but would seem primitive if you’d used say Java’s Memory Analyzer. Indeed somebody over on StackOverflow suggested running your site in JRuby and using exactly that tool…

  10. How Mapumental works

    Here is a diagram of how the backend of Mapumental works. Take it in the spirit that Chris Lightfoot set when he made a similar diagram for the No. 10 petitions site – although many such diagrams are useless, hopefully this one contains useful information.


    Below, I’ve explained what the main components are, and some interesting things about them.

    Everything can, at least in theory, run on lots of servers. Currently we are only actually using one server for web requests, because of problems with HAProxy. We’re running isodaemons on two different servers.

    Basic web application – it started out as raw Python, but the more Matthew hacks on it the more Django libraries he pulls in. Soon it’ll be indistinguishable from a Django app. When someone enters a new postcode, it adds it to the work queue in the PostgreSQL database, then refreshes waiting for the job to be finished. Then it displays the flash application (made by Stamen), set up to load the appropriate tile layers.
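    The queue-then-poll flow described above can be sketched in a few lines. This is an assumption-laden illustration rather than Mapumental’s schema: the table `work_queue`, its columns and every function name are hypothetical, and SQLite is used in place of PostgreSQL so that the example runs without a database server.

```python
import sqlite3


def make_queue(conn):
    """Create the hypothetical work queue table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_queue ("
        "postcode TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.commit()


def enqueue(conn, postcode):
    """Web app side: add a postcode once, however often the page is refreshed."""
    conn.execute(
        "INSERT OR IGNORE INTO work_queue (postcode, state) VALUES (?, 'pending')",
        (postcode,),
    )
    conn.commit()


def claim_next(conn):
    """Worker side: take the oldest pending job, or return None if there is none."""
    row = conn.execute(
        "SELECT postcode FROM work_queue WHERE state = 'pending' ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE work_queue SET state = 'working' WHERE postcode = ?", (row[0],))
    conn.commit()
    return row[0]


def finish(conn, postcode):
    """Worker side: mark the job as complete."""
    conn.execute("UPDATE work_queue SET state = 'done' WHERE postcode = ?", (postcode,))
    conn.commit()


def is_done(conn, postcode):
    """Web app side: what each page refresh checks."""
    row = conn.execute(
        "SELECT state FROM work_queue WHERE postcode = ?", (postcode,)
    ).fetchone()
    return row is not None and row[0] == "done"
```

    With several workers on different servers, claiming a job would need to be atomic (for example, a single UPDATE inside a transaction), which this sketch does not attempt.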

    Tile server and cache – This uses the Python-based TileCache, calling Geospatial Data Abstraction Library (GDAL) to help render the tiles from points. It was originally written by Stamen, and expanded by mySociety. GDAL isn’t perfect, it doesn’t have fancy enough algorithms for my liking. e.g. Using a median rather than a weighted mean.

    Isodaemons – These are controlled by a Python script, but the bulk of the code is custom written in C++. Slightly crazily, this can find the quickest route by public transport for each of 300,000 journeys from every station in the UK to a particular station, arriving at a particular time, in 10 to 30 seconds.

    I had no idea how to do this, but luckily I live in Cambridge, UK. It’s a city fit to bursting with computer scientists. Many of the jobs are dull, and need little computing, never mind science – like writing interface layers for SQL server. So if you have a real interesting problem it’s easy to get help!

    The universal advice was to use Dijkstra’s algorithm, which needed a bit of adaptation to work efficiently over space-time, rather than just space. Normally it is used for planning routes round a map, but public transport isn’t like that, you have to arrive in time for each particular train, so time affects what journeys you can take.

    I originally wrote it in Python, which was not only too slow, but used up far far too much RAM. It could never have loaded the whole dataset in. However, the old Python code is still run by the test script, to double check the C++ code against. It is also still used to make the binary timetable files, see below.
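    To make the space-time adaptation concrete, here is a small Python sketch of the idea. It is not the Mapumental C++ code. It assumes a timetable given as a list of (from station, departure, to station, arrival) tuples with times in minutes since midnight, ignores interchange time and walking, and the function name `latest_departures` is hypothetical. Because the target is a fixed arrival time at one station, the search runs backwards from the destination, and each station’s label is the latest time you can be there and still arrive on time.

```python
import heapq
from collections import defaultdict


def latest_departures(connections, target, arrive_by):
    """Return {station: latest time you can be there and still reach target by arrive_by}."""
    # Index connections by the station they arrive at, since we search backwards.
    incoming = defaultdict(list)
    for origin, depart, destination, arrive in connections:
        incoming[destination].append((origin, depart, arrive))

    latest = {target: arrive_by}
    heap = [(-arrive_by, target)]  # negate times so the heap pops the latest first
    while heap:
        negated, station = heapq.heappop(heap)
        time_here = -negated
        if time_here < latest[station]:
            continue  # stale entry: a later time for this station was already found
        for origin, depart, arrive in incoming[station]:
            # A service is usable only if it gets in by the time we must be here.
            if arrive <= time_here and depart > latest.get(origin, float("-inf")):
                latest[origin] = depart
                heapq.heappush(heap, (-depart, origin))
    return latest
```

    The travel time from a station is then `arrive_by - latest[station]`. Stations missing from the result have no usable service in the timetable given.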

    Travel times, 1 binary file / postcode – I briefly attempted to insert 300,000 rows into PostgreSQL for each postcode looked up, but it was obvious it wasn’t going to scale. Going back to basics, it now just saves the time taken to travel to each station in a simple binary file – two bytes for each station, 600k in total. The tile server then does random access lookups into that file, as it renders each tile. It only needs to look up the values for the stations it knows are on/near the tile.
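    The file layout described above, two bytes per station read by random access, can be sketched as follows. The details are assumptions rather than Mapumental’s actual format: values are taken to be minutes stored as little-endian unsigned 16-bit integers, 0xFFFF is used as a made-up marker for “no route”, and the function names are hypothetical.

```python
import struct

RECORD = struct.Struct("<H")  # assumption: unsigned 16-bit, little-endian
UNREACHABLE = 0xFFFF          # assumption: sentinel meaning "no route found"


def write_travel_times(path, minutes_by_station):
    """Write one two-byte record per station, in station index order."""
    with open(path, "wb") as f:
        for minutes in minutes_by_station:
            value = UNREACHABLE if minutes is None else minutes
            f.write(RECORD.pack(value))


def lookup_travel_times(path, station_indexes):
    """Seek straight to the records for the given stations; the rest is never read."""
    times = {}
    with open(path, "rb") as f:
        for index in station_indexes:
            f.seek(index * RECORD.size)
            (value,) = RECORD.unpack(f.read(RECORD.size))
            times[index] = None if value == UNREACHABLE else value
    return times
```

    At two bytes each, 300,000 stations come to 600,000 bytes, which matches the 600k figure, and a tile renderer only needs to call the lookup for the handful of stations on or near its tile.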

    There are various other bits:

    • cron jobs for sending out invites
    • converting timetable data from ATCO-CIF to the binary format
    • loading static layer data into the database
    • precaching every tile for static datasets
    • Squid, Apache and FastCGI all sit in front of the web applications
    • for speed, we cache the mapping background tiles from Cloudmade
    • when zoomed out, there is code to cull which stations are used to draw tiles
    • of course, a bunch of test code

    Thanks to everyone who helped make Mapumental, we couldn’t have done it without lots of clever people.

    I realise the above is a sketchy overview, so please ask questions in the comments, and I’ll do my best to answer them.