1. Helping voters in the devolved elections

    As well as council elections and the referendum, the Scottish Parliament, Welsh Assembly, and Northern Ireland Assembly are holding elections this May. In Scotland and Northern Ireland, there are accompanying boundary changes, meaning this year you might be voting in a different constituency from last time.

    To help people, as we’ve again had a few requests, our service from the 2010 general election is back, at http://www.theyworkforyou.com/boundaries/, just for the Scottish Parliament and Northern Ireland Assembly. Our generic lookup service MaPit also provides programmatic access to these results (technical footnote).

    Alongside this service, we have refreshed our Scotland and Northern Ireland front pages, so that the wide array of information TheyWorkForYou holds for those devolved legislatures is slightly better displayed and easier to access.

    Sadly the Scottish Parliament changed the format of their Official Report in mid January and we haven’t been able to parse the debates from then until its dissolution this March – hopefully we’ll be able to fix that at some point, and apologies for the inconvenience in the meantime.

    There don’t appear to be any central official lists of candidates in these elections. Amnesty.org.uk has a PDF of all candidates in Northern Ireland; David Boothroyd has a list of Scottish Parliament candidates. CAMRA appears to have lists for both Scotland and Wales. Those were simply found while searching for candidate lists; we obviously hold no position on those organisations 🙂

    Technical footnote: To look up the new Scottish Parliament boundaries using MaPit, provide a URL query parameter of “generation=15” to the postcode lookup call. The Northern Ireland Assembly boundaries are aligning with the Parliamentary boundaries, so you can just perform a normal lookup and use the “WMC” result for the new boundary.
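
    As a rough illustration of that footnote, here is a minimal Python sketch that builds the lookup URL and picks areas of one type out of a result. The base URL, the "/postcode/" path and the shape of the sample result (an "areas" dictionary whose entries carry a "type") are assumptions made for the example, so check them against MaPit itself before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed base URL for MaPit; adjust to wherever the service actually lives.
MAPIT_BASE = "https://mapit.mysociety.org"


def postcode_lookup_url(postcode, generation=None):
    """Build a postcode lookup URL, optionally pinned to a generation."""
    url = f"{MAPIT_BASE}/postcode/{quote(postcode.replace(' ', ''))}"
    if generation is not None:
        url += "?" + urlencode({"generation": generation})
    return url


def areas_of_type(result, area_type):
    """Return the areas of one type (e.g. "WMC") from a decoded result.

    Assumes the result holds an "areas" dictionary keyed by area ID.
    """
    return [a for a in result.get("areas", {}).values()
            if a.get("type") == area_type]


# New Scottish Parliament boundaries: ask for generation 15.
print(postcode_lookup_url("EH1 1BB", generation=15))

# Northern Ireland: a normal lookup, then use the "WMC" result.
sample = {"areas": {"1": {"type": "WMC", "name": "Example constituency"},
                    "2": {"type": "LGD", "name": "Example district"}}}
print(areas_of_type(sample, "WMC"))
```

    The sample data is invented; only the "generation=15" parameter and the "WMC" type come from the footnote.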

  2. Why I’d like mySociety to run a Masters in Public Technology

    WWII students looking at an Engine

    It is a cliché for any manager to say that they are proud of their team, and mildly nausea-inducing to listen to anyone who goes on about it too long. However, the purpose of this post is to argue that the world would benefit from a new kind of post-graduate Masters programme – something that is hard to do without describing the virtues of the type of people who should come out of it. So please bear with me, and keep a sick bag to one hand.

    mySociety’s core development team is very, very good. But they’re not just good at turning out code. Louise Crow, for example, has a keen eye for things that will and won’t make a difference in the offline world, as well as the skills to build virtually whatever she can think of. And the exact same thing is true of the whole coding team: Duncan, Matthew, Edmund and Dave in the current team, plus Francis, Chris and Angie before them.

    mySociety didn’t give these people their raw talent, nor the passion to be involved with projects that make a difference. What it has given them, though, is the chance to spend a lot of time talking to each other, learning from their triumphs and their mistakes, and listening to users. This space and peer-contact made them into some of the world’s few genuine experts in the business of conceptualising and then delivering digital projects that deliver new kinds of civic and democratic benefits.

    So, why am I sitting here unashamedly blowing my colleagues’ trumpets like this? (I don’t have these skills, after all!) Well, in order to point out that there are quite simply far too few people like this out there.

    Too few experts

    “Too few for what?” you may well ask. Too few for any country that wants to be a really great place to live in the 21st century, is my answer.

    There is barely a not-for-profit, social enterprise or government body I can think of that wouldn’t benefit from a Duncan Parkes or a Matthew Somerville on the payroll, so long as they had the intelligence and self-discipline not to park them in the server room. Why? Because just one person with the skills, motivation and time spent learning can materially increase the amount of time that technology makes a positive contribution to almost any public or not-for-profit organisation.

    What they can do for an organisation

    Such people can tell the management which waves of technology are hype, and which bring real value, because they care more about results than this week’s craze, or a flashy presentation. They can build small or medium sized solutions to an organisation’s problems with their bare hands, because they’re software engineers. They can contract for larger IT solutions without getting ripped off or sold snake oil. And they can tell the top management of organisations how those organisations look to a digital native population, because they come from that world themselves.

    And why they don’t

    Except such experts can’t do any of these things for not-for-profit or public institutions: they can’t help because they’re not currently being employed by such bodies. There are two reasons why not, reasons which just may remind you of a chicken and an egg.

    First, such institutions don’t hire this kind of expert because they don’t know what they are missing – they’re completely outside of the known frame of reference. Before you get too snarky about dumb, insular institutions, can you honestly say you would try to phone a plumber if you had never heard that they existed? Or would you just treat the water pouring through the ceiling as normal?

    Second, these institutions don’t hire such experts because there just aren’t enough on the market: mySociety is basically the main fostering ground in the UK for new ones, and we greedily keep hold of as many of our people as possible. Hands off my Dave!

    Which leads me to the proposal, a proposal to create more such experts for public and non-profit institutions, and to make me feel less guilty about mySociety hoarding the talent that does exist.

    Describing the Masters in Public Technology

    The proposal is this: there should be a new Masters level course at at least one university which would take people with the raw skill and the motivation and put them on a path to becoming experts in the impactful use of digital technologies for social purposes. Here’s how I think it might work.

    In the first instance, the course would only be for people who could already code well (if all went well, we could develop a sister course for non-coders later on). Over the course of a single year it would teach its students a widely varied curriculum, covering the structure and activities of government, campaigns, NGOs and companies. It would involve dissecting more and less impactful digital services and campaigns, like biology students dissect frogs, looking for strengths and weaknesses. It would involve teaching the basics of social science methodologies, such as how to look for statistical significance, and good practice in privacy management. It would encourage good practice in User Experience design, and challenge people to think about how serious problems could be solved playfully. It would involve an entire module on explaining the dos and don’ts of digital technology to less-literate decision makers. And most important, it would end with a ‘thesis’ that would entail the construction of some meaningful tool, either alone or in collaboration with other students and external organisations.

    I would hope we could get great guest lecturers on a wide range of topics. My fantasy starter for 10 would include names as varied in their disciplines as Phil Gyford, David Halpern, Martha Lane Fox, Ben Goldacre, Roz Lemieux, William Perrin, Jane McGonigal, Denise Wilton, Ethan Zuckerman, as well as lots of people from in and around mySociety itself.

    What would it take?

    I don’t know the first thing about how universities go about creating new courses, so having someone who knew about that step up as a volunteer would be a brilliant start!

    Next, it would presumably take some money to make it worth the university’s time. I would like to think that there might be some big IT company that would see the good will to be gleaned from educating a new generation of socially minded, organisation-reforming technologists.

    Third, we’d actually need a university with a strong community of programmers attached, willing and ready to do something different. It wouldn’t have to be in the UK, either, necessarily.

    Then it would need a curriculum, and teaching, which I would hope mySociety could lead on, but which would doubtless best be created and taught in conjunction with real academics. We’d need some money to cover our time doing this, too.

    And finally it would need some students. But my hunch is that if we do this right, the problem will probably be fending people off with sticks.

    What next?

    I’m genuinely not sure – I hope this post sparks some debate, and I hope it provokes some people to go “Yeah, me too”. Maybe you could tell me what I should do next?

  3. Seeking help seeking female coders

    As you might know, we’ve currently got an open call for new developers: we’re hiring quite a bit in the next six months.

    Thus far our list of people interested in the job contains no women’s names at all – zip, zero, zilch – despite us having taken soundings on how to get a more diverse sample of applicants.

    I’m really, really not OK with this. I understand the gender imbalance in tech as well as anyone, but I interpret this as ‘mySociety hasn’t reached out well enough’, not ‘blame the women for not applying’.

    So my question to you, the world at large, is this: what can we do right now, or this week anyway, to get some women’s names on this list before we start to vet the CVs?

    Applications are still very definitely open, so anyone – male, female or other – who’d like to apply should see the original blog post for how to go about it.

  4. Job Advert: Developers

    This vacancy is now filled.

    How would you like to be a coder in an organisation that is as determined to make a difference in the world as it is to be a truly high quality, engineer-led software team?

    mySociety is that organisation. We’re a project of a registered charity, currently running award-winning civic and democratic websites like TheyWorkForYou.com and FixMyStreet.com, and we’re looking to grow our already-celebrated development team by several new members over the next six months.

    We’re looking for people with at least two years’ experience (professional or keen amateur) in at least one of Python, Ruby, Perl, PHP, C++, Javascript or Adobe Flex, and who have ambitions to learn more languages in the future.

    We’re looking for developers willing to commit to full or mostly-full time positions (no freelancers, sorry) and who are up for a career change that will see them stay with us for a little while. You’ll get to work with volunteers, mix commercial and charitable projects, and travel far and wide. Plus, you can work from wherever you live (in the UK), and we pay salaries from £28k to £50k depending on skills.

    Most of all, we’re looking for coders who look at the services we have built so far and think “I wish I’d been on that project”. Projects you’ll likely be working on over the next few months include (but are not limited to):

    • A/B testing and conversion tracking of our charitable sites
    • Commercial spinoffs from FixMyStreet
    • Mapumental
    • Enhancements to TheyWorkForYou and WhatDoTheyKnow
    • Commercial development for clients

    And if you’ve any questions, please post them in the comments below so we can share the answers.

  5. New features on MaPit

    We’ve added a variety of new features to our postcode and point administrative area database, MaPit, in the past month – new data (Super Output Areas and Crown dependency postcodes), new functionality (more geographic functions, council shortcuts, and JSONP callback), and most interestingly for most people, a way of browsing all the data on the site.

    • Firstly, we have some new geographic functions to join touches – overlaps, covered, covers, and coverlaps. These do as you would expect, enabling you to see the areas that overlap, cover, or are covered by a particular area, optionally restricted to particular types of area. ‘coverlaps’ returns the areas either overlapped or covered by a chosen area – this might be useful for questions such as “Tell me all the Parliamentary constituencies fully or partly within the boundary of Manchester City Council” (three of those are entirely covered by the council, and two overlap another council, Salford or Trafford).
    • As you can see from that link, nearly everything on MaPit now has an HTML representation – just stick “.html” on the end of a JSON URI to see it. This makes it very easy to explore the data contained within MaPit, linking areas together and letting you view any area on Google Maps (e.g. Rutland Council on a map). It also means every postcode has a page.
    • From a discussion on our mailing list started by Paul Waring, we discovered that the NSPD – already used by us for Northern Ireland postcodes – also contains Crown dependency postcodes (the Channel Islands and the Isle of Man) – no location information is included, but it does mean that given something that looks like a Crown dependency postcode, we can now at least tell you if it’s a valid postcode or not for those areas.
    • Next, we now have all Lower and Middle Super Output Areas in the system; thanks go to our volunteer Anna for getting the CD and writing the import script. These are provided by ONS for small area statistics after the 2001 census, and it’s great that you can now trivially look up the SOA for a postcode, or see what SOAs are within a particular ward. Two areas are in MaPit for each LSOA and MSOA – one has a less accurate boundary than the other for quicker plotting, and we thought we might as well just load it all in. The licences on the CD (Conditions of supply of SOA boundaries and Ordnance Survey Output Area Licence) talk about a click-use licence, and a not very straightforward OS licence covering only those SOAs that might share part of a boundary with Boundary-Line (whichever ones those are), but ONS now use the Open Government Licence, Boundary-Line is included in OS OpenData, various councils have published their SOAs as open data (e.g. Warwickshire), and these areas should be publicly available under the same licences.
    • As the UK has a variety of different types of council, depending on where exactly you are, the postcode lookup now includes a shortcuts dictionary in its result, with two keys, “council” and “ward”. In one-tier areas, the values will simply be the IDs of that postcode’s council and ward (whether it’s a Metropolitan district, Unitary authority, London borough, or whatever); in two-tier areas, the values will again be dictionaries with keys “district” and “county”, pointing at the respective IDs. This should hopefully make lookups of councils easier.
    • Lastly, to enable use directly on other sites with JavaScript, MaPit now sends out an “Access-Control-Allow-Origin: *” header, and allows you to specify a JSON callback with a callback parameter (e.g. put “?callback=foo” at the end of your query to have the JSON results wrapped in a call to the foo() function). JSONP calls will always return a 200 response, to enable the JavaScript to access the contents – look for the “error” key to see if something went wrong.
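
    To make a few of those features concrete, here is a minimal Python sketch: it builds a URL for one of the geographic functions, flattens the shortcuts dictionary whether the area is one-tier or two-tier, and checks a decoded result for the “error” key. The base URL, the "/area/<id>/<function>" path and the "type" query parameter are assumptions for illustration, and the IDs in the usage lines are made up.

```python
from urllib.parse import urlencode

MAPIT_BASE = "https://mapit.mysociety.org"  # assumed base URL
RELATIONS = {"touches", "overlaps", "covered", "covers", "coverlaps"}


def area_relation_url(area_id, relation, area_type=None):
    """Build a URL for a geographic function on an area (assumed layout)."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown geographic function: {relation}")
    url = f"{MAPIT_BASE}/area/{area_id}/{relation}"
    if area_type is not None:
        url += "?" + urlencode({"type": area_type})
    return url


def shortcut_ids(shortcuts, key="council"):
    """Return the IDs under a shortcut key as a flat, sorted list.

    One-tier areas give a single ID; two-tier areas give a dictionary
    of IDs, whose values are collected whatever its keys are called.
    """
    value = shortcuts[key]
    if isinstance(value, dict):
        return sorted(value.values())
    return [value]


def ensure_ok(result):
    """Raise if a decoded result carries an "error" key."""
    if "error" in result:
        raise LookupError(str(result["error"]))
    return result


print(area_relation_url(2514, "coverlaps", "WMC"))
print(shortcut_ids({"council": 2514, "ward": 8794}))
print(shortcut_ids({"council": {"county": 2225, "district": 2250}}))
```

    Checking for “error” rather than the HTTP status matters most for JSONP calls, since those always come back as a 200.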

    Phew! I hope you find this a useful resource for getting at administrative geographic data; please do let us know of any uses you make of the site.

  6. Embedding FixMyStreet Google map in a blog

    On Twitter about 15 minutes ago, @greenerleith asked: “Has anyone worked out how to display the most recent #fixmystreet reports on a local map widget that can be embedded? #hyperlocal”

    Like this? 🙂

    It’s very simple to do:

    1. Go to FixMyStreet, and locate any RSS feed of the latest reports you want (for the above map, I used Edinburgh Waverley’s postcode of EH1 1BB; you could have used reports to a particular council, or ward, using the Local alerts section). Copy the URL of the RSS feed.
    2. Go to Google Maps, paste the RSS feed URL into its search box, and click Search Maps.
    3. Click the “Link” link to the top right of the map, and copy the “Paste HTML to embed in website” code.
    4. Paste that code into your blog post, sidebar, or wherever (you can alter the code to change its size etc.).
    5. Done. 🙂

    The latest reports from FixMyStreet, superimposed on a Google Map, embedded in your blog. Hope that’s helpful.
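
    If you would rather plot the reports yourself than go through Google Maps, the same RSS feed can be read directly. This is only a sketch: it assumes each item in the feed carries its location as a GeoRSS point ("latitude longitude"), and the sample feed below is invented for the example rather than copied from FixMyStreet.

```python
import xml.etree.ElementTree as ET

# GeoRSS-Simple namespace; assumed to be what the feed uses for locations.
GEORSS_POINT = "{http://www.georss.org/georss}point"


def report_locations(rss_text):
    """Return (title, latitude, longitude) for each located feed item."""
    root = ET.fromstring(rss_text)
    locations = []
    for item in root.iter("item"):
        point = item.findtext(GEORSS_POINT)
        if point is None:
            continue  # skip items without a location
        lat, lon = (float(part) for part in point.split())
        locations.append((item.findtext("title", default=""), lat, lon))
    return locations


# Invented sample feed, standing in for the copied RSS URL's contents.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example reports</title>
    <item>
      <title>Broken street light</title>
      <georss:point>55.9520 -3.1890</georss:point>
    </item>
    <item>
      <title>Report without a location</title>
    </item>
  </channel>
</rss>"""

for title, lat, lon in report_locations(SAMPLE_FEED):
    print(title, lat, lon)
```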

  7. Outlook attachments now viewable in WhatDoTheyKnow

    When a bit of government forwards or attaches emails using Outlook, they get sent using a special, strange Microsoft email format. Up until now, WhatDoTheyKnow couldn’t decode it. You’d just see a weird attachment on the response to your Freedom of Information request, and probably not be able to do anything with it.

    Peter Collingbourne got fed up with this, and luckily for us, he can code too. He forked our source code repository, and made a nice patch in his own copy of it.

    He then told us about it, and I merged his changes into the main WhatDoTheyKnow code, tested them out on my laptop, then made them live. It all worked perfectly first time. Peter even added the new dependency on vpim to WhatDoTheyKnow conf/packages.

    Now if you go to an Outlook attachment on WhatDoTheyKnow, such as this one, you’ll just see the files, and be able to download them, and view them as HTML as normal. They’ll also get indexed by the search (although I need to do a rebuild for that for it to work with old requests).

    Thanks Peter!

    If you want to have a go making an improvement to a mySociety site, you can get the code for most of them from our github repositories. For some sites, there’s an INSTALL.txt file explaining how to get a development environment set up. Let us know if you do anything – even incremental improvements to installation instructions are really useful. And new, useful, features like Peter’s are even more so.

  8. Duncan Parkes is our new Core Developer

    We are very happy to announce that Duncan Parkes has joined mySociety, bringing our team of full time core developers up to four.

    Duncan is the incredibly prolific author of screen scrapers for the lovely PlanningAlerts.com which he runs with Richard Pope.

    He also has a PhD in Mathematics, which I expect you’ll want to read all of here, and is an editor of Open Source programming books with APress. During the vetting process he listed one of the passions of his life as being ‘Unit Testing’, which, combined with his love of postbox crowdsourcing, made picking him more or less a no-brainer.

    In the short run we’ve let him loose, under the tutelage of Francis Irving, on the scaling challenges presented by Mapumental – I can’t wait to see what comes out of it.

  9. What are the two sorts of Cloud infrastructure called?

    I’ve been doing lots of research around “cloud computing” recently, so we can change how Mapumental works and take it out of private beta.

    One thing that’s struck me is that there doesn’t seem to be a proper, industry standard name to distinguish what to me are two fundamentally different sorts of “cloud computing”. I’m focusing here entirely on cloud services for programmers (let’s leave what it means to end users or businesses for another day).

    Here are my own names and descriptions of them:

    1) Cloud hardware server provision (Cloud HSP)
    Low level APIs for making and destroying (virtual) servers, and loading machine images onto them. e.g. Amazon Elastic Compute Cloud, Rackspace Cloud Servers, Eucalyptus’s EC2 bits. Basically, what Eucalyptus v 1.5 can do and what libcloud should do. (By analogy, this is the assembly language of cloud computing)

    2) Cloud developer service provision (Cloud DSP)
    A service that a developer accesses with one name and a simple API, and behind the scenes it scales for him, automatically. e.g. Amazon Queue Service, Rackspace Cloud Files. (By analogy, this layer is the C programming language of cloud computing)

    [as an aside, Google AppEngine is an interesting one. It is definitely in the Cloud DSP category, but I think it is larger than that – it is a whole set of APIs all in that category. Something like Google DataStore is a single Cloud DSP, albeit one apparently only accessible within AppEngine apps]

    It’s possible to use a Cloud HSP (assembly language), along with a bunch of your own software or open source software, to build new Cloud DSPs (C code). Right now this is pretty hard – even quite well known open source distributed databases like CouchDB still need scripting to even make them replicate. The code that makes and destroys servers and gives the service one name, needs manually stringing with quite new bits of wire (things like scalr and Wackamole).

    For this reason, I’m reluctant for mySociety to get into the “making our own Cloud DSP out of Cloud HSP” game. It feels to me like a suck of time, and like we wouldn’t be able to guarantee without lots of careful and expensive testing that it would scale. I’m more tempted to use the commercial Cloud DSP services where possible, even though they are proprietary. But use them via our own abstraction layer, so we can change as we need to. Of course, we have some C++ code (the public transport route finder), so will have to use the Cloud HSP API to get that going, perhaps with Amazon’s Auto Scaling. But it can jolly well use AQS and S3 to talk to other services.
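
    To show what I mean by our own abstraction layer, here is a minimal sketch in Python (chosen purely for illustration). Every name in it is hypothetical: the application code only ever talks to a small queue interface, and the in-memory class stands in for whatever wrapper we would write around a commercial Cloud DSP.

```python
from abc import ABC, abstractmethod
from collections import deque


class QueueService(ABC):
    """The only queue interface application code is allowed to see."""

    @abstractmethod
    def send(self, message):
        """Add a message to the queue."""

    @abstractmethod
    def receive(self):
        """Return the oldest message, or None if the queue is empty."""


class InMemoryQueue(QueueService):
    """Stand-in backend for tests; a hosted queue wrapper would sit here."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


def enqueue_jobs(queue, jobs):
    """Application code: depends on QueueService, never on a provider."""
    for job in jobs:
        queue.send(job)


queue = InMemoryQueue()
enqueue_jobs(queue, ["job-1", "job-2"])
print(queue.receive())
```

    Swapping provider then means writing one new subclass, rather than touching every caller.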

    So, what do you think about the names Cloud HSP/DSP? Are there already existing names for the distinction that I’m making? Is it a useful distinction for you? Can you think of better names?

  10. WhatDoTheyKnow growing pains (and Ruby memory leaks)

    WhatDoTheyKnow keeps growing and growing, sucking people in from Google as its archive of maybe 8.5% of Freedom of Information requests gets more and more detailed.

    Graph of number of FOI requests made using WhatDoTheyKnow over time

    There’s round about 8Gb of unfettered Government data in the core database, plus a whole bunch more for indexing and caching. For comparison, TheyWorkForYou (which now goes back to 1935) has 12Gb. And it’s catching up on traffic too – WhatDoTheyKnow has about half as many visitors as TheyWorkForYou.

    Unfortunately, this new found traffic has led to performance problems. You might have seen errors when using WhatDoTheyKnow in the last week or two. This post is firstly an apology for that. Thank you for your patience. Hopefully it is fixed now – do let us know if you get problems still. And secondly it is some techy stuff about debugging such problems in Ruby on Rails…

    When WhatDoTheyKnow started failing, we did the obvious things to start with – moving the database to a separate server, and moving some other services off the same server, to give WDTK more room to breathe. It still kept breaking.

    None of my server monitoring tools shed any very clear light as to the problem. I upgraded to the latest version of Passenger, the best Rails deployment tool I’ve seen yet. It’s pretty good, but still not mature enough for my liking. I was still getting the same problems with it, but reporting tools like passenger-memory-stats were really helpful.

    Eventually I worked out that it was to do with memory use of the Rails processes. Individual ones would leap up to 1Gb, and never drop back down. If several did, the server (with 4Gb of RAM) would start swapping and grind to a halt. The world of Ruby and Rails memory monitoring software is patchwork at best, and in the end I found the simplest tools the most useful. Here are some:

    • I found some Rails processes were getting jammed, and not dying even when I restarted Apache. I think in the end this was due to the Passenger spawning method, and our use of the Xapian Ruby module. Running Passenger in RailsSpawnMethod conservative mode made things much more robust.
    • Monit, which in a previous life had a job holding up vital structural pillars of buildings with duct tape, makes you feel dirty. Actually it is really useful. Given I couldn’t quickly fix the problem, Monit let me at least reduce the suffering for people trying to use the site meanwhile. Here’s the rule I used, which gives Apache a kick every time server memory use is too high. It was firing every 5 or 10 minutes…
      check system localhost
          if memory > 3500 MB then exec "/usr/sbin/apache2ctl graceful"
    • I found memory_profiler on a blog. It helps you find the kind of memory leak where you unintentionally continue to reference an object you don’t use any more, and its specialist subject is string objects. This led to a fix to do with declaring static arrays in classes vs. modules, which I still don’t really understand. But it wasn’t the cause of the big 1Gb memory munching; there were no large enough leaks of this sort.
    • The record_memory function in WDTK’s application controller came from another blog. It’s handy as it shows you how much each request increases the memory used by the Ruby process. With caveats, this was the best way for me to identify the most damaging requests (search results, and certain public body pages). And it also brought focus on the actual problem – the peak memory use during a request. That’s really important, because Ruby’s memory manager never returns memory to the operating system… The Gb leaps in memory use were because of temporary memory used during certain requests, which the Ruby memory manager then never frees later.
    • I made a bunch of functions culminating in allocated_string_size_around_gc. This was really useful in use with the “just add lots of print statements and fiddle” school of debugging. Not everyone’s favourite school, but if your test code can’t catch it, one I often end up using (it gets really involved rarely enough that it doesn’t seem worth setting up an interactive debugger). It led me to various peak memory savings, such as calling “text.gsub!” rather than “text = text.gsub” while removing email addresses and private information from FOI request responses, which helps quite a bit when dealing with multi-megabyte attachments.
    • Finally, I used the overlooked debugging tool, and the one you should never rely on, being common sense. That is, common sense informed by days of careful use of all the other tools. In order to quickly show text extracts when searching, WDTK stores the extracted attachment text in the database. A few of these attachments are quite large, and led to 50Mb fields, often several of which were being loaded and processed in one page request. That this would cause a high peak of memory use became obvious to me some time yesterday. I checked that that was the case, and this morning, I changed it to use the full text for indexing, but to at most keep 1Mb for use in snippets. So sometimes now you won’t get a good search extract for queries, but it is rare, and it will at least still return the right result.
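
    The two ideas that mattered most, watching the peak memory of a single request and capping the text kept for snippets, can be sketched outside Rails. This is not WDTK’s code: it is a small Python illustration using the standard tracemalloc module, and the function names and the fake “request” are made up for the example.

```python
import tracemalloc

SNIPPET_LIMIT = 1024 * 1024  # keep at most about 1Mb of text for snippets


def measure_request(handler, *args):
    """Run one handler; return (result, retained_bytes, peak_bytes)."""
    tracemalloc.start()
    try:
        result = handler(*args)
        # "retained" is what is still allocated after the handler returns;
        # "peak" is the high-water mark reached while it was running.
        retained, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, retained, peak


def snippet_text(full_text, limit=SNIPPET_LIMIT):
    """Cap stored snippet text (by characters here, for simplicity)."""
    return full_text[:limit]


def wasteful_request(size):
    """Fake request: big temporary strings, tiny result."""
    big = "x" * size
    return len(big.upper())


result, retained, peak = measure_request(wasteful_request, 2_000_000)
print(result, retained < peak)
```

    The point of the fake request is that almost nothing is retained afterwards, yet the peak is large, which is exactly the pattern that hurt when the memory was never handed back to the operating system.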

    I’ve more work to do; I think there are quite a few other quick wins, all of which are making the site faster too. I’m quite happy that WhatDoTheyKnow also has a bunch more test code as a result of all this.

    On the other hand, what a disappointing disaster for open source languages beginning with P/R (as opposed to J). Yes, the help and tools were just about there to work it out, but they would seem primitive if you’d used, say, Java’s Memory Analyzer. Indeed somebody over on StackOverflow suggested running your site in JRuby and using exactly that tool…