1. What are the two sorts of Cloud infrastructure called?

    I’ve been doing lots of research around “cloud computing” recently, so we can change how Mapumental works and take it out of private beta.

    One thing that’s struck me is that there doesn’t seem to be a proper, industry standard name to distinguish what to me are two fundamentally different sorts of “cloud computing”. I’m focusing here entirely on cloud services for programmers (let’s leave what it means to end users or businesses for another day).

    Here are my own names and descriptions of them:

    1) Cloud hardware server provision (Cloud HSP)
    Low level APIs for making and destroying (virtual) servers, and loading machine images onto them. e.g. Amazon Elastic Compute Cloud, Rackspace Cloud Servers, Eucalyptus’s EC2 bits. Basically, what Eucalyptus v 1.5 can do and what libcloud should do. (By analogy, this is the assembly language of cloud computing)

    2) Cloud developer service provision (Cloud DSP) A service that a developer accesses with one name and a simple API, and behind the scenes it scales for him, automatically. e.g. Amazon Queue Service, Rackspace Cloud Files. (By analogy, this layer is the C programming language of cloud computing)

    [as an aside, Google AppEngine is an interesting one. It is definitely in the Cloud DSP category, but I think it is larger than that – it is a whole set of APIs all in that category. Something like Google DataStore is a single Cloud DSP, albeit one apparently only accessible within AppEngine apps]

    It’s possible to use a Cloud HSP (assembly language), along with a bunch of your own software or open source software, to build new Cloud DSPs (C code). Right now this is pretty hard – even quite well known open source distributed datasbases like CouchDB still need scripting to even make them replicate. The code that makes and destroys servers and gives the service one name, needs manually stringing with quite new bits of wire (things like scalr and Wackamole).

    For this reason, I’m reluctant for mySociety to get into the “making our own Cloud DSP out of Cloud HSP” game. It feels to me like a suck of time, and like we wouldn’t be able to guarantee without lots of careful and expensive testing that it would scale. I’m more tempted to use the commercial Cloud DSP services where possible, even though they are proprietary. But use them via our own abstraction layer, so we can change as we need to. Of course, we have some C++ code (the public transport route finder), so will have to use the Cloud HSP API to get that going, perhaps with Amazon’s Auto Scaling. But it can jolly well use AQS and S3 to talk to other services.

    So, what do you think about the names Cloud HSP/DSP? Are there already existing names for the distinction that I’m making? Is it a useful distinction for you? Can you think of better names?

  2. Avoid exhausting train journeys!

    Last week I gave my first presentation by video conference. It was to the intriguing Circus Foundation, who are running a series of workshops on new democracy. It came about because I was a bit busy and tired to travel from Cambridge into London. Charles Armstrong, from the Circus Foundation, suggested that I present over the Internet.

    We used Skype audio and video, combined with GoToMeeting so my laptop screen was visible on a projector to an audience in London. Apparently my voice was boomed round the room. It was a slightly odd experience, more like speaking on the radio. However, I had a good serendipitous one to one chat while we were setting up, with Jonathan Gray from OKFN.

    I was asked to give a quick overview of mySociety, as a few people in the audience hadn’t heard of us, and also to talk about how I saw the future of democracy. I talked about three of our sites, and what I’d like to see in each area in 10 years time.

    • TheyWorkForYou opens up access to conventional, representational democracy, between and during elections. In 10 years time, I asked for Parliament to publish all information about its work in a structured way, as hinted at in our Free Our Bills campaign. So it is much easier for everyone to help make new laws better.
    • FixMyStreet is local control of the things people care about, a very practical democracy. In 10 years time I’d like to see all councils running their internal systems (planning, tree preservation orders… everything that isn’t about individuals) in public, so everyone can see and be reassured about what is being done, why and where.
    • WhatDoTheyKnow shows the deep interest that there is by the public in the functioning of all areas of government. In 10 years time, I’d like to see document management systems in wide use by public authorities that publish all documents by default. Only if overridden for national security or data protection reasons would they be hidden.

    Charles Armstrong, from the Circus Foundation, has written up the workshop.

    Downsides of the video conferencing were that I couldn’t hear others speak, as they didn’t have the audio equipment. I had to take questions via Charles. This meant I also couldn’t participate in the rest of the evening, or easily generally chat to people. All very solvable problems, with a small amount of extra effort – Charles is going to work on it for another time.

    Of course this also all saves on carbon emissions (cheekily, taking off my mySociety hat for a moment, sign up to help lobby about that).

  3. Dull sysadmin/infrastructure stuff

    We’re looking for ways to make it easier for volunteers to get involved in mySociety. Like everything in real life this is mostly a question of openness and policy, but there are also a few technical steps we think would make life easier. One of these is to make it easier for us to hand over a test web server to a volunteer or a group of volunteers to develop code on, play with and generally break. At the moment that’s quite hard to do, because we use apache and all our sites are hosted on one machine (yes really — computers are fast and memory is cheap, though in day-to-day life you’d never notice that, because most of the IT industry is involved in developing “technology” — meaning, “programs that don’t work yet” — that are designed to make your computer slow again: Microsoft Windows, Java, modern web browsers, etc. etc.). Apache is monolithic and if one user breaks the configuration of their test site they can bring down all the sites hosted on the machine. Also, apache isn’t very good at crossing security boundaries (arguably that’s a fault of UNIX generally), so unless we’re prepared to give all the volunteers root (not acceptable for policy reasons) they need to hassle us to get things done (not acceptable to them). Indeed, to save time and admin hassle, IVotedForYouBecause was developed and is hosted elsewhere.

    So the idea is to strip away all this crap by running lots of instances of apache, and giving one to any group of volunteers who want to play with one of our sites — all running under their own unprivileged UID — and then direct requests from the outside world through to the appropriate internal apache server via a public-facing proxy. The design I’m envisaging looks something like this:

    Basically pointless block diagram

    (Actually it’s not clear to me that that diagram conveys anything you won’t have understood anyway, but there we go.) For the front-end server I’m using Squid, which is balky and overcomplicated, but supports one very handy feature which is invaluable in this setup: external URL rewriting scripts, which can be used to redirect requests that come through the cache to other resources. The classic application of this is to redirect requests for advertising and other pointless content to local resources so that they don’t take up bandwidth or break your web browser; in this case we’ll be using it to rewrite requests for certain publically-visible URLs (“http://fred.pledgebank.com/…” or whatever) into internal URLs which route to individual users’ apache servers (“http://127.0.0.1:8001/…”). One of the nice things about this is that it preserves the Host: header, and (with a further small hack) apache can be persuaded to pretend that requests weren’t proxied at all, so any back-end stuff that needs to know clients’ IP addresses (such as logging, etc.) can be used unmodified. On top of this, squid will cache responses (assuming that we aren’t lazy about the headers we emit on our own content), which may speed things up a bit for certain sites, though I suspect (with little evidence, and none I’m prepared to bore you with now) this won’t be very useful in practice for the types of sites we’re building.

    Another attractive feature of this scheme is that it means that we’re not tied to apache: we could use lighttpd or something, if we wanted to. I doubt that a technical reason to do that will arise in practice, but every minute I spend fighting apache configurations is a minute closer to chucking the bloody thing and picking another web server.

    So, it’s the usual story: you start off trying to work around the brokenness in one bit of software, and then all sorts of exciting possibilities suddenly open up. At least, that’s one way to look at it.

  4. A short message

    So, my script actually decided to hassle Matthew into writing a post here, but he’s off enjoying himself at EtCon so I’m stepping into the breach myself.

    One of the features PledgeBank will support is allowing users to sign up to a pledge by SMS (as well as through the traditional web form plus email confirmation route). So, that’s what I’ve been working on lately.

    The idea is that the user sees a poster or flyer describing the pledge, with the instruction, “text ‘PLEDGE {name-of-pledge}’ to {number} to sign up!”. We then sign them up and send them another SMS when the pledge completes.

    Now, the last time I had anything to do with SMS, making any progress required writing a giant C program which talked to an “SMS message centre” via the universal computer protocol (so called because it’s not universal and isn’t used for talking to computers — in fact, I think it’s a hangover from the days of numeric-only pagers). Since then, this stuff has become a lot easier; there are lots of companies which do the talking-to-the-phone-company stuff for you, and let you transmit and receive SMSs over HTTP. The one we’re using for testing, MX Telecom, seem pretty typical; you submit messages to send via HTTP POST, and get told about incoming messages and delivery reports via — slightly disappointingly — a GET to your own servers. (Further disappointment: while sending is authenticated — via a plaintext username and password — receiving isn’t, other than by source IP address. Ho-hum.)

    For those who don’t use them, SMS messages can be regarded as being like email, but much more expensive, slower, and not as reliable. Oh, and you only get to send 160 characters in each message, unless you want to use characters outside the ISO-8859-15 code page (“Latin1 with added ‘?'”), in which case you get only 70 characters. The one feature SMS has which email doesn’t — other than cost — is delivery reports. Decent aggregators will send you a message which tells you whether a message you’ve sent has been delivered, buffered in the network waiting for the recipient’s phone to be switched on, rejected because the user pays to receive SMSs and has run out of credit, or various other rare conditions.

    It’s also possible to send “premium rate” reverse-billed SMS messages, where the punter pays to receive them. Sadly you don’t get the option of sending these from your regular phone, but if you pay BIG CASH MONEY to an SMS aggregator you can do so. These messages typically cost the recipient 25p or 50p (there are a whole set of billing bands). For the cheaper messages, the telephone company keeps most of the money, the aggregator takes a bit, and eventually you get 4–6p.

    Sending a normal (“bulk”) SMS message costs somewhere from 5p to 10p, in small quantities.

    So so far our plan looks like this:

    • User texts us, asking to be signed up
    • We send them a reverse billed message containing a URL which they can use to convert their SMS subscription into an email one
    • If we get a successful delivery report for that message, or if they visit the URL we sent them, we sign them up
    • Later on, when the pledge completes, and if we don’t have their email address, we let the pledge creator send them another SMS telling them what to do

    One obvious problem here is that, on the face of it, we lose money on every transaction here (and only letting the pledge creator send one message to the signers is pretty sucky too). Based on the estimates we’ve been given, we can break even as long as ~20% of SMS signers convert to email subscriptions. But it would be nice to be able to run these things by SMS alone, since not everybody has the web or email (or wants to use them).

    Anyway, from a code point-of-view, this all now works — I could bore you for hours about all the nasty cases which occur when somebody signs up by email, and by SMS, and then tries to convert their SMS subscription to email and so forth, but anyway, we’ve solved that one and this post is already too long — so all we have to do now is get the best deal we can for SMS messages so that this becomes practically useful….