We’re looking for ways to make it easier for volunteers to get involved in mySociety. Like everything in real life this is mostly a question of openness and policy, but there are also a few technical steps we think would make life easier. One of these is to make it easier for us to hand over a test web server to a volunteer or a group of volunteers to develop code on, play with and generally break. At the moment that’s quite hard to do, because we use apache and all our sites are hosted on one machine (yes really — computers are fast and memory is cheap, though in day-to-day life you’d never notice that, because most of the IT industry is involved in developing “technology” — meaning, “programs that don’t work yet” — that are designed to make your computer slow again: Microsoft Windows, Java, modern web browsers, etc. etc.). Apache is monolithic and if one user breaks the configuration of their test site they can bring down all the sites hosted on the machine. Also, apache isn’t very good at crossing security boundaries (arguably that’s a fault of UNIX generally), so unless we’re prepared to give all the volunteers root (not acceptable for policy reasons) they need to hassle us to get things done (not acceptable to them). Indeed, to save time and admin hassle, IVotedForYouBecause was developed and is hosted elsewhere.
So the idea is to strip away all this crap by running lots of instances of apache, and giving one to any group of volunteers who want to play with one of our sites — all running under their own unprivileged UID — and then direct requests from the outside world through to the appropriate internal apache server via a public-facing proxy. The design I’m envisaging looks something like this:
[Basically pointless block diagram – server hosted on has gone]
(Actually it’s not clear to me that that diagram conveys anything you won’t have understood anyway, but there we go.) For the front-end server I’m using Squid, which is balky and overcomplicated, but supports one very handy feature which is invaluable in this setup: external URL rewriting scripts, which can be used to redirect requests that come through the cache to other resources. The classic application of this is to redirect requests for advertising and other pointless content to local resources so that they don’t take up bandwidth or break your web browser; in this case we’ll be using it to rewrite requests for certain publically-visible URLs (“http://fred.pledgebank.com/…” or whatever) into internal URLs which route to individual users’ apache servers (“http://127.0.0.1:8001/…”). One of the nice things about this is that it preserves the Host: header, and (with a further small hack) apache can be persuaded to pretend that requests weren’t proxied at all, so any back-end stuff that needs to know clients’ IP addresses (such as logging, etc.) can be used unmodified. On top of this, squid will cache responses (assuming that we aren’t lazy about the headers we emit on our own content), which may speed things up a bit for certain sites, though I suspect (with little evidence, and none I’m prepared to bore you with now) this won’t be very useful in practice for the types of sites we’re building.
Another attractive feature of this scheme is that it means that we’re not tied to apache: we could use lighttpd or something, if we wanted to. I doubt that a technical reason to do that will arise in practice, but every minute I spend fighting apache configurations is a minute closer to chucking the bloody thing and picking another web server.
So, it’s the usual story: you start off trying to work around the brokenness in one bit of software, and then all sorts of exciting possibilities suddenly open up. At least, that’s one way to look at it.