We’re looking for ways to make it easier for volunteers to get involved in mySociety. Like everything in real life this is mostly a question of openness and policy, but there are also a few technical steps we think would make life easier. One of these is to make it easier for us to hand over a test web server to a volunteer or a group of volunteers to develop code on, play with and generally break. At the moment that’s quite hard to do, because we use apache and all our sites are hosted on one machine (yes really — computers are fast and memory is cheap, though in day-to-day life you’d never notice that, because most of the IT industry is involved in developing “technology” — meaning, “programs that don’t work yet” — that are designed to make your computer slow again: Microsoft Windows, Java, modern web browsers, etc. etc.). Apache is monolithic and if one user breaks the configuration of their test site they can bring down all the sites hosted on the machine. Also, apache isn’t very good at crossing security boundaries (arguably that’s a fault of UNIX generally), so unless we’re prepared to give all the volunteers root (not acceptable for policy reasons) they need to hassle us to get things done (not acceptable to them). Indeed, to save time and admin hassle, IVotedForYouBecause was developed and is hosted elsewhere.
So the idea is to strip away all this crap by running lots of instances of apache, and giving one to any group of volunteers who want to play with one of our sites — all running under their own unprivileged UID — and then direct requests from the outside world through to the appropriate internal apache server via a public-facing proxy. The design I’m envisaging looks something like this:
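Concretely, and with made-up paths, ports and usernames, the wiring might look something like this (squid 2.x calls the rewriting hook `redirect_program`; the second fragment is one volunteer's personal httpd.conf, for an apache started under their own UID):

```
# squid.conf: the public-facing proxy on port 80
http_port 80
redirect_program /usr/local/bin/rewrite.py    # external URL-rewriting script
redirect_children 5

# httpd.conf: one volunteer's apache, run under their own UID,
# listening only on a high loopback port (so no root is required)
Listen 127.0.0.1:8001
PidFile /home/fred/apache/httpd.pid
ErrorLog /home/fred/apache/error.log
DocumentRoot /home/fred/web
```

All the names and port numbers here are illustrative, not what we actually use.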
[Basically pointless block diagram; the server it was hosted on has gone]
(Actually it’s not clear to me that that diagram conveys anything you won’t have understood anyway, but there we go.) For the front-end server I’m using Squid, which is balky and overcomplicated, but supports one very handy feature which is invaluable in this setup: external URL rewriting scripts, which can be used to redirect requests that come through the cache to other resources. The classic application of this is to redirect requests for advertising and other pointless content to local resources so that they don’t take up bandwidth or break your web browser; in this case we’ll be using it to rewrite requests for certain publicly-visible URLs (“http://fred.pledgebank.com/…” or whatever) into internal URLs which route to individual users’ apache servers (“http://127.0.0.1:8001/…”). One of the nice things about this is that it preserves the Host: header, and (with a further small hack) apache can be persuaded to pretend that requests weren’t proxied at all, so any back-end stuff that needs to know clients’ IP addresses (such as logging, etc.) can be used unmodified. On top of this, squid will cache responses (assuming that we aren’t lazy about the headers we emit on our own content), which may speed things up a bit for certain sites, though I suspect (with little evidence, and none I’m prepared to bore you with now) this won’t be very useful in practice for the types of sites we’re building.
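To make that concrete, here’s a minimal sketch of such a rewriting script, in Python for clarity. Squid feeds the redirector one request per line on stdin (URL, client address, ident, method) and reads the rewritten URL, or a blank line for “leave it alone”, back on stdout. The hostname-to-port table and the example hosts are made up for illustration:

```python
#!/usr/bin/env python
# Minimal squid redirector sketch: map public hostnames to the
# loopback ports of individual volunteers' apache servers.
import sys
from urllib.parse import urlsplit

# Hypothetical mapping from public hostname to a volunteer's apache port.
BACKENDS = {
    "fred.pledgebank.com": 8001,
    "jo.pledgebank.com": 8002,
}

def rewrite(url):
    """Return the internal URL for a known host, or None to pass through."""
    parts = urlsplit(url)
    port = BACKENDS.get(parts.hostname)
    if port is None:
        return None
    # Route to the per-user apache on localhost. Squid forwards the
    # original Host: header, so name-based virtual hosting still works.
    internal = "http://127.0.0.1:%d%s" % (port, parts.path or "/")
    if parts.query:
        internal += "?" + parts.query
    return internal

def main():
    for line in sys.stdin:
        # Squid's redirector protocol: "URL client-ip/fqdn ident method"
        url = line.split()[0]
        new = rewrite(url)
        sys.stdout.write((new or "") + "\n")
        sys.stdout.flush()  # squid expects one unbuffered answer per line

if __name__ == "__main__":
    main()
```

You’d hook this in with squid’s `redirect_program` directive; a real version would presumably read the mapping from a file rather than hard-coding it.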
Another attractive feature of this scheme is that it means that we’re not tied to apache: we could use lighttpd or something, if we wanted to. I doubt that a technical reason to do that will arise in practice, but every minute I spend fighting apache configurations is a minute closer to chucking the bloody thing and picking another web server.
So, it’s the usual story: you start off trying to work around the brokenness in one bit of software, and then all sorts of exciting possibilities suddenly open up. At least, that’s one way to look at it.
If netcraft.com is right then you’re running all this on FreeBSD, right?
Why not do this in FreeBSD jails? (See jail(2) and jail(8) for details.) You can give each group of volunteers their own jail. This would work in much the same way as you propose above, but with the advantage that if the volunteers need to do anything as root they can (installing new packages, for instance).
Yes, everything’s running on FreeBSD at the moment. Jails are a possibility which I haven’t considered in very much detail. I’m rather hoping that that complexity isn’t necessary — for development sites at least, we should be able to have everything running under one UID (web server, web scripts, and perhaps other code such as a database server if necessary), so no further partitioning should be required. This obviously isn’t ideal for a live site, but for the kind of environment I’m thinking of here it seems a reasonable trade-off.
I am doing something very similar and it works well. May I suggest, though, that you use much higher port numbers, so that there won’t be a conflict between your apache instances and any other program that requests a port. Please see http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html for more details. Staying above 64000 seems safe.
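For what it’s worth, one quick way to act on that advice is to probe for a free port by trying to bind it. A Python sketch (the 64000 floor is just the rule of thumb above, not a hard standard, and ephemeral ranges vary by OS):

```python
# Sketch: find a free TCP port above the ephemeral range by test-binding.
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind (host, port), i.e. nothing else holds it."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        s.close()

def pick_port(start=64001, end=65535):
    """Return the first free port in [start, end], per the 64000+ rule."""
    for port in range(start, end + 1):
        if port_is_free(port):
            return port
    raise RuntimeError("no free port found above %d" % (start - 1))
```

There’s an unavoidable race between the check and the apache instance actually binding the port, so in practice you’d record the assignment somewhere central too.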
I might have missed the point here, but if it’s a test webserver why does it have to be running on port 80 anyway? Can’t you just give out test webservers on different ports, and ignore the problem, until one of them is ready to go live, at which point you fold it into the production webserver with a static fixed configuration?
It’s a possibility, but running webservers on lots of random different ports is a bit ick.