Back in November 2006 we launched Number 10’s petitions website. We were pretty proud of the usability-centred site we built – we can still lay a pretty good claim to it being one of the biggest democracy sites (measured in terms of people transacting) that the world’s ever seen.
Over 12 million signatures had been added to petitions by the time the site was switched off after the 2010 general election. We were particularly proud of developing a system that was highly load-tolerant: we once survived over 20,000 people signing within a single hour, all whilst running on a pair of cheap little servers. That performance on so little hardware was down to the raw brilliance represented by a coding team made up of Francis Irving, Matthew Somerville, and the late, great Chris Lightfoot.
We’re also pleased that the popularity of the site led to the irresistible rise of the belief that the public should be able to petition the government via the internet. So even though our site was mothballed, Parliament and DirectGov have taken over the idea, and the commitment has been upped a notch, from ‘we’ll send a reply’ to ‘we’ll talk about it’. To be clear, we are not, nor have ever been a community interested in replacing representative democracy with direct democracy, but anything that can squeeze any drop of change from Parliament is worth a small celebration.
What’s most pleasing, though, is that we’ve been able to take the open source code built for Number 10, improve and expand upon it to develop a hosted petitions service for local councils around the country, or the rest of the world. And this is where we found the most important lesson for us: local petitions can be awesome, and despite the much smaller numbers of signatories involved, we’ve been more widely and frequently impressed by local petitions and responses than at the more glamorous national level. We’re particular fans of Hounslow Borough Council who have given positive and detailed feedback on all sorts of genuine local issues, as well as working hard to let local residents know that the service exists.
Just recently we launched a site to make it really easy to find local council petition websites, because there are hundreds hidden away (we built some; most are supplied by other vendors). If we could see anything result from today’s huge explosion of interest in online petitions, it would be that people might start to look local, and explore what petitions in their community could mean.
So, I’ve just had a shower and I’m waiting for Matthew and Tom to turn up. As time goes on, mySociety seems to get more geographically disparate, and I look forward to meeting my coworkers. Then for 1pm we’ll be heading to CB2 for the mySociety developers meeting. Feel free to come along any time afternoon or evening, whatever your skills or interest in mySociety.
I haven’t posted on here for ages, since October. I’ve been away on holiday quite a lot, and when I’ve not been away I’ve been busy, partly with systems administration. We’ve set up lots of servers in the last month for the E Petitions site. When you go from 3 servers to 7 servers, there’s another step change in sorting out systems administration tools. For example, I had to change the monitoring script so every server wouldn’t monitor every other. And I had to work out the quirks and bugs in the system we have for storing config files for different classes of server in CVS. Because we only had one class of server before.
I’ve also had to learn lots about server monitoring and load balancing. Things have slowed down a bit now, to maybe 10 hits per second. But a few weeks ago the road pricing petition was often getting 50 hits per second. I’ve never worked on a site with that level of traffic before. You find all the bugs in your code, all the missing indices in PostgreSQL, all the badly tweaked FastCGI parameters. I’ve been sucking knowledge off Chris like a sponge, so tools like strace and vmstat begin to become instinctive. Seemingly nobody offers a book or a course which teaches this stuff well – every server setup is different, everyone knows different ways to tune and profile. But maybe you can tell me different in the comments.
Louise has been busily working away on lots of things. Amongst that is a major change to WriteToThem, to let you write to all the members in a multi-member constituency in one go. The last day or two, she’s been installing the WriteToThem test code on one of our servers, when it has only run on my laptop before. This will be fantastic – hopefully can get Matthew to be bolder about making changes to WriteToThem, if he has a test script he can easily run (getting Matthew to be bold isn’t normally a problem, but he seems mildly less bold when it comes to the WriteToThem queue daemon).
Tom and I have also been busy on a second travel maps report for the DfT. More on that soon. Lots of running screen scraping jobs on TransportDirect which take days. On the subject of Tom, he seems to have got expert at “stacking meetings” next to each other. In one day last week he had 7 meetings!
Partly for our own internal documentation, and partly because it might be of interest to (some) readers, some notes on how the Number 10 petitions site works. On the face of it you’d imagine this would be very simple, but as usual it’s complicated by performance requirements (our design was motivated by the possibility that a large NGO might mail a very large mailing list inviting each member to sign a particular petition, so that we might have to sustain a very high signup rate for an extended period of time). Here’s a picture of the overall architecture:
(This style of illustration is unaccountably popular in the IT industry but unlike most examples of the genre, I’ve tried to arrange that this one actually contains some useful information. In particular I’ve tried to mark the direction of flow of information, and separate out the various protocols; as usual there are too many of the latter. The diagram is actually a slight lie because it misses out yet another layer of IPC—between the web server, apache, and the front-end FastCGI scripts.)
Viewing petition pages is pretty conventional. Incoming HTTP requests reach a front-end cache (an instance of squid, one per web server, cacheing in memory only); squid passes them to the petition browsing scripts (written in perl running under FastCGI) to display petition information. Those scripts consult the database for the current state of the relevant petition and pass it back to the proxy, and thence to the web client. This aspect of the site is not very challenging.
Signing petitions is harder. The necessary steps are:
- write a database record about the pending signature;
- send the user an email containing a unique link to confirm their signature;
- update the database record when the user clicks the link;
- commit everything to stable storage; and finally
- tell the user that their signature has been entered and confirmed.
The conventional design for this would be to have the web script, when it processes the HTTP request for a new signature, submit a message for sending by a local mail server and write a row into the database and commit it, forcing the data out to disk. The mail server would then write the message into its spool directory, and fsync it, forcing it out to disk. The mail server will then pick the mail out of its queue and send it to a remote server, at which point it will be erased from the queue. Later on the mail will arrive in the
user’s inbox, at which point they will (presumably) click the link, resulting in another HTTP request which causes the web script to update the corresponding database row and commit the result. While this is admirably modular it requires far more disk writes than necessary to actually complete the task, which limits its potential speed. (In particular, there’s no reason to have a separate MTA spool directory and for the MTA to make its own writes to that directory.)
At times of high load, it is also extremely inefficient to do one commit per signature. It takes about as long to commit ten new or changed rows to the database as it is to commit one (because the time spent is determined by the disk seek time). Therefore to achieve high performance it is necessary to batch signatures. Unfortunately this is a real pain to implement because all the common web programming models use one process per concurrent request, and it is inconvenient to share database handles between different processes. The correct answer to this problem would of course be to write the signup web script as a single-process multiplexing implementation, but that’s a bit painful (we’d have had to implement our own FastCGI wire protocol library, or alternatively an HTTP server) and the deadlines were fairly tight. So instead we have a single-process server, petsignupd, which accepts signup and confirmation requests from the front-end web scripts over a simple UDP protocol, and passes them to the database in batches every quarter of a second. In theory, therefore, users should see a maximum latency of a bit over 0.25s, but we should achieve close to the theoretical best throughput of incoming requests. (Benchmarking more-or-less bears this out.)
Sending the corresponding email is also a bit problematic. General-purpose MTAs are not optimised for this sort of situation, and (for instance) exim can’t keep up with the sustained signup rate we were aiming for even if you put all of its spool directories on a RAM disk and accept that you have to repopulate its outgoing queue in the event of a crash. The solution was to write petemaild, a small multiplexed SMTP sending server; unlike a general-purpose MTA this manages its queue in memory and communicates updates directly to the database (when a confirmation email is delivered or delivery fails permanently).
It’s unfortunate that such a complex system is required to fulfil such a simple requirement. If we’d been prepared to write the whole thing ourselves, from processing HTTP requests down to writing signatures out to files on disk, the picture above would look much simpler (and there would be fewer IPC boundaries at which things could go wrong). On the other hand the code itself would be a lot more complex, and there’d be a lot more of it. I don’t think I’d describe this design as a “reasonable” compromise, but it’s at least an adequate one.
I’m very pleased to announce that the petitions system we’ve built for 10 Downing Street has gone live today.
I’m very grateful for the hard and often inspired work put into this by Chris Lightfoot and Matthew Somerville, as well as the civil servants who have helped to build a petitions system which I believe is in a real class of its own.
The most notable features are:
1. Petitions are accepted and published, regardless of the political slant of the petition. However, if they break the Ts&Cs (a petition that doesn’t actually ask for any action, for example) then they are put on a special rejected petitions page: they don’t just vanish. We think this transparency feature is probably unique.
2. The site is being launched in beta, and will change over time. This might seem too commonplace to note for many of you, but it reflects a willingness to see a public IT service evolve in response to users, not simply fulfil a contract agreed in advance. mySociety exists partly to spread good practice in the public sector, and we think this is a nice example of that in action.
3. The code, including Chris’s amazing high-load optimised engine, is all open source.
Unfortunately, PledgeBank is a pretty slow site. Generating the individual pledge page (done by mysociety/pb/web/ref-index.php) can take anything up to 150ms. That’s astonishingly slow, given the speed of a modern computer. What takes the time?
It’s quite hard to benchmark pages on a running web server, but one approach that I’ve found useful in the past is to use an analogue of phase-sensitive detection. Conveniently enough, all the different components of the site — the webserver, the database and the PHP process — run as different users, so you can easily count up the CPU time being used by the different components during an interval. To benchmark a page, then, request it a few times and compute the amount of CPU time used during those requests. Then sleep for the same amount of time, and compute the amount of CPU time used by the various processes while you were sleeping. The difference between the values is an estimate of the amount of CPU time taken servicing your requests; by repeating this, a more accurate estimate can be obtained. Here are the results after a few hundred requests to http://www.pledgebank.com/100laptop, expressed as CPU time per request in ms:
Subsystem User System apache ~0 ~0 PostgreSQL 55±9 6±4 PHP 83±8 4±4
(The code to do the measurements — Linux-specific, I’m afraid — is in mysociety/bin/psdbench.)
So that’s pretty shocking. Obviously if you spend 150ms of CPU time on generating a page then the maximum rate at which you can serve users is ~1,000 / 150 requests/second/CPU, which is pretty miserable given that Slashdot can relatively easily drive 50 requests/second. But the really astonishing thing about these numbers is the ~83ms spent in the PHP interpreter. What’s it doing?
The answer, it turns out, is… parsing PHP code! Benchmarking a page which consists only of this:
<? /* ... */ require_once '../conf/general'; require_once '../../phplib/db.php'; require_once '../../phplib/conditional.php'; require_once '../phplib/pb.php'; require_once '../phplib/fns.php'; require_once '../phplib/pledge.php'; require_once '../phplib/comments.php'; require_once '../../phplib/utility.php'; exit; ?>
reveals that simply parsing the libraries we include in the page takes about 35ms per page view! PHP, of course, doesn’t parse the code once and then run the bytecode in a virtual machine for each page request, because that would be too much like a real programming language (and would also cut into Zend’s market for its “accelerator” product, which is just an implementation of this obvious idea for PHP).
So this is bad news. The neatest approach to fixing this kind of performance problem is to stick a web cache like squid in front of the main web site; since the pledge page changes only when a user signs the pledge, or a new comment is posted, events which typically don’t occur anywhere near as frequently as the page is viewed, most hits ought to be servable from the cache, which can be done very quickly indeed. But it’s no good to allow the pledge page to just sit in cache for some fixed period of time (because that would be confusing to users who’ve just signed the pledge or written a comment, an effect familiar to readers of the countless “Movable Type” web logs which are adorned with warnings like, “Your comment may take a few seconds to appear — please don’t submit twice”). So to do this properly we have to modify the pledge page to handle a conditional GET (with an If-Modified-Since: or If-None-Match: header) and quickly return a “304 Not Modified” response to the cache if the page hasn’t changed. Unfortunately if PHP is going to take 35ms to process such a request (ignoring any time in the database), that still means only 20 to 30 requests/second, which is better but still not great.
(For comparison, a mockup of a perl program to process conditional GETs for the pledge page can serve each one in about 3ms, which isn’t much more than the database queries it uses take on their own. Basically that’s because the perl interpreter only has to parse the code once, and then it runs in a loop accepting and processing requests on its own.)
However, since we unfortunately don’t have time to rewrite the performance-critical bits of PledgeBank in a real language, the best we can do is to try to cut the number of lines of library code that the site has to parse on each page view. That’s reduced the optimal case for the pledge page — where the pledge has not changed — to this:
<? /* ... */ require_once '../conf/general'; require_once '../../phplib/conditional.php'; require_once '../../phplib/db.php'; /* Short-circuit the conditional GET as soon as possible -- parsing the rest of * the includes is costly. */ if (array_key_exists('ref', $_GET) && ($id = db_getOne('select id from pledges where ref = ?', $_GET['ref'])) && cond_maybe_respond(intval(db_getOne('select extract(epoch from pledge_last_change_time(?))', $id)))) exit(); /* ... */ ?>
— that, and a rewrite of our database library so that it didn’t use the gigantic and buggy PEAR one, has got us up to somewhere between 60 and 100 reqs/sec, which while not great is enough that we should be able to cope with another similar Slashdotting.
For other pages where interactivity isn’t so important, life is much easier: we can just emit a “Cache-Control: max-age=…” header, which tells squid that it can re-use that copy of the page for however long we specify. That means squid can serve that page at about 350reqs/sec; unfortunately the front page isn’t all that important (most users come to PledgeBank for a specific pledge).
There’s a subtlety to using squid in this kind of (“accelerator”) application which I hadn’t really thought about before. What page you get for a particular URL on PledgeBank (as on lots of other sites) vary based on the content of various headers sent by the user, such as cookies, preferred languages, etc.; for instance, if you have a login cookie, you’ll see a “log out” link which isn’t there if you’re an anonymous user. HTTP is set up to handle this kind of situation through the Vary: header, which the server sends to tell clients and proxies on which headers in the request the content of the response depends. So, if you have login cookies, you should say, “Vary: Cookie”, and if you do content-negotiation for different languages, “Vary: Accept-Language” or whatever.
PledgeBank has another problem. If the user doesn’t have a cookie saying which country they want to see pledges for, the site tries to guess, based on their IP address. Obviously that makes almost all PledgeBank pages potentially uncachable — the Vary: mechanism can’t express this dependency. That’s not a lot of help when your site gets featured on Slashdot!
The (desperately ugly) solution? Patch squid to invent a header in each client request, X-GeoIP-Country:, which says which country the client’s IP address maps to, and then name that in the Vary: header of the outgoing pledges. It’s horrid, but it seems to work.