Under the bonnet

Partly for our own internal documentation, and partly because it might be of interest to (some) readers, some notes on how the Number 10 petitions site works. On the face of it you’d imagine this would be very simple, but as usual it’s complicated by performance requirements (our design was motivated by the possibility that a large NGO might mail a very large mailing list inviting each member to sign a particular petition, so that we might have to sustain a very high signup rate for an extended period of time). Here’s a picture of the overall architecture:

Diagram representing petitions site architecture

(This style of illustration is unaccountably popular in the IT industry but unlike most examples of the genre, I’ve tried to arrange that this one actually contains some useful information. In particular I’ve tried to mark the direction of flow of information, and separate out the various protocols; as usual there are too many of the latter. The diagram is actually a slight lie because it misses out yet another layer of IPC—between the web server, apache, and the front-end FastCGI scripts.)

Viewing petition pages is pretty conventional. Incoming HTTP requests reach a front-end cache (an instance of squid, one per web server, caching in memory only); squid passes them to the petition browsing scripts (written in perl running under FastCGI) to display petition information. Those scripts consult the database for the current state of the relevant petition and pass it back to the proxy, and thence to the web client. This aspect of the site is not very challenging.

Signing petitions is harder. The necessary steps are:

  • write a database record about the pending signature;
  • send the user an email containing a unique link to confirm their signature;
  • update the database record when the user clicks the link;
  • commit everything to stable storage; and finally
  • tell the user that their signature has been entered and confirmed.
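
As a rough illustration of the bookkeeping behind the steps above, here is a minimal sketch in Python (the site itself is written in perl). The table layout, the function names sign and confirm, and the use of SQLite are all assumptions made for the example, not the real schema, and actually sending the email is left out.

```python
import secrets
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical signer table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE signer ("
        " email TEXT NOT NULL,"
        " token TEXT NOT NULL UNIQUE,"
        " confirmed INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def sign(db, email):
    """Record a pending signature and return the token for the emailed link."""
    token = secrets.token_urlsafe(16)  # unguessable, unique per signature
    db.execute("INSERT INTO signer (email, token) VALUES (?, ?)", (email, token))
    db.commit()  # the pending record must be durable before the email goes out
    return token


def confirm(db, token):
    """Mark the signature confirmed; return True if the token was known."""
    cur = db.execute("UPDATE signer SET confirmed = 1 WHERE token = ?", (token,))
    db.commit()  # only after this may the user be told they have signed
    return cur.rowcount == 1
```

Note that this naive version commits once when the signature is recorded and once more when it is confirmed, which is exactly the cost the rest of this post is about avoiding.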

The conventional design for this would be to have the web script, when it processes the HTTP request for a new signature, submit a message for sending by a local mail server and write a row into the database and commit it, forcing the data out to disk. The mail server would then write the message into its spool directory and fsync it, forcing it out to disk. The mail server would then pick the mail out of its queue and send it to a remote server, at which point it would be erased from the queue. Later on the mail would arrive in the user’s inbox, at which point they would (presumably) click the link, resulting in another HTTP request which causes the web script to update the corresponding database row and commit the result. While this is admirably modular, it requires far more disk writes than are necessary to complete the task, which limits its potential speed. (In particular, there’s no reason to have a separate MTA spool directory and for the MTA to make its own writes to that directory.)

At times of high load, it is also extremely inefficient to do one commit per signature. It takes about as long to commit ten new or changed rows to the database as it does to commit one (because the time spent is determined by the disk seek time). Therefore to achieve high performance it is necessary to batch signatures. Unfortunately this is a real pain to implement because all the common web programming models use one process per concurrent request, and it is inconvenient to share database handles between different processes. The correct answer to this problem would of course be to write the signup web script as a single-process multiplexing implementation, but that’s a bit painful (we’d have had to implement our own FastCGI wire protocol library, or alternatively an HTTP server) and the deadlines were fairly tight. So instead we have a single-process server, petsignupd, which accepts signup and confirmation requests from the front-end web scripts over a simple UDP protocol, and passes them to the database in batches every quarter of a second. In theory, therefore, users should see a maximum latency of a bit over 0.25s, but we should achieve close to the theoretical best throughput of incoming requests. (Benchmarking more-or-less bears this out.)
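
To make the batching idea concrete, here is a minimal sketch in Python. It is not petsignupd: the class name SignupBatcher, the table and the use of SQLite are assumptions for illustration, and requests are handed over by a method call rather than arriving over UDP. What it shows is that however many signups arrive between two flushes, they cost a single commit.

```python
import sqlite3


class SignupBatcher:
    """Collect signups in memory and write them out one batch per flush."""

    def __init__(self, db):
        self.db = db
        self.pending = []  # signups received since the last flush
        self.commits = 0   # how many commits we have paid for so far
        db.execute("CREATE TABLE IF NOT EXISTS signer (email TEXT NOT NULL)")

    def submit(self, email):
        """Queue one signup; nothing touches the disk yet."""
        self.pending.append((email,))

    def flush(self):
        """Write every queued signup in one transaction; return the batch size."""
        if not self.pending:
            return 0  # nothing to do, so no commit is spent
        batch, self.pending = self.pending, []
        with self.db:  # one transaction, hence one commit, for the whole batch
            self.db.executemany("INSERT INTO signer (email) VALUES (?)", batch)
        self.commits += 1
        return len(batch)
```

A daemon built this way would call flush() from a timer (every 0.25s in our case) and only reply to the waiting front-end scripts once the flush has returned, so that nobody is told their signature is safe before it is on disk.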

Sending the corresponding email is also a bit problematic. General-purpose MTAs are not optimised for this sort of situation, and (for instance) exim can’t keep up with the sustained signup rate we were aiming for even if you put all of its spool directories on a RAM disk and accept that you have to repopulate its outgoing queue in the event of a crash. The solution was to write petemaild, a small multiplexed SMTP sending server; unlike a general-purpose MTA this manages its queue in memory and communicates updates directly to the database (when a confirmation email is delivered or delivery fails permanently).
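
The queue handling can be sketched along the same lines. Again this is an illustration in Python rather than petemaild itself: the MailQueue class, the emailstatus column and the three outcome names are hypothetical, the send function is supplied by the caller and stands in for a real SMTP conversation, and nothing here is multiplexed. Reloading the queue from the database after a restart is my assumption about how an in-memory queue would recover. The point is only that the queue lives in memory and the database hears about a message when it reaches a final outcome.

```python
import sqlite3
from collections import deque

SENT, TEMPFAIL, PERMFAIL = "sent", "tempfail", "permfail"

# Hypothetical schema: one row per signer, with the state of their email.
SCHEMA = (
    "CREATE TABLE signer ("
    " email TEXT PRIMARY KEY,"
    " emailstatus TEXT NOT NULL DEFAULT 'pending')"
)


class MailQueue:
    def __init__(self, db, send):
        self.db = db
        self.send = send      # callable(email) -> SENT, TEMPFAIL or PERMFAIL
        self.queue = deque()  # held in memory only, never spooled to disk

    def load(self):
        """(Re)build the queue from the database, e.g. after a restart."""
        rows = self.db.execute(
            "SELECT email FROM signer WHERE emailstatus = 'pending'"
        ).fetchall()
        self.queue.extend(email for (email,) in rows)

    def run_once(self):
        """Try each queued message once; requeue temporary failures."""
        for _ in range(len(self.queue)):
            email = self.queue.popleft()
            outcome = self.send(email)
            if outcome == TEMPFAIL:
                self.queue.append(email)  # retry on a later pass
                continue
            # Final outcome: delivered, or failed permanently.
            self.db.execute(
                "UPDATE signer SET emailstatus = ? WHERE email = ?",
                (outcome, email),
            )
        self.db.commit()  # one commit for all the outcomes of this pass
```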

It’s unfortunate that such a complex system is required to fulfil such a simple requirement. If we’d been prepared to write the whole thing ourselves, from processing HTTP requests down to writing signatures out to files on disk, the picture above would look much simpler (and there would be fewer IPC boundaries at which things could go wrong). On the other hand the code itself would be a lot more complex, and there’d be a lot more of it. I don’t think I’d describe this design as a “reasonable” compromise, but it’s at least an adequate one.

8 Comments

  1. Interesting, but the success of the currently most popular petition:

    “Scrap the planned vehicle tracking and road pricing policy (28207 signatures)”

    makes the front end web form virtually unusable – it is trying to display thousands of signatures at once and is therefore grinding my browser session to a halt.

    This is something which was fixed ok in PledgeBank by displaying only the last 500 signatures.

  2. Ahah ! Mystery solved. I came to the petition page following a Google search engine query which had picked up one of the names on the full list, but not in the most recent 500.

    Perhaps a “Display only the last 500” signatures link somewhere at the top of the form might be helpful ?

  3. Did you consider adjusting commit_delay / commit_siblings in the database? Setting a delay of .25 (and a few siblings) seems like it would remove the need for petsignupd.

    Also, what sort of hardware is the database on? How are the disks organized? What sort of benchmarks did you run to test the setup?

  4. I benchmarked it (partly testing the database directly, partly with a full test of the whole signup process). We couldn’t sustain the throughput we wanted with lots of database writers, but could with only one. It’s not very surprising that the single-threaded implementation is more efficient.

  5. Francis Irving

    It would be a bit of work making it use a different database. Also, you would have trouble running the perl daemons it requires, if you can’t install PostgreSQL.