How (not) to survive a Slashdotting

So, PledgeBank got Slashdotted a couple of weeks ago when Mike Liveright’s $100 laptop pledge was linked from a post about the laptop. We didn’t cope very well.

Unfortunately, PledgeBank is a pretty slow site. Generating the individual pledge page (done by mysociety/pb/web/ref-index.php) can take anything up to 150ms. That’s astonishingly slow, given the speed of a modern computer. What takes the time?

It’s quite hard to benchmark pages on a running web server, but one approach that I’ve found useful in the past is to use an analogue of phase-sensitive detection. Conveniently enough, all the different components of the site — the webserver, the database and the PHP process — run as different users, so you can easily count up the CPU time being used by the different components during an interval. To benchmark a page, then, request it a few times and compute the amount of CPU time used during those requests. Then sleep for the same amount of time, and compute the amount of CPU time used by the various processes while you were sleeping. The difference between the values is an estimate of the amount of CPU time taken servicing your requests; by repeating this, a more accurate estimate can be obtained. Here are the results after a few hundred requests to http://www.pledgebank.com/100laptop, expressed as CPU time per request in ms:

Subsystem     User (ms)   System (ms)
apache        ~0          ~0
PostgreSQL    55±9        6±4
PHP           83±8        4±4

(The code to do the measurements — Linux-specific, I’m afraid — is in mysociety/bin/psdbench.)
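
If you want the flavour of it without digging out psdbench, here is a rough sketch of the same idea: sample per-user CPU time from /proc, hit the page repeatedly, then idle for the same wall-clock time and attribute the difference to the requests. This is an illustration only — it assumes Linux’s /proc layout, the posix extension and a 100Hz USER_HZ, and it is not the psdbench code itself:

<?
/* Rough sketch only: sum user+system CPU time (in jiffies) per Unix user by
 * walking Linux's /proc. Short-lived processes that exit between samples are
 * missed; the real psdbench is more careful than this. */
function cpu_time_by_user() {
    $totals = array();
    foreach (glob('/proc/[0-9]*') as $dir) {
        $stat = @file_get_contents("$dir/stat");
        if ($stat === false)
            continue;               /* process exited while we were looking */
        $uid = fileowner($dir);
        if ($uid === false)
            continue;
        /* Skip past the "(command)" field, which may contain spaces; the
         * remaining fields start at "state", with utime and stime at offsets
         * 11 and 12. */
        $f = preg_split('/\s+/', substr($stat, strrpos($stat, ')') + 2));
        $pw = posix_getpwuid($uid);
        $user = $pw ? $pw['name'] : $uid;
        if (!isset($totals[$user])) $totals[$user] = 0;
        $totals[$user] += $f[11] + $f[12];      /* utime + stime */
    }
    return $totals;
}

$n = 100;   /* requests per measurement -- illustrative */
$before = cpu_time_by_user();
$t0 = microtime(true);
for ($i = 0; $i < $n; ++$i)
    file_get_contents('http://www.pledgebank.com/100laptop');
$busy = cpu_time_by_user();
$elapsed = microtime(true) - $t0;

/* Now idle for the same wall-clock time, so that background CPU use can be
 * subtracted out. */
sleep((int) ceil($elapsed));
$idle = cpu_time_by_user();

foreach ($busy as $user => $j) {
    $b = isset($before[$user]) ? $before[$user] : 0;
    $a = isset($idle[$user]) ? $idle[$user] : $j;
    $work = ($j - $b) - ($a - $j);  /* CPU used by the requests, less background */
    /* Assuming USER_HZ = 100, one jiffy is 10ms. */
    printf("%-12s %.0f ms/request\n", $user, $work * 10 / $n);
}
?>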

So that’s pretty shocking. Obviously if you spend 150ms of CPU time on generating a page then the maximum rate at which you can serve users is ~1,000 / 150 ≈ 7 requests/second/CPU, which is pretty miserable given that Slashdot can relatively easily drive 50 requests/second. But the really astonishing thing about these numbers is the ~83ms spent in the PHP interpreter. What’s it doing?

The answer, it turns out, is… parsing PHP code! Benchmarking a page which consists only of this:

<?
/* ... */

require_once '../conf/general';
require_once '../../phplib/db.php';
require_once '../../phplib/conditional.php';
require_once '../phplib/pb.php';
require_once '../phplib/fns.php';
require_once '../phplib/pledge.php';
require_once '../phplib/comments.php';
require_once '../../phplib/utility.php';

exit;
?>

reveals that simply parsing the libraries we include in the page takes about 35ms per page view! PHP, of course, doesn’t parse the code once and then run the bytecode in a virtual machine for each page request, because that would be too much like a real programming language (and would also cut into Zend’s market for its “accelerator” product, which is just an implementation of this obvious idea for PHP).

So this is bad news. The neatest approach to fixing this kind of performance problem is to stick a web cache like squid in front of the main web site; since the pledge page changes only when a user signs the pledge, or a new comment is posted, events which typically don’t occur anywhere near as frequently as the page is viewed, most hits ought to be servable from the cache, which can be done very quickly indeed. But it’s no good to allow the pledge page to just sit in cache for some fixed period of time (because that would be confusing to users who’ve just signed the pledge or written a comment, an effect familiar to readers of the countless “Movable Type” web logs which are adorned with warnings like, “Your comment may take a few seconds to appear — please don’t submit twice”). So to do this properly we have to modify the pledge page to handle a conditional GET (with an If-Modified-Since: or If-None-Match: header) and quickly return a “304 Not Modified” response to the cache if the page hasn’t changed. Unfortunately if PHP is going to take 35ms to process such a request (ignoring any time in the database), that still means only 20 to 30 requests/second, which is better but still not great.
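
For concreteness, handling a conditional GET is schematically just this — a sketch of the mechanism, not the code in phplib/conditional.php, and the pledge’s last-change time is assumed to arrive as a Unix timestamp:

<?
/* Sketch of conditional-GET handling: send validators with every response,
 * and short-circuit with a 304 when the client's copy is still current.
 * $last_changed is assumed to be a Unix timestamp of the last change. */
function maybe_send_304($last_changed) {
    $etag = '"' . $last_changed . '"';      /* cheap ETag: just the timestamp */
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $last_changed) . ' GMT');
    header("ETag: $etag");

    $since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
                ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;
    $match = isset($_SERVER['HTTP_IF_NONE_MATCH'])
                ? $_SERVER['HTTP_IF_NONE_MATCH'] : false;

    if (($match !== false && $match == $etag)
            || ($match === false && $since !== false && $since >= $last_changed)) {
        header('HTTP/1.0 304 Not Modified');
        return true;    /* caller should exit without generating the page */
    }
    return false;
}
?>

The comparison itself is trivial; the 35ms goes on getting PHP to the point where it can make it.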

(For comparison, a mockup of a perl program to process conditional GETs for the pledge page can serve each one in about 3ms, which isn’t much more than the database queries it uses take on their own. Basically that’s because the perl interpreter only has to parse the code once, and then it runs in a loop accepting and processing requests on its own.)

However, since we unfortunately don’t have time to rewrite the performance-critical bits of PledgeBank in a real language, the best we can do is to try to cut the number of lines of library code that the site has to parse on each page view. That’s reduced the optimal case for the pledge page — where the pledge has not changed — to this:

<?
/* ... */

require_once '../conf/general';
require_once '../../phplib/conditional.php';
require_once '../../phplib/db.php';

/* Short-circuit the conditional GET as soon as possible -- parsing the rest of
 * the includes is costly. */
if (array_key_exists('ref', $_GET)
    && ($id = db_getOne('select id from pledges where ref = ?', $_GET['ref']))
    && cond_maybe_respond(intval(db_getOne('select extract(epoch from pledge_last_change_time(?))', $id))))
    exit();

/* ... */
?>

— that, and a rewrite of our database library so that it didn’t use the gigantic and buggy PEAR one, has got us up to somewhere between 60 and 100 reqs/sec, which while not great is enough that we should be able to cope with another similar Slashdotting.

For other pages where interactivity isn’t so important, life is much easier: we can just emit a “Cache-Control: max-age=…” header, which tells squid that it can re-use that copy of the page for however long we specify. That means squid can serve the front page at about 350 requests/second; unfortunately the front page isn’t all that important (most users come to PledgeBank for a specific pledge).
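
In PHP terms that’s a one-liner near the top of the script — the ten-minute lifetime below is only an example, not the value the site actually uses:

<?
/* Anyone (squid included) may reuse this response for up to ten minutes.
 * The max-age value is illustrative. */
header('Cache-Control: max-age=600');
?>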

There’s a subtlety to using squid in this kind of (“accelerator”) application which I hadn’t really thought about before. What page you get for a particular URL on PledgeBank (as on lots of other sites) varies based on the content of various headers sent by the user, such as cookies, preferred languages, etc.; for instance, if you have a login cookie, you’ll see a “log out” link which isn’t there if you’re an anonymous user. HTTP is set up to handle this kind of situation through the Vary: header, which the server sends to tell clients and proxies on which headers in the request the content of the response depends. So, if you have login cookies, you should say, “Vary: Cookie”, and if you do content-negotiation for different languages, “Vary: Accept-Language” or whatever.

PledgeBank has another problem. If the user doesn’t have a cookie saying which country they want to see pledges for, the site tries to guess, based on their IP address. Obviously that makes almost all PledgeBank pages potentially uncachable — the Vary: mechanism can’t express this dependency. That’s not a lot of help when your site gets featured on Slashdot!

The (desperately ugly) solution? Patch squid to invent a header in each client request, X-GeoIP-Country:, which says which country the client’s IP address maps to, and then name that header in the Vary: header of the outgoing pledge pages. It’s horrid, but it seems to work.
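
On the application side there’s not much to it — something like the following, where the “country” cookie name and the fallback are illustrative, and X-GeoIP-Country is the header the patched squid inserts:

<?
/* The page content depends on the login cookie and on the country that
 * squid's invented request header reports, so both must be named in Vary:,
 * or squid would happily serve one country's cached page to everyone. */
header('Vary: Cookie, X-GeoIP-Country');

/* Prefer an explicit country cookie; otherwise trust the header the patched
 * squid derived from the client's IP address. Cookie name and default are
 * illustrative. */
if (isset($_COOKIE['country']))
    $country = $_COOKIE['country'];
else if (isset($_SERVER['HTTP_X_GEOIP_COUNTRY']))
    $country = $_SERVER['HTTP_X_GEOIP_COUNTRY'];
else
    $country = 'GB';
?>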

13 Comments

  1. I’d recommend the APC recommendation. It’s helped the performance on a number of my applications considerably. IIRC it’ll be included in PHP6 by default.

    Which PEAR DB library are you using? PEAR::DB has recently been deprecated in favour of PEAR::MDB2 which offers a good speed-up.

  2. Um, when I say “recommend the recommendation” I do of course mean ‘second the recommendation’

  3. Err, you *are* using a free PHP opcode cache like APC (http://pecl.php.net/package/apc) or eaccelerator (http://eaccelerator.net/) aren’t you? Because they do exactly the same as the Zend product.

    We use APC on last.fm and, combined with Memcached, this means we can cope pretty well without any separate frontend HTML caching. Our page generation time for the site itself averages out at about 300-400ms, but this is about par for the course considering how much dynamic content we have.

    Web services hits (which use the same PHP codebase of 30k+ lines) average out at below 50ms.

  4. We had a look at the Turck mmcache thingy (assuming that all these things are the same), but it didn’t seem to do a lot of good (and I remember trying eaccelerator on TheyWorkForYou, another giant slow PHP program, with similarly disappointing results). I’ve just had a look at APC and the results are the same — with the list of include files I have above, it saves us a bit under 10ms/request. Now, 25ms vs 35ms is a saving worth having, but (a) maintaining packages for a third-party cache is going to be a pain; (b) this is a paltry saving compared to the potential 30ms we could save by doing the conditional-GET logic in perl….

  5. Oh, and on the PEAR thing, we used PEAR::DB, which was what there was when we started. The big win is that it gives you a post-1995 API for doing queries (no more having to catenate strings and remember when to quote them); the big lose is that it’s full of idiot “optimisations” based on mistaken assumptions such as “SELECTs don’t have side-effects and therefore we don’t need to be in a transaction to execute them”. Since the useful bit (substituting quoted variables for ‘?’ in queries) can be replaced with a trivial 30-line function it didn’t occur to me to look at PEAR again. However, looking at it now, I don’t think MDB2 offers us anything that we don’t have already.

  6. APC/eaccelerator should give you much more of a performance increase than that (eaccelerator is the successor to mmcache). I’ve typically seen a 50-80% drop in CPU usage.

    Sometimes these caching extensions can have weird issues where they don’t quite work, but it’s basically impossible to run a high-traffic site without one.

    PEAR::DB is probably the best-coded PEAR module I’ve come across, and we’ve had no problems with it. Other PEAR modules (e.g. PEAR::Date) have serious scalability problems.

  7. @Chris: you will be happy to know that the DB::isManip() related issues are solved in MDB2. As such it will not run into those transaction issues you mentioned. Also MDB2 provides support for named parameters, so you don’t have to count your ‘?’ placeholders anymore.

  8. You aren’t getting the performance benefit with APC that you want because you are including files conditionally instead of absolutely, so the compiler can’t simplify it.

    Replace your include_once and require_once with include and require, and do them before any branchpoints in your code, and you’ll get better results with APC. That is my understanding of things, based on hearing Rasmus talk.

  9. Ooh, that’s vile. What’s the recommended solution to the multiple includes problem, then? To set flags in each include file as one does with the preprocessor in C?

  10. Coming late to this, but as PHP is being run under a different user does this mean you’re using a CGI/FastCGI architecture, or are you su’ing PHP when it’s executed?

    I’m guessing either way will have an impact on performance over mod_php, but I’m curious to see how much!

  11. FastCGI. There’s some overhead versus mod_php (because of the IPC between webserver and PHP interpreter) but it’s not very great, and most of the execution time is spent parsing scripts so isn’t affected by this.