So, PledgeBank got Slashdotted a couple of weeks ago when Mike Liveright’s $100 laptop pledge was linked from a post about the laptop. We didn’t cope very well.
Unfortunately, PledgeBank is a pretty slow site. Generating the individual pledge page (done by mysociety/pb/web/ref-index.php) can take anything up to 150ms. That’s astonishingly slow, given the speed of a modern computer. What takes the time?
It’s quite hard to benchmark pages on a running web server, but one approach that I’ve found useful in the past is to use an analogue of phase-sensitive detection. Conveniently enough, all the different components of the site — the webserver, the database and the PHP process — run as different users, so you can easily count up the CPU time being used by the different components during an interval. To benchmark a page, then, request it a few times and compute the amount of CPU time used during those requests. Then sleep for the same amount of time, and compute the amount of CPU time used by the various processes while you were sleeping. The difference between the values is an estimate of the amount of CPU time taken servicing your requests; by repeating this, a more accurate estimate can be obtained. Here are the results after a few hundred requests to http://www.pledgebank.com/100laptop, expressed as CPU time per request in ms:
Subsystem User System apache ~0 ~0 PostgreSQL 55±9 6±4 PHP 83±8 4±4
(The code to do the measurements — Linux-specific, I’m afraid — is in mysociety/bin/psdbench.)
So that’s pretty shocking. Obviously if you spend 150ms of CPU time on generating a page then the maximum rate at which you can serve users is ~1,000 / 150 requests/second/CPU, which is pretty miserable given that Slashdot can relatively easily drive 50 requests/second. But the really astonishing thing about these numbers is the ~83ms spent in the PHP interpreter. What’s it doing?
The answer, it turns out, is… parsing PHP code! Benchmarking a page which consists only of this:
<? /* ... */ require_once '../conf/general'; require_once '../../phplib/db.php'; require_once '../../phplib/conditional.php'; require_once '../phplib/pb.php'; require_once '../phplib/fns.php'; require_once '../phplib/pledge.php'; require_once '../phplib/comments.php'; require_once '../../phplib/utility.php'; exit; ?>
reveals that simply parsing the libraries we include in the page takes about 35ms per page view! PHP, of course, doesn’t parse the code once and then run the bytecode in a virtual machine for each page request, because that would be too much like a real programming language (and would also cut into Zend’s market for its “accelerator” product, which is just an implementation of this obvious idea for PHP).
So this is bad news. The neatest approach to fixing this kind of performance problem is to stick a web cache like squid in front of the main web site; since the pledge page changes only when a user signs the pledge, or a new comment is posted, events which typically don’t occur anywhere near as frequently as the page is viewed, most hits ought to be servable from the cache, which can be done very quickly indeed. But it’s no good to allow the pledge page to just sit in cache for some fixed period of time (because that would be confusing to users who’ve just signed the pledge or written a comment, an effect familiar to readers of the countless “Movable Type” web logs which are adorned with warnings like, “Your comment may take a few seconds to appear — please don’t submit twice”). So to do this properly we have to modify the pledge page to handle a conditional GET (with an If-Modified-Since: or If-None-Match: header) and quickly return a “304 Not Modified” response to the cache if the page hasn’t changed. Unfortunately if PHP is going to take 35ms to process such a request (ignoring any time in the database), that still means only 20 to 30 requests/second, which is better but still not great.
(For comparison, a mockup of a perl program to process conditional GETs for the pledge page can serve each one in about 3ms, which isn’t much more than the database queries it uses take on their own. Basically that’s because the perl interpreter only has to parse the code once, and then it runs in a loop accepting and processing requests on its own.)
However, since we unfortunately don’t have time to rewrite the performance-critical bits of PledgeBank in a real language, the best we can do is to try to cut the number of lines of library code that the site has to parse on each page view. That’s reduced the optimal case for the pledge page — where the pledge has not changed — to this:
<? /* ... */ require_once '../conf/general'; require_once '../../phplib/conditional.php'; require_once '../../phplib/db.php'; /* Short-circuit the conditional GET as soon as possible -- parsing the rest of * the includes is costly. */ if (array_key_exists('ref', $_GET) && ($id = db_getOne('select id from pledges where ref = ?', $_GET['ref'])) && cond_maybe_respond(intval(db_getOne('select extract(epoch from pledge_last_change_time(?))', $id)))) exit(); /* ... */ ?>
— that, and a rewrite of our database library so that it didn’t use the gigantic and buggy PEAR one, has got us up to somewhere between 60 and 100 reqs/sec, which while not great is enough that we should be able to cope with another similar Slashdotting.
For other pages where interactivity isn’t so important, life is much easier: we can just emit a “Cache-Control: max-age=…” header, which tells squid that it can re-use that copy of the page for however long we specify. That means squid can serve that page at about 350reqs/sec; unfortunately the front page isn’t all that important (most users come to PledgeBank for a specific pledge).
There’s a subtlety to using squid in this kind of (“accelerator”) application which I hadn’t really thought about before. What page you get for a particular URL on PledgeBank (as on lots of other sites) vary based on the content of various headers sent by the user, such as cookies, preferred languages, etc.; for instance, if you have a login cookie, you’ll see a “log out” link which isn’t there if you’re an anonymous user. HTTP is set up to handle this kind of situation through the Vary: header, which the server sends to tell clients and proxies on which headers in the request the content of the response depends. So, if you have login cookies, you should say, “Vary: Cookie”, and if you do content-negotiation for different languages, “Vary: Accept-Language” or whatever.
PledgeBank has another problem. If the user doesn’t have a cookie saying which country they want to see pledges for, the site tries to guess, based on their IP address. Obviously that makes almost all PledgeBank pages potentially uncachable — the Vary: mechanism can’t express this dependency. That’s not a lot of help when your site gets featured on Slashdot!
The (desperately ugly) solution? Patch squid to invent a header in each client request, X-GeoIP-Country:, which says which country the client’s IP address maps to, and then name that in the Vary: header of the outgoing pledges. It’s horrid, but it seems to work.
PledgeBank is quite an unusual site. Many international websites simply need translation (e.g. Debian in Chinese), there aren’t any data items which vary between regions. Others have multiple international markets, with a special website tweaked for each one (for example Amazon in Canada, which has some French and English text on every page).
PledgeBank is slightly different. First of all the interface needs translating into other languages, like Debian. And we don’t quite have markets like Amazon. Partly this is because we don’t yet know what our markets are, so we just make sites for every country and language combination. We have pledges, which have both a local area and a language associated with them. We’ve also got global pledges.
All this means that sometimes pledges and text in multiple languages gets shown on one page. For example, if your browser is configured for the Brazilian language, and you are in Brazil, then www.pledgebank.com will look like this. At the time of writing there is only one Brazilian pledge, so below it we show some global pledges in English as examples.
We use some software called GNU gettext to do our translations. Obviously, I’m not telling the truth – people do the translation, gettext just substitutes the translations into the pages. It’s a great piece of software, simple, old, well used and supported, with good tools for translators to update translations with.
For some time there’s been a bug in PledgeBank. On certain pages the language can change back and forth several times, and gettext would start returning translations for earlier languages rather than the current one. I’m setting the
LANGenvironment variable to tell it what language to use. After much debugging and an email to GNU, it turns out that this is to do with gettext’s cache. The cache was behaving differently on FreeBSD and Linux, which was confusing me even more.
To clear the cache you rebind the text domain, that is call
textdomain(textdomain(NULL)), after changing the environment variable. This makes everything work happily everywhere. And the main point of this post is to get that nugget into search engines, so anybody else with the same problem has a hope of finding out..