-
Last night I should have gone to bed early, but these things being how they are I stayed up late having tea with my housemate and his friend. I wanted to get up early, because I knew a few things needed tidying before we started getting media coverage, so I set my alarm. I haven’t done that for work for years! So I’m a bit sleepy.
The most important early thing I did was make the front page featured pledges appear in a random order, for more fairness and serendipity. Late last night Chris had added code in to fuzzily find pledges which somebody has typed in. It uses the database to look for the number of common three letter substrings, so if you type in “http://www.pledgebank.com/suirname” it gives a nice error page leading you to go to “http://www.pledgebank.com/Suriname”. It’s pretty good, and all I had to do was tidy up the text a bit, and add it to the search page as well.
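For the curious, here is a minimal sketch of the three-letter-substring idea. The live site does the counting in the database; this version does it in plain Python instead, and the function names and the list of known pledge references are made up for illustration.

```python
def trigrams(text):
    """Return the set of three-letter substrings of text, lowercased."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def closest_refs(typed, known_refs, limit=5):
    """Rank known pledge references by how many trigrams they share with typed."""
    wanted = trigrams(typed)
    scored = []
    for ref in known_refs:
        common = len(wanted & trigrams(ref))
        if common > 0:
            scored.append((common, ref))
    # Most shared trigrams first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ref for _, ref in scored[:limit]]
```

Calling `closest_refs("suirname", ["Suriname", "no2id", "sendacow"])` picks out "Suriname", because the two spellings still share the substrings "nam" and "ame" even with the transposed letters.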
By that time everyone else was up, and the no2id people were publicising their pledge. We were all on IRC, and tailing various logfiles. There were quite a few minor tidy ups for us to make to the launch pledges that were made over the weekend, changing text and signup numbers for the creators a bit.
Someone spotted that the “all pledges” page had the wrong calculated count for one of the pledges. This was very odd, as it was right for all the others. I downloaded a fresh dump of the database to my local machine, where everything was fine. Meanwhile, Chris noticed the PHP server was crashing. After more investigation, we found a subtle bug was creating a corrupt PHP variable. Calling “gettype” on it caused the PHP process to stop with an error, and calling number_format crashed the whole thing. We’re still not sure quite what PHP bug caused this, and need to investigate it more. But we found a simple workaround which stopped it causing any more trouble.
You always find all the bugs when your traffic goes up! That’s why a staged beta that keeps getting larger, of which today is in many ways the next phase, is the way to go.
-
We’ve been spending the last few days adding a more comprehensive login/authentication system to the PledgeBank code. At the moment, PledgeBank checks your email address for every action that you take. In the new system you can still get it to email you if you like, or if you prefer you can set a password. It will also use session cookies to remember that you are logged in. The plan is to use the better login system to let pledge creators do more things, like email signers during the campaign, and upload a photo to go with their pledge.
This has taken quite a radical overhaul of the codebase, and the database schema. There’s now a “person” table, which is really just an email address. Chris has made a lovely elegant system, where you can just call “person_signon” in some PHP code. Then it goes away, and makes sure they are authenticated. This might be immediate, if they are already logged in. It might require a password, or it might require emailing them. Whichever way, when they come back (possibly via a link in an email), it restores the request and goes back to the page which required authentication.
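To make the "stash the request, authenticate, then restore it" flow concrete, here is a rough sketch in Python rather than PHP. It is not Chris's code: the class, its method names and the in-memory dictionaries are all invented for illustration, and a real system would keep this state in the database and actually check the password or emailed link.

```python
import secrets


class SignonFlow:
    """Toy model of a sign-on step that remembers the interrupted request."""

    def __init__(self):
        self.stashed = {}   # token -> original request awaiting authentication
        self.sessions = {}  # session id -> email address of logged-in person

    def signon(self, session_id, request):
        """Return ("ok", email, request) if logged in, else ("challenge", token)."""
        email = self.sessions.get(session_id)
        if email is not None:
            return ("ok", email, request)
        token = secrets.token_urlsafe(16)
        self.stashed[token] = request
        return ("challenge", token)

    def confirm(self, token, session_id, email):
        """Call once the person has proved who they are; returns the stashed request."""
        request = self.stashed.pop(token)  # raises KeyError if unknown or reused
        self.sessions[session_id] = email
        return request
```

The calling page only ever sees two outcomes: either it gets its request straight back, or it gets a token to put in the password form or the emailed link, and the request comes back later through `confirm`.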
In total, this will almost be a net deletion of lines of code, when the existing token systems are fully removed. Meanwhile, I’m testing and debugging it like crazy. And we’ve got to work out how to deploy the code without breaking anyone mid-signing at the moment we upgrade it. Upgrading not just the engine but the transmission as well, while the car is running.
-
We’re looking for ways to make it easier for volunteers to get involved in mySociety. Like everything in real life this is mostly a question of openness and policy, but there are also a few technical steps we think would make life easier. One of these is to make it easier for us to hand over a test web server to a volunteer or a group of volunteers to develop code on, play with and generally break. At the moment that’s quite hard to do, because we use apache and all our sites are hosted on one machine (yes really — computers are fast and memory is cheap, though in day-to-day life you’d never notice that, because most of the IT industry is involved in developing “technology” — meaning, “programs that don’t work yet” — that are designed to make your computer slow again: Microsoft Windows, Java, modern web browsers, etc. etc.). Apache is monolithic and if one user breaks the configuration of their test site they can bring down all the sites hosted on the machine. Also, apache isn’t very good at crossing security boundaries (arguably that’s a fault of UNIX generally), so unless we’re prepared to give all the volunteers root (not acceptable for policy reasons) they need to hassle us to get things done (not acceptable to them). Indeed, to save time and admin hassle, IVotedForYouBecause was developed and is hosted elsewhere.
So the idea is to strip away all this crap by running lots of instances of apache, and giving one to any group of volunteers who want to play with one of our sites — all running under their own unprivileged UID — and then direct requests from the outside world through to the appropriate internal apache server via a public-facing proxy. The design I’m envisaging looks something like this:
[Basically pointless block diagram – server hosted on has gone]
(Actually it’s not clear to me that that diagram conveys anything you won’t have understood anyway, but there we go.) For the front-end server I’m using Squid, which is balky and overcomplicated, but supports one very handy feature which is invaluable in this setup: external URL rewriting scripts, which can be used to redirect requests that come through the cache to other resources. The classic application of this is to redirect requests for advertising and other pointless content to local resources so that they don’t take up bandwidth or break your web browser; in this case we’ll be using it to rewrite requests for certain publicly-visible URLs (“http://fred.pledgebank.com/…” or whatever) into internal URLs which route to individual users’ apache servers (“http://127.0.0.1:8001/…”). One of the nice things about this is that it preserves the Host: header, and (with a further small hack) apache can be persuaded to pretend that requests weren’t proxied at all, so any back-end stuff that needs to know clients’ IP addresses (such as logging, etc.) can be used unmodified. On top of this, squid will cache responses (assuming that we aren’t lazy about the headers we emit on our own content), which may speed things up a bit for certain sites, though I suspect (with little evidence, and none I’m prepared to bore you with now) this won’t be very useful in practice for the types of sites we’re building.
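A rewriting script of this kind can be very small. The sketch below is an illustration rather than the script we run: the host-to-port table is hypothetical, and it assumes the simple helper protocol in which Squid writes one request per line with the URL as the first field, and expects one line back per request, where a blank line means "leave the URL alone". Check the documentation for your Squid version before relying on that.

```python
import sys
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from public host names to per-volunteer back-end ports.
BACKENDS = {
    "fred.pledgebank.com": 8001,
}


def rewrite(url, backends=BACKENDS):
    """Return the internal URL for a public URL, or "" to leave it unchanged."""
    parts = urlsplit(url)
    port = backends.get((parts.hostname or "").lower())
    if port is None:
        return ""
    # Keep the path and query string; only the host and port change.
    return urlunsplit(("http", "127.0.0.1:%d" % port, parts.path, parts.query, ""))


def main():
    # Assumed protocol: URL is the first whitespace-separated field of each line.
    for line in sys.stdin:
        fields = line.split()
        # Flush after every reply so the proxy is never left waiting on a buffer.
        print(rewrite(fields[0]) if fields else "", flush=True)
```

Running `main()` turns this into the helper process. Keeping `rewrite` as a separate pure function means the mapping can be tested without a proxy in the loop.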
Another attractive feature of this scheme is that it means that we’re not tied to apache: we could use lighttpd or something, if we wanted to. I doubt that a technical reason to do that will arise in practice, but every minute I spend fighting apache configurations is a minute closer to chucking the bloody thing and picking another web server.
So, it’s the usual story: you start off trying to work around the brokenness in one bit of software, and then all sorts of exciting possibilities suddenly open up. At least, that’s one way to look at it.
-
Well, I’m back from my holiday, suitably sunburned and (relatively) relaxed. As Francis mentions, I was off in the Mediterranean somewhere (Majorca, specifically) suffering from miserable internet withdrawal symptoms. I did manage to get IRC up-and-running over dialup for election night, though this turned out to be surprisingly expensive. For once I was grateful to my iBook, which did actually Just Work when plugged into the wall.
Anyway, today’s job is sorting out the new Scottish constituency boundaries. Scotland’s Parliament was dissolved in 1707 on the passing of the Act of Union, to be reconstituted in 1999. The quid pro quo for the Scots was enhanced representation in the House of Commons; Scottish constituencies had, in 1998, an average of 55,000 electors, compared to 69,000 in England. This anomaly has now been corrected, reducing the number of constituencies in Scotland from 72 to 59; all but three of the latter have different boundaries.
This means updating MaPit, the component we built to map postcodes into electoral geography, to deal with the new boundaries. Ideally the way that we’d do this is to wait for Ordnance Survey to ship us, via our friends in ODPM, the new revision of their Boundary-Line (TM, apparently) product, with the outlines of the new constituencies encoded in attractive machine-readable form, and feed it to our existing import scripts. (As so often in life, it’s not quite that simple, but you get the general idea.) In an ideal world, this would also contain all the changed boundaries of the English counties and their constituent county electoral divisions.
However, this is not an ideal world, and though there is a new revision of Boundary-Line in the works, it hasn’t come out yet, so we have to construct the point-to-constituency mapping in some other way. Happily, at this stage of the boundary revision process, the constituency boundaries are coterminous with ward boundaries, so it’s possible to just lift the definitions of the new constituencies from the relevant Statutory Instrument and fix up the constituencies from the ward boundaries, which haven’t changed. This, sadly, has occasioned a bit of a hack to our code, because we generally don’t assume that electoral geography is hierarchically defined — because it isn’t.
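The shape of the hack is roughly as follows. This is a sketch in Python, not MaPit's actual code or schema: the function names are invented, and the ward identifiers and constituency names are placeholders, not real data from the Statutory Instrument.

```python
def ward_to_constituency(definitions):
    """Invert {constituency: [ward ids]} into {ward id: constituency}."""
    lookup = {}
    for constituency, wards in definitions.items():
        for ward in wards:
            # Coterminous boundaries mean each ward sits in exactly one constituency.
            if ward in lookup:
                raise ValueError("ward %r is in two constituencies" % (ward,))
            lookup[ward] = constituency
    return lookup


def constituency_for_postcode(postcode, postcode_ward, ward_constituency):
    """Go from postcode to ward (unchanged data), then ward to new constituency."""
    return ward_constituency[postcode_ward[postcode]]
```

For example, with `definitions = {"Constituency A": ["ward-1", "ward-2"]}` and a postcode table mapping "ZZ1 1ZZ" to "ward-2", the lookup returns "Constituency A". The two-step lookup only works while constituencies are exact unions of wards, which is why it is a hack and not a general model.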
(I don’t feel too bad about committing this hack, actually, because we’re likely to chuck the whole MaPit database and reconstruct it later in the year from OS data. When we built it originally, we did so from data in ESRI shapefile format; unfortunately, OS stuffed up the process of generating this from their own, internal and quite bonkers, NTF format, so the various area ID numbers in the database are not unique and not expected to be stable. We’d rather like stable ID numbers, so that we can cope gracefully with revisions to geography while maintaining continuity of, for instance, statistical data about MPs, so next time round we’re going to work from the NTF instead.)
Sadly this Scottish hack doesn’t get us anywhere with the new county boundaries, and OS have told us that not all of the updated counties will be included in the forthcoming Boundary-Line revision. So it’ll be back to the tedious conversion of statutory instruments into SQL at some point in the near future, except that we’ll probably have to start building things up from parishes, rather than wards. Expect more anguished posts on this in the future.
Meanwhile, Francis and Tom are collecting names and contact details for the new MPs. Tom tells me that this intake looks much more tech-savvy than the last, which could be good news from our (and everyone else’s) point of view. Hopefully WriteToThem will be cranking back into action — as far as MPs go, at least — fairly soon.
-
Over the weekend, and this morning, I’ve been updating the Public Whip and TheyWorkForYou database of MPs. This was much easier this year thanks, amazingly, to Macromedia Flash. The BBC have a fantastic animated constituency map which is made in Flash. When you click on a constituency it gives you the results. Now, one little-known thing about Flash is that it uses XML to communicate with the server. This means that any data it downloads must be in XML.
Further investigation reveals that you can download results data from URLs like http://news.bbc.co.uk/1/shared/vote2005/flash_map/resultdata/200.xml. So I wrote a script to download these and convert them to XML. May I present a list of all the new MPs. Now we just need to wait for them to start talking and voting so we can build up a record of them. Meanwhile, on to some mySociety work; fixing up WriteToThem…
Praise be to Macromedia!
-
Slightly late; I was “hassle”d yesterday, but was at my school’s 21st anniversary, with Terry Waite, some other past students, all the current students, teachers past and present, and a lot of balloons. So I’ve been working some of today instead, which worked out quite well, as it was beautifully sunny yesterday and pouring with rain today.
I fixed a number of bugs in various places, including one that meant all pledges would expire a day early and various problems with the reporting abuse process. I also renamed NotApathetic’s “best of” page to “busiest”, as they’re not the same thing. 😉 The PledgeBank RSS feed of new pledges will hopefully be available from the live site soon (I’ve added the HTML to make the little orange icon appear in Firefox) – if you can’t wait until then.
I think Tom wants me to work next on user-defined flyers, which will involve adding to the poster generation code all the code we removed when we moved from using text in the POST to fetching it from the database. 🙂 Not sure of the details involved, so await direction. Looks like it might involve learning RTF generation in Python, though; hope that’s possible…
-
PledgeBank is now well and truly in testing, so we’re spending lots of time finding people to help us test it. One idea Chris just had is to try out a money pledge, but a simple one which doesn’t need any more code. For example “I’ll pay £10 towards a £750 cow at sendacow.co.uk if 74 other generous people will too”.
We’re thinking of ideas for what charity would be suitable; add your own to PledgeBankPossibleUses on the wiki. It needs to be a capital project, one that you couldn’t give less towards and still have it make sense, so that the pledge aspect works. Of course, if you’ve got any other ideas for using PledgeBank, add them there as well, or email us.
Today I’m working my way through bugs. Right now I’m improving the error messages when confirmation tokens are bad, or when they are activated twice. There are several places this needs checking – for pledge creation, signing and for announce messages.
-
Well, today I’ve done something that will never be viewable to almost all of you. It’s a log in the admin interface of everything that’s going on with regards to PledgeBank, who’s signed up to what, what emails have been queued, and so on. It’s quite simple, simply fetching data from a few database tables in reverse date order, but I like it, and it’s proved handy already.
Also today, I’ve added a two month deadline limit to new pledges, searching of names as well as pledges, and made user error handling a bit nicer – try submitting a new pledge with missing entries (well, when it gets deployed to the live site, anyway). And at the moment I’m playing with putting a small pledge PNG (generated from an A7 landscape PDF) somewhere on the View Pledge page. Not sure where it can go, though.
-
So, another day, another new version of the PledgeBank source code live on http://www.pledgebank.org/. Actually, the changes are mostly underneath the surface, so you shouldn’t notice any specific differences, unless we’ve broken something, in which case whinge to team@pledgebank.com, as usual. That said, the new posters and SMS signup to pledges are now live, so you should now go out into the world and do Good Virtuous Stuff with them.
This is supposed to be the developers’ ‘blog, so here are a couple of technical things which have annoyed me today (presumably you all read my actual web log and therefore expect me to write about things that annoy me):
- You can’t use a variable quantity in a limit clause in a subselect in PostgreSQL. “A what? In a what? What?”, I hear you cry. Well, this did come up in real life. When I was upgrading the PledgeBank code, there were various changes to the database schema which had to be made first. One of them was to change the way that the success of a pledge (i.e., what happens when it reaches its target) is recorded. Previously we had two boolean columns, like this:
    create table pledges (
        -- obviously SQL tables should have singular names, but
        -- in this case nobody asked me...
        -- ...
        success boolean not null default false,
            -- indicates that the pledge has succeeded
        completionnotified boolean not null default false,
            -- indicates that the creator and signers have been
            -- told that the pledge succeeded
        -- ...
    );

Now, this is messy and not enough to describe how the site actually works. Specifically, there are some types of messages which should be sent to creators and signers, some which should be sent only to signers who signed before the pledge succeeded (in between success and the deadline, you can still sign the pledge), some which should only be sent to non-SMS recipients, etc. etc. So instead we now have a table of messages with flags indicating where they should go to and so forth. A side-effect of this is that the above structure is replaced with this:

    create table pledges (
        -- ...
        whensucceeded timestamp,
            -- indicates when pledge succeeded
        -- ...
    );

Now, there are Real Pledges on the live site, so unlike the development site we can’t just drop the database in an update; instead, we have to port all the data over to the new data model. So what you’d like to write is, obviously,

    begin work;
    alter table pledges add column whensucceeded timestamp;
    update pledges set whensucceeded = (
            select signtime from signers
            where pledge_id = pledges.id
            order by signtime
            limit 1 offset pledges.target
        ) where success;
    alter table pledges drop column success;
    commit work;
    -- I love using a proper database!

Sadly, you can’t. The two arguments in the limit statement in the subselect have to be constant. (No, I don’t know if/where this is documented. I, uh, asked on IRC.) This sucks. In the end I just set whensucceeded to the current time for currently-successful pledges; it’s not right, but it’ll do.
- Python (2.3, on FreeBSD) either sets O_NONBLOCK on sockets by default, or fails to clear it when creating a socket. Result: program crashes with EAGAIN down in the FastCGI library every so often. Outstanding!
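On the first of those annoyances: one way round the constant-offset restriction would have been to do the per-pledge lookup from application code, where the offset is just a bound parameter. The sketch below is hypothetical and was not what was run on the live site. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and it keeps the same offset as the query above (whether that should be target or target minus one depends on how signers are counted, which is not settled here).

```python
import sqlite3


def backfill_whensucceeded(conn):
    """Set whensucceeded from the signer at offset `target`, one pledge at a time."""
    pledges = conn.execute(
        "select id, target from pledges where success"
    ).fetchall()
    for pledge_id, target in pledges:
        # The offset is an ordinary query parameter here, so it can vary per pledge.
        row = conn.execute(
            "select signtime from signers where pledge_id = ?"
            " order by signtime limit 1 offset ?",
            (pledge_id, target),
        ).fetchone()
        if row is not None:
            conn.execute(
                "update pledges set whensucceeded = ? where id = ?",
                (row[0], pledge_id),
            )
    conn.commit()
```

Pledges with too few signers to reach the offset are left with a null whensucceeded, so they would still need fixing up by hand.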
Poll: Should we turn on comments on this ‘blog? (Does anyone read it, anyway?) Mail me at chris@mysociety.org with your answers….
-
Just as I was getting on with something else, I am called hither to write a blog post. So this’ll be a short one, I’m afraid. And instead of talking about what I was doing today (fixing bugs, modifying database schemas, and other dull interludes in the programming life) I’ll draw your attention to a couple of NotApathetic things.
NotApathetic has (notwithstanding various teething troubles) been going pretty well, and there are some more cool ideas in the pipeline. For those of you who are enjoying the latest web-logger fad, “tags”, Matthew Somerville has implemented this experimental page generated from users’ reasons for not voting. It’s rather fun; since, unlike the “‘blogosphere”, mySociety isn’t about throwing away all the lessons of the last forty years of research in information retrieval, we generate tags automatically rather than expecting users to annotate their posts with the most meaningful set we can manage. (They wouldn’t.) Hopefully this or something like it will become part of the site’s front page soon.
And something which I want to implement — later this week, hopefully — is a voting system to collect two types of data: firstly, which posts people think are particularly interesting, so that we can give them more prominence on the front page; and secondly, given two example posts, how similar (in some general sense) the two reasons are. The point of the second one is that, given a set of similarity data, we should be able to cluster and categorise the country’s apathy and better understand it (and, of course, draw beautiful pictures of it in gnuplot, too). You might not think that sounds like fun, but I do.