1. WriteToThem maintaining itself

    Things have been quiet here recently, but are now getting busy again. Tom’s back from America, Chris is back from holiday, I’m better after being ill for most of last week.

    Earlier in the week we finally managed to load new county boundaries into MaPit. So WriteToThem once again has county councils working. Please try it out with your postcode. Let us know of any problems.

    This required lots of work from Chris, because a new version of BoundaryLine (from Ordnance Survey) has not yet been released with the updated boundaries. He’s done it using lists of the district council wards which make up the county electoral divisions.
    These lists were taken from the Statutory Instruments. This has covered most postcodes, but there are still some where the boundaries were specificed in text (walk along this river etc.) rather than wards. And we don’t have those.

    The last couple of days I’ve been turning on lots of things to automate updating of WriteToThem. A cron job now grabs new data on councillors from GovEval once a day, and merges their changes with any changes we’ve made.

    It’s automatically emailing GovEval with user submitted corrections to councillor data (the “Have you spotted a mistake in the above list?” link on WriteToThem). Hopefully this will create a virtuous feedback loop of ever improving data quality goodness. Or at least let us keep up with council by-elections without having to lift a finger.

    Finally I’ve made it send a mail once a week to the mailing list where WriteToThem admins (mostly volunteers) hang out. This describes what needs doing – such as missing contact details to gather, or messages in the queue which need human attention.

    Next up, wiring up the new screenscrapers Richard and Jonathan contributed last week, so the Welsh and London Assemblies automatically update…

  2. Electoral geography again

    So, it’s back to electoral geography for me, this time to get the new county and county electoral-division boundaries live on WriteToThem. This is a prerequisite for getting mail to county councillors working again after the election on May 5th, so we’re already three months behind the times. But more generally, electoral boundaries are revised all the time to account for changes of population within each ward, constituency and so forth; and at most (local and national) elections some set of boundary changes takes effect. So to keep WriteToThem running we need to incorporate such updates routinely.

    The way we handle electoral geography in general is to start with Ordnance Survey’s Boundary Line product, which, for each administrative or electoral area in Great Britain gives a polygon identifying that region. We then take a big list of all the postcodes in Britain (CodePoint) and figure out which polygons they lie in. Then when somebody comes along to WriteToThem and types in their postcode, we can figure out which ward, constituency etc. they are in, and tell them appropriate things about their representatives. (Technically this is a lie, of course, because postcodes represent regions, not points — we use the centroids of those regions — and each such region isn’t guaranteed to lie either wholly within or without all electoral and administrative regions. Unfortunately there isn’t a lot we can do about this beyond throwing our hands up and saying “oops, sorry”, so that’s what we do.)

    As an aside, outside Great Britain — that is, in Northern Ireland, we don’t have the same sort of data so instead we rely on another field in the CodePoint data which gives, for each postcode centroid, the ONS ward code for the ward in which that point lies. From that ward code you can find the enclosing local authority area, local electoral area — in Northern Ireland local councils are elected by STV over multimember regions, rather than by first-past-the-post as in Great Britain — and constituency. Happily it turns out that all of those other regions are composed of whole numbers of wards; this happy state of affairs does not necessarily prevail elsewhere.

    Now, twice a year, a new edition of Boundary Line is issued, taking account of recent changes in electoral geography. Usually this happens in May and October, though the schedule has been known to slip. In principle this should be easy to deal with: load up the new copy of Boundary Line, pass all the postcodes through it, and hey presto.

    Life, of course, is rarely that simple, and this isn’t one of those occasions. When the boundaries of a region don’t change between one year and the next, we don’t want to make any alteration to that region in our database (which uses ID numbers to identify each area). More specifically, when a new revision of Boundary Line comes along, we want to ensure that — let’s say — Cambridge Constituency in the new revision is identified with Cambridge Constituency in the old version. Now, in principle, this should be easy, because each area in the data set, in the words of the manual,

    … carries a unique identifier AI; this is the same identifier that was supplied in the previous specification of Boundary- Line. The same AI attribute is associated with every component polygon forming part of an administrative unit, irrespective of the number of polygons.

    Now, the first time that we did this, we worked from a copy of the Boundary Line data supplied in the form of “ShapeFiles” (a format used in various proprietary GIS systems, and with which our local government partners were able to supply us without having to order it specially from Ordnance Survey). Unfortunately in the ShapeFile version, the allegedly unique administrative area IDs were, in fact, not unique. After discussion with Ordnance Survey it was concluded that this was a problem which affected the translation of the data from NTF (“National Transfer Format”, their own preferred format) into ShapeFile; and that the problem would be fixed in the next release.

    So, taking no chances, we decided we’d work from the NTF format in future, since that seems to be closer to the authoritative source of the data, and anyway the ShapeFile format isn’t at all well-documented (for instance, many of the field names for the metadata about each area differ from those described in the manual for Boundary Line). So I’ve written code to parse the (slightly bonkers, natch) NTF files and modified our import scripts to use this code, with a view to then being able to keep up-to-date with future boundary revisions without too much trouble.

    You will not be surprised, therefore, to hear that this has not worked out exactly as planned. Unfortunately it appears that the May 2005 NTF release of Boundary Line suffers exactly the same problems of non-uniqueness as did the previous ShapeFile release. So unless some cleverer solution presents itself, I’ll have to revive the hack we intended to use with the ShapeFile data — try to construct unique IDs for areas from their geometry, and hope that the exact coordinates of the polygon vertices for unchanged areas do not change between revisions. We shall see. But right now I’m mostly worrying about why my parser script runs out of memory on my 1GB computer after reading a couple of hundred megs of input data.

  3. Changes are in the works

    Well, I’m back from my holiday, suitably sunburned and (relatively) relaxed. As Francis mentions, I was off in the Mediterranean somewhere (Majorca, specifically) suffering from miserable internet withdrawal symptoms. I did manage to get IRC up-and-running over dialup for election night, though this turned out to be surprisingly expensive. For once I was grateful to my iBook, which did actually Just Work when plugged into the wall.

    Anyway, today’s job is sorting out the new Scottish constituency boundaries. Scotland’s Parliament was dissolved in 1707 on the passing of the Act of Union, to be reconstituted in 1999. The quid pro quo for the Scots was enhanced representation in the House of Commons; Scottish constituencies had, in 1998, an average of 55,000 electors, compared to 69,000 in England. This anomaly has now been corrected, reducing the number of constituencies in Scotland from 72 to 59; all but three of the latter have different boundaries.

    This means updating MaPit, the component we built to map postcodes into electoral geography, to deal with the new boundaries. Ideally the way that we’d do this is to wait for Ordnance Survey to ship us, via our friends in ODPM, the new revision of their Boundary-Line (TM, apparently) product, with the outlines of the new constituencies encoded in attractive machine-readable form, and feed it to our existing import scripts. (As so often in life, it’s not quite that simple, but you get the general idea.) In an ideal world, this would also contain all the changed boundaries of the English counties and their constituent county electoral divisions.

    However, this is not an ideal world, and though there is a new revision of Boundary-Line in the works, it hasn’t come out yet, so we have to construct the point-to-constituency mapping in some other way. Happily, at this stage of the boundary revision process, the constituency boundaries are coterminous with ward boundaries, so it’s possible to just lift the definitions of the new constituencies from the relevant Statutory Instrument and fix up the constituencies from the ward boundaries, which haven’t changed. This, sadly, has occasioned a bit of a hack to our code, because we generally don’t assume that electoral geography is hierarchically defined — because it isn’t.

    (I don’t feel too bad about committing this hack, actually, because we’re likely to chuck the whole MaPit database and reconstruct it later in the year from OS data. When we built it originally, we did so from data in ESRI shapefile format; unfortunately, OS stuffed up the process of generating this from their own, internal and quite bonkers, NTF format, so the various area ID numbers in the database are not unique and not expected to be stable. We’d rather like stable ID numbers, so that we can cope gracefully with revisions to geography while maintaining continuity of, for instance, statistical data about MPs, so next time round we’re going to work from the NTF instead.)

    Sadly this Scottish hack doesn’t get us anywhere with the new county boundaries, and OS have told us that not all of the updated counties will be included in the forthcoming Boundary-Line revision. So it’ll be back to the tedious conversion of statutory instruments into SQL at some point in the near future, except that we’ll probably have to start building things up from parishes, rather than wards. Expect more anguished posts on this in the future.

    Meanwhile, Francis and Tom are collecting names and contact details for the new MPs. Tom tells me that this intake looks much more tech-savvy than the last, which could be good news from our (and everyone else’s) point of view. Hopefully WriteToThem will be cranking back into action — as far as MPs go, at least — fairly soon.