  1. All the places in the world

    Lots of countries are gradually loading into one of our servers. There’s 220MB of data covering 227 countries, with about 5,000,000 places altogether. With a global population of about 6 billion, that means the average “place” has 1,200 people living in it. For each place we have the latitude and longitude. (All this data comes from the US military.)

    Try it out by signing up for a local alert in any country. Let us know if you find any bugs, or have any problems or suggestions to make. Also, if you want access to this gazetteer as a web service, send us a mail.

    Currently it’s up to Uruguay, so it’ll be a bit longer before we’ve finished the alphabet. It takes quite a while, partly because of the volume of data and the indices being built, partly because for places that share a name it hunts for nearby towns to disambiguate them, and partly because we didn’t optimise the Perl script. It won’t run very often.
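
    For the curious, the disambiguation step goes roughly like this. It is only a sketch, written in Python rather than the Perl we actually use; the function names, the data layout and the sample places are all invented for illustration, and the brute-force search is exactly the sort of thing that makes the import slow.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def disambiguate(places):
    """Label each (name, lat, lon) place; duplicated names get '(near X)'.

    Hypothetical helper: X is the closest place with a different name.
    This is O(n^2) in the number of places, so it is a sketch, not
    something to run unmodified over millions of rows.
    """
    indices_by_name = defaultdict(list)
    for i, (name, _lat, _lon) in enumerate(places):
        indices_by_name[name].append(i)

    labels = []
    for name, lat, lon in places:
        if len(indices_by_name[name]) == 1:
            labels.append(name)  # name is already unique
            continue
        others = [p for p in places if p[0] != name]
        if not others:
            labels.append(name)  # nothing to disambiguate against
            continue
        nearest = min(others, key=lambda p: haversine_km(lat, lon, p[1], p[2]))
        labels.append(f"{name} (near {nearest[0]})")
    return labels


# Made-up sample data, not taken from the real gazetteer.
PLACES = [
    ("Springfield", 10.0, 10.0),
    ("Springfield", 40.0, 40.0),
    ("Alphaville", 10.1, 10.1),
    ("Betatown", 40.1, 39.9),
    ("Gammaford", 25.0, 25.0),
]

for label in disambiguate(PLACES):
    print(label)
```

    A real version would want a spatial index rather than comparing every place against every other, and would have to decide what to do when the nearest town itself has an ambiguous name.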

  2. Electoral geography again

    So, it’s back to electoral geography for me, this time to get the new county and county electoral-division boundaries live on WriteToThem. This is a prerequisite for getting mail to county councillors working again after the election on May 5th, so we’re already three months behind the times. But more generally, electoral boundaries are revised all the time to account for changes of population within each ward, constituency and so forth; and at most (local and national) elections some set of boundary changes takes effect. So to keep WriteToThem running we need to incorporate such updates routinely.

    The way we handle electoral geography in general is to start with Ordnance Survey’s Boundary Line product, which, for each administrative or electoral area in Great Britain, gives a polygon identifying that region. We then take a big list of all the postcodes in Britain (CodePoint) and figure out which polygons they lie in. Then when somebody comes along to WriteToThem and types in their postcode, we can figure out which ward, constituency etc. they are in, and tell them appropriate things about their representatives. (Technically this is a lie, of course, because postcodes represent regions, not points. We use the centroids of those regions, and each such region isn’t guaranteed to lie either wholly inside or wholly outside each electoral and administrative region. Unfortunately there isn’t a lot we can do about this beyond throwing our hands up and saying “oops, sorry”, so that’s what we do.)
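
    The “which polygons does this point lie in” step can be sketched with the standard ray-casting test. This is an illustration in Python, not our import code; the area names, the postcodes and the coordinates are all made up, and a real import would need bounding boxes or a spatial index to avoid testing every point against every polygon.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon?

    `polygon` is a list of (x, y) vertices; the last vertex is joined
    back to the first. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right
    return inside


def areas_for_point(x, y, areas):
    """Return the names of all areas whose polygon contains the point."""
    return [name for name, polygon in areas.items()
            if point_in_polygon(x, y, polygon)]


# Hypothetical areas on an arbitrary grid: two wards side by side,
# and one constituency covering both.
AREAS = {
    "Ward A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Ward B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "Constituency X": [(0, 0), (20, 0), (20, 10), (0, 10)],
}

# Hypothetical postcode centroids on the same grid.
POSTCODE_CENTROIDS = {"ZZ1 1AA": (5, 5), "ZZ1 1AB": (15, 5)}

for postcode, (x, y) in POSTCODE_CENTROIDS.items():
    print(postcode, areas_for_point(x, y, AREAS))
```

    Note that only the centroid is tested, which is precisely the “lie” described above: a postcode whose region straddles a boundary is assigned to whichever side its centroid happens to fall on.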

    As an aside, outside Great Britain (that is, in Northern Ireland) we don’t have the same sort of data, so instead we rely on another field in the CodePoint data which gives, for each postcode centroid, the ONS ward code for the ward in which that point lies. From that ward code you can find the enclosing local authority area, local electoral area (in Northern Ireland local councils are elected by STV over multimember regions, rather than by first-past-the-post as in Great Britain) and constituency. Happily it turns out that all of those other regions are composed of whole numbers of wards; this happy state of affairs does not necessarily prevail elsewhere.

    Now, twice a year, a new edition of Boundary Line is issued, taking account of recent changes in electoral geography. Usually this happens in May and October, though the schedule has been known to slip. In principle this should be easy to deal with: load up the new copy of Boundary Line, pass all the postcodes through it, and hey presto.

    Life, of course, is rarely that simple, and this isn’t one of those occasions. When the boundaries of a region don’t change between one year and the next, we don’t want to make any alteration to that region in our database (which uses ID numbers to identify each area). More specifically, when a new revision of Boundary Line comes along, we want to ensure that — let’s say — Cambridge Constituency in the new revision is identified with Cambridge Constituency in the old version. Now, in principle, this should be easy, because each area in the data set, in the words of the manual,

    … carries a unique identifier AI; this is the same identifier that was supplied in the previous specification of Boundary-Line. The same AI attribute is associated with every component polygon forming part of an administrative unit, irrespective of the number of polygons.

    Now, the first time that we did this, we worked from a copy of the Boundary Line data supplied in the form of “ShapeFiles” (a format used in various proprietary GIS systems, and with which our local government partners were able to supply us without having to order it specially from Ordnance Survey). Unfortunately in the ShapeFile version, the allegedly unique administrative area IDs were, in fact, not unique. After discussion with Ordnance Survey it was concluded that this was a problem which affected the translation of the data from NTF (“National Transfer Format”, their own preferred format) into ShapeFile; and that the problem would be fixed in the next release.

    So, taking no chances, we decided we’d work from the NTF format in future, since that seems to be closer to the authoritative source of the data, and anyway the ShapeFile format isn’t at all well-documented (for instance, many of the field names for the metadata about each area differ from those described in the manual for Boundary Line). So I’ve written code to parse the (slightly bonkers, natch) NTF files and modified our import scripts to use this code, with a view to then being able to keep up-to-date with future boundary revisions without too much trouble.

    You will not be surprised, therefore, to hear that this has not worked out exactly as planned. Unfortunately it appears that the May 2005 NTF release of Boundary Line suffers exactly the same problems of non-uniqueness as did the previous ShapeFile release. So unless some cleverer solution presents itself, I’ll have to revive the hack we intended to use with the ShapeFile data — try to construct unique IDs for areas from their geometry, and hope that the exact coordinates of the polygon vertices for unchanged areas do not change between revisions. We shall see. But right now I’m mostly worrying about why my parser script runs out of memory on my 1GB computer after reading a couple of hundred megs of input data.
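
    To make the geometry hack concrete, here is a sketch of what “construct unique IDs from geometry” might look like. It is Python for illustration only, not the code we will end up running; the function names are invented, and it assumes coordinates that can be compared exactly (integers, say) and rings whose winding direction does not flip between revisions.

```python
import hashlib


def canonical_ring(ring):
    """Normalise one closed ring of (x, y) vertices.

    Drops a repeated closing vertex and rotates the ring so that it
    starts at its smallest vertex, so the same outline digitised from a
    different starting point compares equal. Winding direction is NOT
    normalised here (an assumption of this sketch).
    """
    points = list(ring)
    if len(points) > 1 and points[0] == points[-1]:
        points = points[:-1]
    start = points.index(min(points))
    return points[start:] + points[:start]


def geometry_id(polygons):
    """Fingerprint an area from all of its component polygons.

    `polygons` is a list of rings; their order does not matter.
    """
    rings = sorted(canonical_ring(ring) for ring in polygons)
    digest = hashlib.sha1()
    for ring in rings:
        for x, y in ring:
            digest.update(f"{x},{y};".encode("ascii"))
        digest.update(b"|")  # separator between rings
    return digest.hexdigest()


# Made-up outlines: the same square starting from different vertices,
# and a version with one vertex nudged.
old = [[(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]]
same = [[(10, 10), (0, 10), (0, 0), (10, 0), (10, 10)]]
moved = [[(0, 0), (10, 0), (10, 11), (0, 10), (0, 0)]]

print(geometry_id(old) == geometry_id(same))   # unchanged area
print(geometry_id(old) == geometry_id(moved))  # boundary has changed
```

    The obvious weakness is the one already mentioned: if a new revision shifts any vertex of an otherwise unchanged area, even by the smallest unit, the fingerprint changes and the area looks new.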

  3. Tickety boo

    Today I’m fiddling with the pledge creation form again, so that all the new location stuff Chris has done fits in better. We’ve swapped over from last week, like tag team wrestlers. Chris is busy getting the Boundary Line data for the new county boundaries into WriteToThem.

  4. Start of Term

    The job for this week has been getting WriteToThem back up. We’ve now sorted out the Scottish boundary changes and the names of the new MPs, but we don’t have their contact details yet. Today, among sundry other bits and bobs, including debugging thorny Exim problems and other such uninteresting stuff, I’ve been sending faxes to the Parliamentary fax numbers we have for MPs who’ve been re-elected, asking them whether their details have changed. We’re doing that because we don’t know whether those MPs will be in the same offices in this Parliament as in the last, and obviously it would be a breach of trust to send constituents’ mail to the wrong offices. So, I’ve finally reached rock bottom: Chris Lightfoot, junk fax merchant. Sorry, everyone!

    (Previous “rock bottom” moment: being telephoned early in the morning by a Labour MP with a rather cut-glass accent. She was calling to find out why she had been receiving phone calls from our fax machine at her office (we had the number wrong). But she didn’t tell me who she was, and, because over the phone she sounded uncannily like an old friend of mine, I assumed that it was my friend on the line. She must have assumed that she knew who I was too, and we exchanged several rounds of pleasantries before comprehension gradually dawned: “This is Mrs — —, MP; who am I speaking to?” Come to think of it, she may even have asked “to whom am I speaking?”.)

  5. Changes are in the works

    Well, I’m back from my holiday, suitably sunburned and (relatively) relaxed. As Francis mentions, I was off in the Mediterranean somewhere (Majorca, specifically) suffering from miserable internet withdrawal symptoms. I did manage to get IRC up-and-running over dialup for election night, though this turned out to be surprisingly expensive. For once I was grateful to my iBook, which did actually Just Work when plugged into the wall.

    Anyway, today’s job is sorting out the new Scottish constituency boundaries. Scotland’s Parliament was dissolved in 1707 on the passing of the Act of Union, to be reconstituted in 1999. The quid pro quo for the Scots was enhanced representation in the House of Commons; Scottish constituencies had, in 1998, an average of 55,000 electors, compared to 69,000 in England. This anomaly has now been corrected, reducing the number of constituencies in Scotland from 72 to 59; all but three of the latter have different boundaries.

    This means updating MaPit, the component we built to map postcodes into electoral geography, to deal with the new boundaries. Ideally the way that we’d do this is to wait for Ordnance Survey to ship us, via our friends in ODPM, the new revision of their Boundary-Line (TM, apparently) product, with the outlines of the new constituencies encoded in attractive machine-readable form, and feed it to our existing import scripts. (As so often in life, it’s not quite that simple, but you get the general idea.) In an ideal world, this would also contain all the changed boundaries of the English counties and their constituent county electoral divisions.

    However, this is not an ideal world, and though there is a new revision of Boundary-Line in the works, it hasn’t come out yet, so we have to construct the point-to-constituency mapping in some other way. Happily, at this stage of the boundary revision process, the constituency boundaries are coterminous with ward boundaries, so it’s possible to just lift the definitions of the new constituencies from the relevant Statutory Instrument and fix up the constituencies from the ward boundaries, which haven’t changed. This, sadly, has occasioned a bit of a hack to our code, because we generally don’t assume that electoral geography is hierarchically defined — because it isn’t.
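
    In outline, the hack amounts to treating each new constituency as a list of wards and looking postcodes up through their ward. The sketch below is Python for illustration, not the SQL we actually wrote; every name in it is hypothetical, and the ward lists stand in for what would be transcribed from the Statutory Instrument.

```python
def build_ward_to_constituency(definitions):
    """Invert {constituency: [wards]} into {ward: constituency}.

    Raises ValueError if a ward is listed under two constituencies,
    which would mean the transcription has gone wrong.
    """
    ward_to_constituency = {}
    for constituency, wards in definitions.items():
        for ward in wards:
            if ward in ward_to_constituency:
                raise ValueError(
                    f"{ward} is in both {ward_to_constituency[ward]} "
                    f"and {constituency}"
                )
            ward_to_constituency[ward] = constituency
    return ward_to_constituency


def constituency_for_postcode(postcode, postcode_to_ward, ward_to_constituency):
    """Look up a postcode's constituency via its (unchanged) ward."""
    ward = postcode_to_ward.get(postcode)
    if ward is None:
        return None  # unknown postcode
    return ward_to_constituency.get(ward)


# Hypothetical definitions, standing in for the Statutory Instrument.
NEW_CONSTITUENCIES = {
    "Constituency North": ["Ward 1", "Ward 2"],
    "Constituency South": ["Ward 3"],
}

# Hypothetical existing postcode-to-ward mapping.
POSTCODE_TO_WARD = {"ZZ1 1AA": "Ward 2", "ZZ9 9ZZ": "Ward 3"}

WARD_TO_CONSTITUENCY = build_ward_to_constituency(NEW_CONSTITUENCIES)
print(constituency_for_postcode("ZZ1 1AA", POSTCODE_TO_WARD, WARD_TO_CONSTITUENCY))
```

    This only works while constituencies are exact unions of wards, which is why it counts as a hack rather than a general solution.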

    (I don’t feel too bad about committing this hack, actually, because we’re likely to chuck the whole MaPit database and reconstruct it later in the year from OS data. When we built it originally, we did so from data in ESRI shapefile format; unfortunately, OS stuffed up the process of generating this from their own, internal and quite bonkers, NTF format, so the various area ID numbers in the database are not unique and not expected to be stable. We’d rather like stable ID numbers, so that we can cope gracefully with revisions to geography while maintaining continuity of, for instance, statistical data about MPs, so next time round we’re going to work from the NTF instead.)

    Sadly this Scottish hack doesn’t get us anywhere with the new county boundaries, and OS have told us that not all of the updated counties will be included in the forthcoming Boundary-Line revision. So it’ll be back to the tedious conversion of statutory instruments into SQL at some point in the near future, except that we’ll probably have to start building things up from parishes, rather than wards. Expect more anguished posts on this in the future.

    Meanwhile, Francis and Tom are collecting names and contact details for the new MPs. Tom tells me that this intake looks much more tech-savvy than the last, which could be good news from our (and everyone else’s) point of view. Hopefully WriteToThem will be cranking back into action — as far as MPs go, at least — fairly soon.