-
Things have been quiet here recently, but are now getting busy again. Tom’s back from America, Chris is back from holiday, and I’m better after being ill for most of last week.
Earlier in the week we finally managed to load new county boundaries into MaPit. So WriteToThem once again has county councils working. Please try it out with your postcode. Let us know of any problems.
This required lots of work from Chris, because a new version of BoundaryLine (from Ordnance Survey) has not yet been released with the updated boundaries. He’s done it using lists of the district council wards which make up the county electoral divisions.
These lists were taken from the Statutory Instruments. This has covered most postcodes, but there are still some where the boundaries were specified in text (walk along this river etc.) rather than wards, and we don’t have those.
The last couple of days I’ve been turning on lots of things to automate updating of WriteToThem. A cron job now grabs new data on councillors from GovEval once a day, and merges their changes with any changes we’ve made.
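For the curious, here is a rough sketch in Python of what merging upstream changes with our own edits can look like. It is not the actual WriteToThem code: the field names are made up, and the rule that our local edits win whenever both sides changed the same field is an assumption for illustration.

```python
from typing import Dict, Optional

Record = Dict[str, str]


def merge_records(base: Record, ours: Record, theirs: Record) -> Record:
    """Three-way merge of one councillor record.

    base   -- the record as last imported from the supplier
    ours   -- the record as it stands after our own corrections
    theirs -- the record in today's download from the supplier

    Assumption (not from the original post): if we changed a field
    locally, our value wins; otherwise the supplier's value is taken.
    """
    merged: Record = {}
    for field in set(base) | set(ours) | set(theirs):
        b: Optional[str] = base.get(field)
        o: Optional[str] = ours.get(field)
        t: Optional[str] = theirs.get(field)
        # If we never touched the field, follow the supplier (including deletions).
        value = t if o == b else o
        if value is not None:
            merged[field] = value
    return merged


# Hypothetical example data.
base = {"name": "Ann Example", "email": "ann@example.org"}
ours = {"name": "Ann Example", "email": "ann.new@example.org"}   # we fixed the email
theirs = {"name": "Ann Example-Smith", "email": "ann@example.org"}  # supplier updated the name

merged = merge_records(base, ours, theirs)
```

Here `merged` keeps our corrected email and picks up the supplier's new name, which is the behaviour you want if local fixes are not to be overwritten by each daily import.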
WriteToThem is also automatically emailing GovEval with user-submitted corrections to councillor data (the “Have you spotted a mistake in the above list?” link on WriteToThem). Hopefully this will create a virtuous feedback loop of ever-improving data quality goodness. Or at least let us keep up with council by-elections without having to lift a finger.
Finally I’ve made it send a mail once a week to the mailing list where WriteToThem admins (mostly volunteers) hang out. This describes what needs doing – such as missing contact details to gather, or messages in the queue which need human attention.
Next up, wiring up the new screenscrapers Richard and Jonathan contributed last week, so the Welsh and London Assemblies automatically update…
-
So, it’s back to electoral geography for me, this time to get the new county and county electoral-division boundaries live on WriteToThem. This is a prerequisite for getting mail to county councillors working again after the election on May 5th, so we’re already three months behind the times. But more generally, electoral boundaries are revised all the time to account for changes of population within each ward, constituency and so forth; and at most (local and national) elections some set of boundary changes takes effect. So to keep WriteToThem running we need to incorporate such updates routinely.
The way we handle electoral geography in general is to start with Ordnance Survey’s Boundary Line product, which, for each administrative or electoral area in Great Britain gives a polygon identifying that region. We then take a big list of all the postcodes in Britain (CodePoint) and figure out which polygons they lie in. Then when somebody comes along to WriteToThem and types in their postcode, we can figure out which ward, constituency etc. they are in, and tell them appropriate things about their representatives. (Technically this is a lie, of course, because postcodes represent regions, not points — we use the centroids of those regions — and each such region isn’t guaranteed to lie either wholly within or without all electoral and administrative regions. Unfortunately there isn’t a lot we can do about this beyond throwing our hands up and saying “oops, sorry”, so that’s what we do.)
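As a minimal sketch of the "which polygons does this postcode lie in" step, here is a ray-casting point-in-polygon test in Python. This is a standard textbook technique, not necessarily what our import scripts do; the area names and coordinates are invented, and it assumes each area is a single simple polygon, which real Boundary Line areas are not guaranteed to be.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count how many edges a ray running in the +x direction crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def areas_containing(point: Point, areas: Dict[str, Polygon]) -> List[str]:
    """Names of all areas whose polygon contains the point, sorted by name."""
    return sorted(name for name, poly in areas.items()
                  if point_in_polygon(point, poly))


# Hypothetical areas on a made-up grid.
areas = {
    "Ward A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Ward B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "Constituency X": [(0, 0), (20, 0), (20, 10), (0, 10)],
}

centroid = (5, 5)  # stands in for a postcode centroid
result = areas_containing(centroid, areas)
```

Running every postcode centroid through `areas_containing` once, and storing the answers, means the website itself only ever has to do a table lookup.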
As an aside, outside Great Britain (that is, in Northern Ireland) we don’t have the same sort of data, so instead we rely on another field in the CodePoint data which gives, for each postcode centroid, the ONS ward code for the ward in which that point lies. From that ward code you can find the enclosing local authority area, local electoral area and constituency. (In Northern Ireland local councils are elected by STV over multimember regions, rather than by first-past-the-post as in Great Britain.) Happily it turns out that all of those other regions are composed of whole numbers of wards; this happy state of affairs does not necessarily prevail elsewhere.
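Because every enclosing region is made of whole wards, the Northern Ireland lookup reduces to two table lookups. A small Python sketch, in which the postcodes, ward codes and area names are all invented placeholders rather than real data:

```python
from typing import Dict, NamedTuple, Tuple


class EnclosingAreas(NamedTuple):
    local_authority: str
    electoral_area: str
    constituency: str


# Placeholder data: real ward codes come from the CodePoint file.
POSTCODE_TO_WARD: Dict[str, str] = {
    "ZZ11AA": "WARD-1",
    "ZZ12BB": "WARD-2",
}

WARD_TO_AREAS: Dict[str, EnclosingAreas] = {
    "WARD-1": EnclosingAreas("Example Council", "Example North", "Example East"),
    "WARD-2": EnclosingAreas("Example Council", "Example South", "Example East"),
}


def normalise(postcode: str) -> str:
    """Strip whitespace and upper-case, so 'zz1 1aa' matches 'ZZ11AA'."""
    return "".join(postcode.split()).upper()


def areas_for_postcode(postcode: str) -> Tuple[str, EnclosingAreas]:
    """Return the ward code and the areas enclosing it; raises KeyError if unknown."""
    ward = POSTCODE_TO_WARD[normalise(postcode)]
    return ward, WARD_TO_AREAS[ward]


ward, enclosing = areas_for_postcode("zz1 1aa")
```

This only works because no ward is split between two larger areas; where that does not hold you are back to polygons.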
Now, twice a year, a new edition of Boundary Line is issued, taking account of recent changes in electoral geography. Usually this happens in May and October, though the schedule has been known to slip. In principle this should be easy to deal with: load up the new copy of Boundary Line, pass all the postcodes through it, and hey presto.
Life, of course, is rarely that simple, and this isn’t one of those occasions. When the boundaries of a region don’t change between one year and the next, we don’t want to make any alteration to that region in our database (which uses ID numbers to identify each area). More specifically, when a new revision of Boundary Line comes along, we want to ensure that — let’s say — Cambridge Constituency in the new revision is identified with Cambridge Constituency in the old version. Now, in principle, this should be easy, because each area in the data set, in the words of the manual,
… carries a unique identifier AI; this is the same identifier that was supplied in the previous specification of Boundary-Line. The same AI attribute is associated with every component polygon forming part of an administrative unit, irrespective of the number of polygons.
Now, the first time that we did this, we worked from a copy of the Boundary Line data supplied in the form of “ShapeFiles” (a format used in various proprietary GIS systems, and with which our local government partners were able to supply us without having to order it specially from Ordnance Survey). Unfortunately in the ShapeFile version, the allegedly unique administrative area IDs were, in fact, not unique. After discussion with Ordnance Survey it was concluded that this was a problem which affected the translation of the data from NTF (“National Transfer Format”, their own preferred format) into ShapeFile; and that the problem would be fixed in the next release.
So, taking no chances, we decided we’d work from the NTF format in future, since that seems to be closer to the authoritative source of the data, and anyway the ShapeFile format isn’t at all well-documented (for instance, many of the field names for the metadata about each area differ from those described in the manual for Boundary Line). So I’ve written code to parse the (slightly bonkers, natch) NTF files and modified our import scripts to use this code, with a view to then being able to keep up-to-date with future boundary revisions without too much trouble.
You will not be surprised, therefore, to hear that this has not worked out exactly as planned. Unfortunately it appears that the May 2005 NTF release of Boundary Line suffers exactly the same problems of non-uniqueness as did the previous ShapeFile release. So unless some cleverer solution presents itself, I’ll have to revive the hack we intended to use with the ShapeFile data — try to construct unique IDs for areas from their geometry, and hope that the exact coordinates of the polygon vertices for unchanged areas do not change between revisions. We shall see. But right now I’m mostly worrying about why my parser script runs out of memory on my 1GB computer after reading a couple of hundred megs of input data.
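To make the geometry hack concrete, here is one way it might look in Python. This is a sketch only: the function name and the choice of SHA-1 are mine, and it assumes an area is supplied as a list of polygons, each a list of (x, y) vertices. As noted above, it only works if unchanged areas keep exactly the same vertices, in the same order, between revisions.

```python
import hashlib
from typing import List, Sequence, Tuple

Vertex = Tuple[int, int]


def geometry_id(polygons: Sequence[Sequence[Vertex]]) -> str:
    """Derive an identifier for an area from the exact coordinates of its polygons."""
    # Serialise each polygon's vertices in the order they appear in the data.
    parts: List[str] = [
        ";".join(f"{x},{y}" for x, y in polygon) for polygon in polygons
    ]
    # Sort the polygons so the ID does not depend on the order in which an
    # area's component polygons happen to be listed in the file.
    canonical = "|".join(sorted(parts))
    return hashlib.sha1(canonical.encode("ascii")).hexdigest()


# Hypothetical area made of a mainland polygon and an island.
mainland = [(0, 0), (100, 0), (100, 100), (0, 100)]
island = [(200, 200), (210, 200), (210, 210)]

old_id = geometry_id([mainland, island])
same_id = geometry_id([island, mainland])            # component order swapped
moved_id = geometry_id([mainland, [(200, 200), (211, 200), (210, 210)]])
```

An area whose ID appears in both revisions can keep its existing database ID; anything else has to be treated as new or changed. The obvious weakness is that a one-metre tweak to a single vertex makes an otherwise unchanged area look brand new.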
-
We’ve just launched a testing version of FaxYourRepresentative. This is not a working site and not even a beta – because you cannot email representatives at the moment. What you can do, though, is practice sending messages – they’ll just be routed back to your own inbox so you can see that they’ve gone through.
We want people to try postcodes, give us feedback, and volunteer to help with the further development. FaxYourMP remains online and will do for some considerable time yet.