Much of what we do here at mySociety relies on Open Data, so naturally we support Open Data Day. In case you haven’t come across this event before, here’s the low-down:
Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption open data policies by the world’s local, regional and national governments.
If you’re planning on being a part of Open Data Day, you may find some of mySociety’s feeds, tools and APIs useful. This post attempts to put them all in one place. (more…)
All of us at mySociety love the fact that there are so many interesting new civic and democratic websites and apps springing up across the whole world. And we’re really keen to do what we can to help lower the barriers for people trying to build successful sites, to help citizens everywhere.
Today mySociety is unveiling MapIt Global, a new Component designed to eliminate one common, time-consuming task that civic software hackers everwhere have to struggle with: the task of identifying which political or administrative areas cover which parts of the planet.
As a general user this sort of thing might seem a bit obscure, but you’ve probably indirectly used such a service many times. So, for example, if you use our WriteToThem.com to write to a politician, you type in your postcode and the site will tell you who your politicians are. But this website can only do this because it knows that your postcode is located inside a particular council, or constituency or region.
Today, with the launch of MapIt Global , we are opening up a boundaries lookup service that works across the whole world. So now you can lookup a random point in Russia or Haiti or South Africa and find out about the administrative boundaries that surround it. And you can browse and inspect the shapes of administrative areas large and small, and perform sophisticated lookups like “Which areas does this one border with?”. And all this data is available both through an easy to use API, and a nice user interface.
We hope that MapIt Global will be used by coders and citizens worldwide to help them in ways we can’t even imagine yet. Our own immediate use case is to use it to make installations of the FixMyStreet Platform much easier.
We’re able to offer this service only because of the fantastic data made available by the amazing OpenStreetMap volunteer community, who are constantly labouring to make an ever-improving map of the whole world. You guys are amazing, and I hope that you find MapIt Global to be useful to your own projects.
The developers who made it possible were Mark Longair, Matthew Somerville and designer Jedidiah Broadbent. And, of course, we’re also only able to do this because the Omidyar Network is supporting our efforts to help people around the world.
From Britain to the World
For the last few years we’ve been running a British version of the MapIt service to allow people running other websites and apps to work out what council or constituency covers a particular point – it’s been very well used. We’ve given this a lick of paint and it is being relaunched today, too.
MapIt Global is also the first of The Components, a series of interoperable data stores that mySociety will be building with friends across the globe. Ultimately our goal is to radically reduce the effort required to launch democracy, transparency and government-facing sites and apps everywhere.
If you’d like to install and run the open source software that powers MapIt on your own servers, that’s cool too – you can find it on Github.
About the Data
The data that we are using is from the OpenStreetMap project, and has been collected by thousands of different people. It is licensed for free use under their open license. Coverage varies substantially, but for a great many countries the coverage is fantastic.
The brilliant thing about using OpenStreetMap data is that if you find that the boundary you need isn’t included, you can upload or draw it direct into Open Street Map, and it will subsequently be pulled into MapIt Global. We are planning to update our database about four times a year, but if you need boundaries adding faster, please talk to us.
If you’re interested in the technical aspects of how we built MapIt Global, see this blog post from Mark Longair.
Commercial Licenses and Local Copies
MapIt Global and UK are both based on open source software, which is available for free download. However, we charge a license fee for commercial usage of the API, and can also set up custom installs on virtual servers that you can own. Please drop us a line for any questions relating to commercial use.
We’ve added a variety of new features to our postcode and point administrative area database, MaPit, in the past month – new data (Super Output Areas and Crown dependency postcodes), new functionality (more geographic functions, council shortcuts, and JSONP callback), and most interestingly for most people, a way of browsing all the data on the site.
- Firstly, we have some new geographic functions to join touches – overlaps, covered, covers, and coverlaps. These do as you would expect, enabling you to see the areas that overlap, cover, or are covered by a particular area, optionally restricted to particular types of area. ‘coverlaps’ returns the areas either overlapped or covered by a chosen area – this might be useful for questions such as “Tell me all the Parliamentary constituencies fully or partly within the boundary of Manchester City Council” (three of those are entirely covered by the council, and two overlap another council, Salford or Trafford).
- As you can see from that link, nearly everything on MaPit now has an HTML representation – just stick “.html” on the end of a JSON URI to see it. This makes it very easy to explore the data contained within MaPit, linking areas together and letting you view any area on Google Maps (e.g. Rutland Council on a map). It also means every postcode has a page.
- From a discussion on our mailing list started by Paul Waring, we discovered that the NSPD – already used by us for Northern Ireland postcodes – also contains Crown dependency postcodes (the Channel Islands and the Isle of Man) – no location information is included, but it does mean that given something that looks like a Crown dependency postcode, we can now at least tell you if it’s a valid postcode or not for those areas.
- Next, we now have all Lower and Middle Super Output Areas in the system; thanks go to our volunteer Anna for getting the CD and writing the import script. These are provided by ONS for small area statistics after the 2001 census, and it’s great that you can now trivially look up the SOA for a postcode, or see what SOAs are within a particular ward. Two areas are in MaPit for each LSOA and MSOA – one has a less accurate boundary than the other for quicker plotting, and we thought we might as well just load it all in. The licences on the CD (Conditions of supply of SOA boundaries and Ordnance Survey Output Area Licence) talk about a click-use licence, and a not very sraightforward OS licence covering only those SOAs that might share part of a boundary with Boundary-Line (whichever ones those are), but ONS now use the Open Government Licence, Boundary-Line is included in OS OpenData, various councils have published their SOAs as open data (e.g. Warwickshire), and these areas should be publicly available under the same licences.
- As the UK has a variety of different types of council, depending on where exactly you are, the postcode lookup now includes a shortcuts dictionary in its result, with two keys, “council” and “ward”. In one-tier areas, the values will simply by the IDs of that postcode’s council and ward (whether it’s a Metropolitan district, Unitary authority, London borough, or whatever); in two-tier areas, the values will again be dictionaries with keys “district” and “council”, pointing at the respective IDs. This should hopefully make lookups of councils easier.
Phew! I hope you find this a useful resource for getting at administrative geographic data; please do let us know of any uses you make of the site.
I’m very pleased to announce that mySociety’s upgraded point and postcode lookup service, MaPit, is public and available to all. It can tell you about administrative areas, such as councils, Welsh Assembly constituencies, or civil parishes, by various different lookups including name, point, or postcode. It has a number of features not available elsewhere as far as I know, including:
- Full Northern Ireland coverage – we found a free and open dataset from the Office of National Statistics, called NSPD Open, available for a £200 data supply charge. We’ve paid that and uploaded it to our data mirror under the terms of the licence, so you don’t have to pay – if you feel like contributing to the charity that runs mySociety to cover our costs in this, please donate!
- Actual boundaries – for any specific area, you can get the co-ordinates of the boundary in either KML, JSON, or WKT – be warned, some can be rather big!
- Point lookup – given a point, in any geometry PostGIS knows about, it can tell you about all the areas containing that point, from parish and ward up to European electoral region.
- History – large scale boundary changes will be stored as new areas; as of now, this means the site contains the Westminster constituency boundaries from both before and after the 2010 general election, queryable just like current areas.
If you wish to use our service commercially or are considering high-volume usage, please get in touch to discuss options; the data and source code are available under their respective licences from the site. I hope this service may prove useful – we will slowly be migrating our own sites to use this service (FixMyStreet has already been done and already seems a bit nippier), so it should hopefully be quite reliable.
Thanks must go to the bodies releasing this open data that we can build upon and provide these useful services, and everyone involved in working towards the release of the data. Thanks also to everyone behind GeoDjango and PostGIS, making working with polygons and shapefiles a much nicer experience than it was back in 2004.
We updated our boundary and postcode database at the start of the week (apart from two wards in Scotland that I misspelled and updated on Tuesday, sorry), so hopefully everyone in the country can contact their representatives at WriteToThem or have their postcode recognised on HearFromYourMP or TheyWorkForYou. This applies especially to a small number of councils, such as Bradford, for which the boundaries had completely changed at their last election and which we were unable to get working until now – apologies for the inconvenience.
Related to this, and for interest, on 1st April, a number of councils are being abolished as their county councils become unitary authorities. The district councils within Durham, Northumberland, Cornwall, Wiltshire, Shropshire, and Cheshire/Chester all disappear – Cheshire becomes two unitary authorities called Cheshire West and Chester, and Cheshire East. Lastly, Bedford borough council becomes a unitary authority, and Central Bedfordshire council covers the area previously covered by Mid Bedfordshire and South Bedfordshire.
Parliamentary boundaries in England and Northern Ireland are changing, but these do not take effect until the next general election – until then, your constituency and MP remains the same.
FixMyStreet has a lot of RSS feeds. There’s one for every one-tier council (170), one for every ward of every one-tier council (another 5,044), two for every two-tier (county and district) council (544), and two for every ward of every two-tier council (20,296) – two per two-tier council because you might want either problems reported to one council of a two-tier set-up in particular, or all reports within the council’s boundary.
Then there’s an RSS feed every 162m across Great Britain in a big grid, returning all reports within a radius of that point, the radius by default being automatically determined by that point’s population density, but customisable to any distance if preferred. That’s, at a very rough approximation assuming Great Britain is a rectangle around its extremities, which it’s not, 19 million RSS feeds, lots of which will admittedly be very similar.
Every single one of those feeds can be subscribed to by email instead if that’s preferable to you, and are all accessible through a simple interface at http://www.fixmystreet.com/alert.
However, none of these RSS feeds was suitable for the person who emailed from a Neighbourhood Watch site and said that all they had was a postcode and they wanted to display a feed of reports from FixMyStreet. Given you could obviously look up a FixMyStreet map by postcode, it did seem odd that I hadn’t used the same code for the RSS feeds. Shortly thereafter, this anomaly was fixed, and if you now go to a URL of the form http://www.fixmystreet.com/rss/pc/postcode you will be redirected to the appropriate local reports feed for that postcode (I could say that adds another 1.7 million RSS feeds to our lot, but given they’re only redirects, that’s not strictly true). And after a couple more emails, I also added pubDate fields to the feeds which should make displaying in date order easier.
It’s great to see our RSS feeds being used by other sites – other examples I’ve recently come across include Brent Council integrating FixMyStreet into their mapping portal (select Streets, then FixMyStreet), or the Albert Square and St Stephen’s Association listing the most recent Stockwell problems in their blog sidebar. If you’ve seen any notable examples, do leave them in the comments.
If you enter your postcode on TheyWorkForYou and it’s Scottish or Northern Irish, you’re now presented with your MSPs and MLAs as well as your MP, which makes sense given the site covers their Parliament and Assembly respectively. You also get an extra tab in the navigation linking through to Your MSPs or MLAs. In order to do this, I needed a quick way of determining if a postcode was Northern Irish or Scottish. Northern Ireland was easy, as all postcodes there begin with BT. I assumed Scotland was also easy, which turned out to be true apart from the TD postcode area that straddled the border like a mail-sorting Niagara Falls. After some very dull investigation, I eventually worked out that e.g. most of TD15 is in England, but (amongst others) TD15 1X* is in Scotland, except for TD15 1XX which is apparently back in England. The final result was the postcode_is_scottish() function in postcode.inc, which (hopefully) correctly determines if a given postcode is Scottish or not – perhaps someone else will find it useful.
Whereas new sites are lovely, and I talk about Neighbourhood Fix-It improvements further down, there’s still quite a bit of work that needs to go into making sure our current sites are always up-to-date, working, and full of the joys of spring. Here’s a bit of what I’ve been up to recently, whilst everyone else chats about database upgrades, server memory, and statistics.
The elections last week meant much of WriteToThem has had to be switched off until we can add the new election results – that means the following aren’t currently contactable: the Scottish Parliament; the Welsh Assembly; every English metropolitan borough, unitary authority, and district council (bar seven); and every Scottish council. The fact that the electoral geography has changed a lot in Wales means there will almost certainly be complicated shenanigans for us in the near future so that our postcode lookup continues to return the correct results as much as possible.
Talking of postcode lookups, I also noticed yesterday that some Northern Ireland postcodes were returning incorrect results, which was caused by some out of date entries left lying around in our MaPit postcode-to-area database. Soon purged, but that led me to spot that Gerry Adams had been deleted from our database! Odd, I thought, and tracked it down to the fact our internal CSV file of MLAs had lost its header line, and so poor Mr Adams was heroically taking its place. He should be back now.
A Catalan news article about PledgeBank brought a couple of requests for new countries to be added to our list on PledgeBank. We’re sticking to the ISO 3166-1 list of country codes, but the requests led us to spot that Jersey, Guernsey and the Isle of Man had been given full entry status in that list and so needed added to our own. I’m hoping the interest will lead to a Catalan translation of the site; we should hopefully also have Chinese and Belarussian soon, which will be great.
Neighbourhood Fix-It update
New features are still being added to Neighbourhood Fix-It.
Questionnaires are now being sent out to people who create problems four weeks after their problem is sent to the council, asking them to check the status of their problem and thereby keep the site up-to-date. Adding the questionnaire functionality threw up a number of bugs elsewhere – the worst of which was that we would be sending email alerts to people whether their alert had been confirmed or not. Thankfully, there hadn’t yet been any such alert, phew.
Lastly, the Fix-It RSS feeds now have GeoRSS too, which means you can easily plot them on a Google map.
So, a silly post for today: Postcodeine. This is a British version of Ben Fry’s zipdecode, a “tool” for visualising the distribution of zipcodes in the United States. This is, as has been pointed out to me, wholly pointless, but it’s quite fun and writing it was an interesting exercise (it also taught me a little bit about AJAX, the web’s technology trend du jour). If you want the source code, it’s at the foot here; licence is the Affero GPL, as for all the other mySociety code.
(I should say, by the way, that I wrote this in my copious spare time. It’s copyright mySociety because I don’t have the right to use the postcode database myself.)
… or, “how near is ‘nearby’?”
On PledgeBank we offer search and local alert features which will tell users about pledges which have been set up near them, the idea being that if somebody’s organising a street party in the next street over, you might well want to hear about it, but if it’s somebody a thousand miles away, you probably don’t.
At the moment we do this by gathering location data from pledge creators (either using postcodes, or location names via Gaze), and comparing it to search / alert locations using a fixed distance threshold — presently 20km (or about 12 miles). This works moderately well, but leads to complaints from Londoners of the form “why have I been sent a pledge which is TEN MILES away from me?” — the point being that, within London, people’s idea of how far away “nearby” things is is quite different from that of people who live in the countryside — they mean one tube stop, or a few minutes’ walk, or whatever. If you live in the countryside, “nearby” might be the nearest village or even the nearest town.
So, ages ago we decided that the solution to this was to find some population density data and use it to produce an estimate for what is “nearby”, defined as, “the radius around a point which contains at least N people”. That should capture the difference between rural areas and small and large towns.
(In fairness, the London issue could be solved by having the code understand north vs south of the river as a special case, and never showing North-Londoners pledges in South London. But that’s just nasty.)
Unfortunately the better solution requires finding population density data for the whole world, which is troublesome. There seem to be two widely-used datasets with global coverage: NASA SEDAC’s Gridded Population of the World, and Oak Ridge National Laboratory’s Landscan database. GPW is built from census data and information about the boundaries of each administrative unit for which the census data is recorded, and Landscan improves on this by using remote-sensing data such as the distribution of night-time lights, transport networks and so forth.
(Why, you might wonder, is Oak Ridge National Laboratory interested in such a thing? It is, apparently, “for estimating ambient populations at risk” from natural disasters and whatnot. That’s very worthy, but I can’t help but wonder whether the original motivation for this sort of work may have been a touch more sinister. But what do I know?)
Anyway, licence terms seem to mean that we can use the GPW data and we can’t use the Landscan data, which is a pity, since the GPW data is only really very good in its coverage of rich western countries which produce very detailed census returns on, e.g., a per-municipality basis. Where census returns are only available on the level of regions, the results are less good. Anyway, subject to that caveat, it seems to solve the problem. Here’s a map showing a selection of points, and the circles around them which contain about 200,000 people (that seems to be about the right value for N):
The API to access this will go into the Gaze interface, but it’s not live yet. I’ll document the RESTful API when it is.
One last note, which might be of use to people working with the GPW data in the future. GPW is a cell-based grid: each cell is a region lying between two lines of longitude and two lines of latitude, and within each cell three variables are defined: the population in the cell, the population density of the cell, and the land area of the cell. (This is one of those rare exceptions described in to Alvy Ray Smith’s rant, A Pixel Is Not A Little Square….) But note that the land area is not the surface area of the cell, and the population density is not the population divided by the surface area of the cell!
This becomes important in the case of small islands; for instance (a case I hit debugging the code) the Scilly Isles. The quoted population density for the Scilly Isles is rather high: somewhere between 100 and 200 persons/km2, but when integrating the population density to find the total population in an area, this is absolutely not the right value to use: the proper value there is the total population of a cell, divided by its total surface area. The reason for that is that when sampling from the grid to find the value of the integrand (the population density) you don’t know, a priori, whether the point you’re sampling at has landed on land or non-land, but the quoted population density assumes that you are always asking about the land. When integrating, the total population of each cell should be “smeared out” over the whole area of the cell. If you don’t do this then you will get very considerable overestimates of the population in regions which contain islands.