-
FixMyTransport is the most challenging project mySociety has ever tried to build. It’s so ambitious that we’re taking the unusual move of breaking off part of the problem and stress-testing it in the form of the new mini-site Brief Encounters, which has gone live today. It was built by Louise Crow, or Crowbot, as we know her, with design support from Dave Whiteland.
Brief Encounters is not, as the name might suggest, mySociety’s long awaited attempt at a dating site. Instead it’s a place where people can share whimsical stories about unusual things that happened them them, or other people, on public transport. We hope you’ll have a go, read some examples and then contribute your own.
You might be thinking that a whimsical story site doesn’t sound very mySocietyish – and you’d be right. Brief Encounters is actually a technology test-bed to help us crack a new design and data problem: how do you make it as easy as possible for users to pinpoint a specific bus stop, or train route, or a ferry port, as easily as possible? There are over 300,000 such beasties, and nobody has ever really tried to build an interface that makes it easy to find each one quickly and reliably.
So, what we want from you, dear readers, is three fold. We want:
- Stories – the more hilarious or sob-inducing the better
- Feedback on the user experience – how can we make finding a route or node easier?
- Feedback on any data problems you find, ie “My bus stop is missing” – we’re going to have to patch our data with your help, there’s just no other way
For those of you tech minded, the project is built in Ruby and uses the NaPTAN dataset of stations, bus stops and ferry terminals, the National Public Transport Gazetteer database of towns and settlements in the UK, and the National Public Transport Data Repository of sample public transport journeys, from 2008. The first two datasets are free of charge, and the third one mySociety pays for.
Lastly, kudos must go to the hyper-imaginative Nicky Getgood who suggested we collect stories on FixMyTransport, as well as problem reports.
-
… or, “how near is ‘nearby’?”
On PledgeBank we offer search and local alert features which will tell users about pledges which have been set up near them, the idea being that if somebody’s organising a street party in the next street over, you might well want to hear about it, but if it’s somebody a thousand miles away, you probably don’t.
At the moment we do this by gathering location data from pledge creators (either using postcodes, or location names via Gaze), and comparing it to search / alert locations using a fixed distance threshold — presently 20km (or about 12 miles). This works moderately well, but leads to complaints from Londoners of the form “why have I been sent a pledge which is TEN MILES away from me?” — the point being that, within London, people’s idea of how far away “nearby” things is is quite different from that of people who live in the countryside — they mean one tube stop, or a few minutes’ walk, or whatever. If you live in the countryside, “nearby” might be the nearest village or even the nearest town.
So, ages ago we decided that the solution to this was to find some population density data and use it to produce an estimate for what is “nearby”, defined as, “the radius around a point which contains at least N people”. That should capture the difference between rural areas and small and large towns.
(In fairness, the London issue could be solved by having the code understand north vs south of the river as a special case, and never showing North-Londoners pledges in South London. But that’s just nasty.)
Unfortunately the better solution requires finding population density data for the whole world, which is troublesome. There seem to be two widely-used datasets with global coverage: NASA SEDAC’s Gridded Population of the World, and Oak Ridge National Laboratory’s Landscan database. GPW is built from census data and information about the boundaries of each administrative unit for which the census data is recorded, and Landscan improves on this by using remote-sensing data such as the distribution of night-time lights, transport networks and so forth.
(Why, you might wonder, is Oak Ridge National Laboratory interested in such a thing? It is, apparently, “for estimating ambient populations at risk” from natural disasters and whatnot. That’s very worthy, but I can’t help but wonder whether the original motivation for this sort of work may have been a touch more sinister. But what do I know?)
Anyway, licence terms seem to mean that we can use the GPW data and we can’t use the Landscan data, which is a pity, since the GPW data is only really very good in its coverage of rich western countries which produce very detailed census returns on, e.g., a per-municipality basis. Where census returns are only available on the level of regions, the results are less good. Anyway, subject to that caveat, it seems to solve the problem. Here’s a map showing a selection of points, and the circles around them which contain about 200,000 people (that seems to be about the right value for N):
The API to access this will go into the Gaze interface, but it’s not live yet. I’ll document the RESTful API when it is.
One last note, which might be of use to people working with the GPW data in the future. GPW is a cell-based grid: each cell is a region lying between two lines of longitude and two lines of latitude, and within each cell three variables are defined: the population in the cell, the population density of the cell, and the land area of the cell. (This is one of those rare exceptions described in to Alvy Ray Smith’s rant, A Pixel Is Not A Little Square….) But note that the land area is not the surface area of the cell, and the population density is not the population divided by the surface area of the cell!
This becomes important in the case of small islands; for instance (a case I hit debugging the code) the Scilly Isles. The quoted population density for the Scilly Isles is rather high: somewhere between 100 and 200 persons/km2, but when integrating the population density to find the total population in an area, this is absolutely not the right value to use: the proper value there is the total population of a cell, divided by its total surface area. The reason for that is that when sampling from the grid to find the value of the integrand (the population density) you don’t know, a priori, whether the point you’re sampling at has landed on land or non-land, but the quoted population density assumes that you are always asking about the land. When integrating, the total population of each cell should be “smeared out” over the whole area of the cell. If you don’t do this then you will get very considerable overestimates of the population in regions which contain islands.
-
A very quick post to announce the launch of a public interface to our Gaze web gazetteer service. The motivation behind Gaze is collecting location information from users without using maps (a clunky approach with poor accessibility and licensing problems) or postcodes (which do not have universal coverage and have privacy issues as well as licensing problems). Instead the idea is to use place names to identify locations, even in the presence of ambiguity, alternate names, etc. We do this by providing a search service over a large gazetteer (2.2 million places and 3 million names), and supplying additional contextual information to disambiguate common place names. The API is very simple, with one major function and two other supporting ones.
Anyway, without further ado, here is the API. Internally we use one based on RABX, but we’ve done a special “RESTful” API for everyone else. All requests should be HTTP GETs; all parameters must be in UTF-8; and all responses are in UTF-8 plain text or comma-separated values. All calls should be passed to the URL,
http://gaze.mysociety.org/gaze-rest
selecting a particular function by specifying the HTTP parameter f, for instance
http://gaze.mysociety.org/gaze-rest?f=get_find_places_countries
Available functions are:
- get_country_from_ip
- Parameters:
- ip
- IPv4 address of a host, in dotted-quad format
Guess the country of location of a host from its IP address. The result of this call will be an ISO country code, followed by a line feed; or, if it was not possible to determine a country, a line feed on its own.
- get_find_places_countries
- No parameters.Return the list of countries for which the find_places call has a gazetteer available. The list is returned as a list of ISO country codes followed by line feeds.
- find_places
- Parameters:
- country
- ISO country code of country in which to search for places
- state
- state in which to search for places; presently this is only meaningful for country=US (United States), in which case it should be a conventional two-letter state code (AZ, CA, NY etc.); optional
- query
- query term input by the user; must be at least two characters long
- maxresults
- largest number of results to return, from 1 to 100 inclusive; optional; default 10
- minscore
- minimum match score of returned results, from 1 to 100 inclusive; optional; default 0
Returns in CSV format (as defined by this internet draft) with a one-line header a list of the following fields:
- name
- name of the place described by this row
- in-qualifier
- blank, or the name of an administrative region in which this place lies (for instance, a county)
- near-qualifier
- blank, or a list of nearby places, separated by commas
- latitude
- WGS-84 latitude of place in decimal degrees, north-positive
- longitude
- WGS-84 longitude of place in decimal degrees, east-positive
- state
- blank, or containing state code for US
- score
- match score for this place, from 0 to 100 inclusive
Enjoy! Questions and comments to hello@mysociety.org, please.
Update: we’ve now added the facilities for discovering population densities and “customary proximity” (as discussed in this post) to Gaze. The additional APIs are documented here.
-
Today I’m fiddling with the pledge creation form again, to fit all the new location stuff Chris has done in better. We’ve swapped over from last week, like tag team wrestlers. Chris is busy getting the boundary line data for new County boundaries into WriteTothem.
-
So, it’s my turn to write something here again. Ho-hum. Anyway, lately I’ve been working on adding the geographical features to PledgeBank which everyone thinks are there already: specifically, finding pledges which are nearby. To start with, we’re doing this for the UK only (because we already have the infrastructure to do postcode-to-coordinates lookups through MaPit), but the intention is to do thiis for the whole world as soon as we can, either by having users select their location through a gazetteer, or, where we can get the data, by using a similar postcode/zip-code/whatever-to-coordinates lookup. So that means we have to deal with places which might be anywhere on earth, which means dealing with latitudes and longitudes. And as anyone who’s dealt with this stuff knows, it’s very tedious to get this right. I’m afraid I’ve now spent too long reading about datum ellipsoids and Helmert Transforms to want to spend any time at all talking about them, so in the unlikely event that you’re interested, you’ll just have to read the (relevant parts of) code.
Oh, and if anybody has good suggestions for a hierarchical gazetteer with world-wide coverage, I’d love to hear them. Similarly, does anybody know if this one is any good?
-
I’m sitting here at nearly 11PM, forcing poor Matthew Somerville to work on polishing up the pledge pages before monday’s launch.
Anno Mitchell deserves a big thanks – today she pointed out something so blindingly obvious today that we were a bit remiss not to have built it already. She suggested that the PledgeBank homepage should include a facility to be emailed whenever someone creates a local pledge near you. This is such a great idea that elsewhere we’ve already built it. We were just a bit too dumb to notice that it should be included on our newest site 🙂
-
Today Chris and I have been doing more work and thinking about PledgeBank.
Chris is busily adding login. Not compulsory evil login, but a more unobtrusive sort. Hopefully. We want to reduce the number of emails with tokens that the site has to send, and also to add new features. For example, to highlight comments from the pledge creator, and allow him to send emails at will to the pledge signers. Later on maybe we’ll have photo uploads, to make the pledge pages more personal and engaging.
Meanwhile, I’ve been fiddling with the test harness (again), and now I’m reworking the new pledge form. Nobody looks at the second page, but we really want people to specify if their pledge is UK specific, and give a central location (by postcode, of course). This information will be used to help people find pledges, partly by syndication. We’re also adding categories (taken from iCan) for the same reason.
There’s another reason we want to know if pledges are local or not. If they are, then we can encourage people to print out flyers to leaflet people with. Otherwise they’re better off using the “email a friend” facility. The way pledges should be presented might vary quite a lot according to what they are for.