A very quick post to announce the launch of a public interface to our Gaze web gazetteer service. The motivation behind Gaze is collecting location information from users without using maps (a clunky approach with poor accessibility and licensing problems) or postcodes (which do not have universal coverage and have privacy issues as well as licensing problems). Instead the idea is to use place names to identify locations, even in the presence of ambiguity, alternate names, etc. We do this by providing a search service over a large gazetteer (2.2 million places and 3 million names), and supplying additional contextual information to disambiguate common place names. The API is very simple, with one major function and two other supporting ones.
Anyway, without further ado, here is the API. Internally we use one based on RABX, but we’ve done a special “RESTful” API for everyone else. All requests should be HTTP GETs; all parameters must be in UTF-8; and all responses are in UTF-8 plain text or comma-separated values. All calls should be passed to the URL,
http://gaze.mysociety.org/gaze-rest
selecting a particular function by specifying the HTTP parameter f, for instance
http://gaze.mysociety.org/gaze-rest?f=get_find_places_countries
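As a rough illustration of the calling convention just described, the following Python sketch builds request URLs. The helper name `gaze_url` and the constant `GAZE_URL` are made up for this example and are not part of Gaze itself; the only things taken from the description above are the base URL, the `f` parameter, and the requirement that parameters be UTF-8.

```python
from urllib.parse import urlencode

# Base URL as given in the post.
GAZE_URL = "http://gaze.mysociety.org/gaze-rest"


def gaze_url(function, **params):
    """Build the GET URL for a Gaze call (hypothetical helper, not part of Gaze).

    The function name goes in the `f` parameter; any other keyword
    arguments become HTTP parameters, percent-encoded as UTF-8.
    """
    query = {"f": function}
    # Optional parameters passed as None are simply left out.
    query.update({key: value for key, value in params.items() if value is not None})
    return GAZE_URL + "?" + urlencode(query, encoding="utf-8")
```

Calling `gaze_url("get_find_places_countries")` gives back the example URL shown above; the result can then be fetched with any HTTP client.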
Available functions are:
- get_country_from_ip
Parameters:
- ip: IPv4 address of a host, in dotted-quad format
Guess the country of location of a host from its IP address. The result of this call will be an ISO country code, followed by a line feed; or, if it was not possible to determine a country, a line feed on its own.
- get_find_places_countries
No parameters.
Return the list of countries for which the find_places call has a gazetteer available. The list is returned as a list of ISO country codes followed by line feeds.
- find_places
Parameters:
- country: ISO country code of the country in which to search for places
- state: state in which to search for places; presently this is only meaningful for country=US (United States), in which case it should be a conventional two-letter state code (AZ, CA, NY etc.); optional
- query: query term input by the user; must be at least two characters long
- maxresults: largest number of results to return, from 1 to 100 inclusive; optional; default 10
- minscore: minimum match score of returned results, from 1 to 100 inclusive; optional; default 0
Returns a list of rows in CSV format (as defined by this internet draft), preceded by a one-line header, with the following fields:
- name: name of the place described by this row
- in-qualifier: blank, or the name of an administrative region in which this place lies (for instance, a county)
- near-qualifier: blank, or a list of nearby places, separated by commas
- latitude: WGS-84 latitude of place in decimal degrees, north-positive
- longitude: WGS-84 longitude of place in decimal degrees, east-positive
- state: blank, or containing state code for US
- score: match score for this place, from 0 to 100 inclusive
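For readers who want to consume these responses, here is a minimal Python sketch of parsing them. It is not the service's own code: the function names and dictionary keys are invented for this example, and it assumes that the CSV columns arrive in the order listed above and that the score is an integer. The header line is skipped rather than interpreted, since its exact wording is not given here.

```python
import csv
import io

# Hypothetical key names, in the column order documented above (an assumption).
FIELDS = ("name", "in_qualifier", "near_qualifier",
          "latitude", "longitude", "state", "score")


def parse_country_codes(body):
    """Parse a get_country_from_ip or get_find_places_countries response.

    Each ISO code is followed by a line feed; a response that is only a
    line feed yields an empty list (no country could be determined).
    """
    return [line for line in body.split("\n") if line]


def parse_find_places(body):
    """Parse a find_places response body into a list of dicts."""
    reader = csv.reader(io.StringIO(body, newline=""))
    next(reader, None)  # skip the one-line header
    places = []
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # ignore blank or unexpected rows
        place = dict(zip(FIELDS, row))
        place["latitude"] = float(place["latitude"])
        place["longitude"] = float(place["longitude"])
        place["score"] = int(place["score"])  # assumes an integer score
        # The near-qualifier is itself a comma-separated list of place names.
        place["near_qualifier"] = [
            name.strip() for name in place["near_qualifier"].split(",") if name.strip()
        ]
        places.append(place)
    return places
```

Using the standard `csv` module means that quoted fields containing commas, such as a near-qualifier listing several places, are handled without any special effort.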
Enjoy! Questions and comments to chris@mysociety.org, please.
Update: we’ve now added the facilities for discovering population densities and “customary proximity” (as discussed in this post) to Gaze. The additional APIs are documented here.
September 28th, 2005 at 9:29pm
This is interesting, but CSV is not a nice format to parse easily – would you consider using a structured HTML response, like http://microformats.org/wiki/xoxo ?
Also, a missing, but useful bit of information with this result is a radius of interest for the named place, as if you are going to present results to users on a map-based UI at any point, knowing how far to zoom in is important.
September 30th, 2005 at 3:43pm
Out of interest Kevin, what are you using to parse with? I’d have thought CSV would be pretty easy – there are standard Perl and Python modules for it, for example. Parsing a tag soup strikes me as much harder.
September 30th, 2005 at 5:13pm
What he said. We chose CSV because it’s standard and easy-to-parse; structured HTML seems to have neither advantage.
October 19th, 2005 at 2:16pm
I wanted to make the Gaze lookup service an asynchronous call (AJAX without the X), but the problem is that browsers typically do not allow async calls to a non-local host. Therefore, I wrapped the Gaze service in a little PHP code (gaze-rest.php) that has the same API as the actual Gaze service, but acts like a ‘local service’.
http://highearthorbit.com/projects/geocode/geocode.html
The page then makes a JavaScript call, which gets the values from the form and makes the async call. The returned value is put in the textarea.
Right now I have hardcoded US and GB, but I plan on extending it to fill the options dynamically via a get_find_places_countries call to Gaze.
The source is available as a link at the bottom of the page.
October 19th, 2005 at 2:24pm
CSV is easy to parse until you get commas within the fields.
There’s nothing in what’s been published that says that this won’t happen.
October 19th, 2005 at 3:38pm
CSV is at least a regular language (no recursion), and so it can be parsed with a regular expression. By comparison, HTML needs a special (and very complicated) parser to handle. Commas within the fields aren’t exactly troublesome; the relevant RE is just something like,
/^((^|,)("([^"]|"")+"|[^,]+))*$/
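To make the idea concrete, here is a small Python sketch that splits one CSV record using regular expressions alone. It is an adaptation rather than the expression quoted above: the pattern is changed to accept empty fields, which Gaze returns for blank qualifiers, and the function name `split_csv_line` is made up for this example. In practice a standard CSV module is the simpler choice.

```python
import re

# A field is either a quoted string, in which "" stands for a literal quote,
# or a (possibly empty) run of characters containing no comma or quote.
_FIELD = r'"(?:[^"]|"")*"|[^",]*'
# With a comma prepended to the line, every field is preceded by a comma.
_RECORD = re.compile(rf'(?:,(?:{_FIELD}))+')
_ITEM = re.compile(rf',({_FIELD})')


def split_csv_line(line):
    """Split a single CSV record into fields (hypothetical helper)."""
    padded = "," + line
    if _RECORD.fullmatch(padded) is None:
        raise ValueError(f"not a well-formed CSV record: {line!r}")
    fields = []
    for match in _ITEM.finditer(padded):
        raw = match.group(1)
        if raw.startswith('"'):
            # Strip the surrounding quotes and undo the doubled-quote escape.
            raw = raw[1:-1].replace('""', '"')
        fields.append(raw)
    return fields
```

Prepending a comma keeps the pattern uniform, so the first field needs no special case, and the full-record check rejects input such as a stray quote in an unquoted field instead of silently skipping it.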
November 7th, 2005 at 10:00am
I was enjoying this, but it seems to be offline now. Will it be back up?
And, assuming that you’re using data from the GEONet Names Server, may I ask where you found public data for the US?
November 30th, 2005 at 9:56pm
The US government publish quite a lot of GIS data.
Try the geonames stuff (http://geonames.usgs.gov/)
Here’s a direct link to the data by state:
http://geonames.usgs.gov/stategaz/index.html
RJ
December 1st, 2005 at 11:13am
Carsten — sorry about the delay answering this. As RJ says, the data for the US are from USGS; the dump for the whole country is at,
http://geonames.usgs.gov/stategaz/POP_PLACES_DECI.zip
and the program to parse it is here:
usgs-geonames-parse
(you can get that from our public CVS too).
December 2nd, 2005 at 12:07pm
Sorry if I’m being daft here, but what is the format of the query parameter? If I want to know the place names near a given latitude and longitude, how do I do it? For example:
http://gaze.mysociety.org/gaze-rest?f=find_places;country=GB;lat=51.53;lon=-0.1020
wants the query parameter, but I can’t find it documented…
Blaine
December 3rd, 2005 at 11:42am
There isn’t an API which returns the places near a given longitude and latitude. I don’t think our database is indexed to make that easy to do – it is the other way round, to find a latitude and longitude given a place.
December 3rd, 2005 at 2:01pm
As Francis says — I don’t think we have the appropriate geographic indexes. We could probably add them (I’d have to check whether this has any unpleasant resource requirements, but it oughtn’t to) and an API to do a find-places-near-location query; best of all would be if you could offer a patch — the relevant code is here: Gaze.pm, the web interface here: gaze-rest.cgi, and the database schema here: schema.sql. Access to our CVS repository is described here; you want module mysociety/services/Gaze, though it has some dependencies on mysociety/perllib too. There’s appropriate SQL you can copy in pb/db/schema.sql, I think.
I’m afraid that making a local installation is a little bit involved, but drop a mail to chris@mysociety.org or leave questions as comments here if you get stuck. (The latter might be better, since then the results are available to others too.) You could also join the (fairly low-traffic) mysociety-devchat mailing list if you like.
February 26th, 2006 at 8:27am
Chris,
This is exactly what a great web service should be: useful and easy to use.
Nice work!
April 14th, 2006 at 10:53am
The link to the CSV internet draft doesn’t seem to work any more. Maybe:
http://www.rfc-editor.org/rfc/rfc4180.txt
May 9th, 2006 at 7:40am
How can I get a geocode using a regular expression in PHP? Please send me an example.
December 1st, 2006 at 9:16pm
You might be interested in the “geo” microformat:
http://microformats.org/wiki/geo
September 7th, 2008 at 11:34am
Without exaggeration, it can be said with certainty that the post covered the topic 100 percent. :)