Every day, thousands of planning applications are submitted to local councils around the country by people applying to demolish a garage, erect a fence or convert a loft. More often than not these applications disappear into proprietary systems that, despite being publicly available, make it hard for members of the public to find out what’s going on in their area.
Last week, we kicked off the first sprint of an exciting new piece of work with the Hampshire Hub Partnership to build a prototype, open source web application to help members of the public find out more about planning applications in their area.
We jumped at the chance to work on this for a number of reasons.
Serving the needs of the public
Firstly, it has the needs of the general public as its focus. The planning process can be baffling if you’re new to it and this tool aims to help make it easier to understand. We’ll be helping people answer some of the most common questions they have about planning applications: What applications are happening near me? What decisions have been made in the past on applications like mine? How likely is it that my application will be dealt with on time?
A wireframe illustrating the potential functionality of the search results page
The site will help people browse planning application data by location — whether a postcode or a street address — and by type — whether it’s an extension, a loft conversion, or a major development like a retail park or commercial warehouse.
Built on Open Data
Secondly, it’s being made possible by the release of open data from local councils, once Ordnance Survey has granted the necessary exemption for locations derived from their data. Many of our projects rely on organisations publishing open data, so it’s great to have the chance to help demonstrate the value of releasing this kind of data openly.
The Hampshire Hub team has already spent a lot of time working with the LGA, DCLG and LeGSB to define a schema for how planning application data should be published. They’ve collaborated with local authorities, in particular Rushmoor Borough Council, to gather planning application data. And they’ve worked with Swirrl to set up an open data platform to collect all of this together, publish it openly and give us and others access to it.
Reuse, don’t rebuild
And finally, rather than build something from scratch, we’ll be using the fabulous PlanningAlerts.org.au open source codebase as a starting point. Planning Alerts is a piece of software built in Ruby on Rails by our friends down under at Open Australia. It gives us a lot of the functionality that we need for free. We plan in time to repay them for their kindness by submitting the features we develop back into their codebase (if they want them, of course).
We’ll also be using a customised version of our administrative boundaries service http://mapit.mysociety.org to store and query the geographical boundaries of different planning authorities in Hampshire (including National Park boundaries from Natural England as well as local council boundaries).
We’ve just started our second sprint of work atop the Open Australia codebase, building the search functionality we need to help people find applications by location and category. We’re looking forward to seeing the tool grow, get into the hands of users and fill up with data.
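As a rough illustration of what searching by location and category involves, here is a minimal plain-Ruby sketch. It is not the PlanningAlerts code: the PlanningApplication struct, its field names and the search_applications method are all assumptions made up for this example, and a real site would push this work into the database rather than filter in memory.

```ruby
# Hypothetical record type; the field names are assumptions for illustration.
PlanningApplication = Struct.new(:reference, :category, :lat, :lon)

EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometres between two points (haversine formula).
def distance_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Applications within radius_km of a point, optionally limited to one
# category, nearest first.
def search_applications(applications, lat:, lon:, radius_km:, category: nil)
  applications
    .select { |app| category.nil? || app.category == category }
    .map { |app| [distance_km(lat, lon, app.lat, app.lon), app] }
    .select { |dist, _app| dist <= radius_km }
    .sort_by { |dist, _app| dist }
    .map { |_dist, app| app }
end
```

Turning a postcode or street address into the latitude and longitude passed in here is a separate geocoding step, which this sketch leaves out.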
FixMyTransport is the most challenging project mySociety has ever tried to build. It’s so ambitious that we’re taking the unusual move of breaking off part of the problem and stress-testing it in the form of the new mini-site Brief Encounters, which has gone live today. It was built by Louise Crow, or Crowbot, as we know her, with design support from Dave Whiteland.
Brief Encounters is not, as the name might suggest, mySociety’s long awaited attempt at a dating site. Instead it’s a place where people can share whimsical stories about unusual things that happened to them, or other people, on public transport. We hope you’ll have a go, read some examples and then contribute your own.
You might be thinking that a whimsical story site doesn’t sound very mySocietyish – and you’d be right. Brief Encounters is actually a technology test-bed to help us crack a new design and data problem: how do you make it as easy as possible for users to pinpoint a specific bus stop, or train route, or ferry port? There are over 300,000 such beasties, and nobody has ever really tried to build an interface that makes it easy to find each one quickly and reliably.
So, what we want from you, dear readers, is threefold. We want:
- Stories – the more hilarious or sob-inducing the better
- Feedback on the user experience – how can we make finding a route or node easier?
- Feedback on any data problems you find, ie “My bus stop is missing” – we’re going to have to patch our data with your help, there’s just no other way
For those of you tech minded, the project is built in Ruby and uses the NaPTAN dataset of stations, bus stops and ferry terminals, the National Public Transport Gazetteer database of towns and settlements in the UK, and the National Public Transport Data Repository of sample public transport journeys, from 2008. The first two datasets are free of charge, and the third one mySociety pays for.
Lastly, kudos must go to the hyper-imaginative Nicky Getgood who suggested we collect stories on FixMyTransport, as well as problem reports.
There’s round about 8Gb of unfettered Government data in the core database, plus a whole bunch more for indexing and caching. For comparison, TheyWorkForYou (which now goes back to 1935) has 12Gb. And it’s catching up on traffic also – WhatDoTheyKnow has about half the number of visitors as TheyWorkForYou.
Unfortunately, this new found traffic has led to performance problems. You might have seen errors when using WhatDoTheyKnow in the last week or two. This post is firstly an apology for that. Thank you for your patience. Hopefully it is fixed now – do let us know if you get problems still. And secondly it is some techy stuff about debugging such problems in Ruby on Rails…
When WhatDoTheyKnow started failing, we did the obvious things to start with – moving the database to a separate server, and moving some other services off the same server, to give WDTK more room to breathe. It still kept breaking.
None of my server monitoring tools shed any very clear light as to the problem. I upgraded to the latest version of Passenger, the best Rails deployment tool I’ve seen yet. It’s pretty good, but still not mature enough for my liking. I was still getting the same problems with it, but reporting tools like passenger-memory-stats were really helpful.
Eventually I worked out that it was to do with memory use of the Rails processes. Individual ones would leap up to 1Gb, and never drop back down. If several did, the server (with 4Gb of RAM) would start swapping and grind to a halt. The world of Ruby and Rails memory monitoring software is patchwork at best, and in the end I found the simplest tools the most useful. Here are some:
- I found some Rails processes were getting jammed, and not dying even when I restarted Apache. I think in the end this was due to the Passenger spawning method, and our use of the Xapian Ruby module. Running Passenger in RailsSpawnMethod conservative mode made things much more robust.
- Monit, which in a previous life had a job holding up vital structural pillars of buildings with duct tape, makes you feel dirty. Actually it is really useful. Given I couldn’t quickly fix the problem, Monit let me at least reduce the suffering for people trying to use the site meanwhile. Here’s the rule I used, which gives Apache a kick every time server memory use is too high. It was firing every 5 or 10 minutes…
check system localhost
    if memory > 3500 MB then exec "/usr/sbin/apache2ctl graceful"
- I found memory_profiler on a blog. It helps you find the kind of memory leak where you unintentionally continue to reference an object you don’t use any more, with a specialist subject of string objects. This led to a fix to do with declaring static arrays in classes vs. modules, which I still don’t really understand. But it wasn’t the cause of the big 1Gb memory munching: there were no large enough leaks of this sort.
- The record_memory function in WDTK’s application controller came from another blog. It’s handy as it shows you how much of the system memory in the Ruby process each request causes an increase by. With caveats, this was the best way for me to identify the most damaging requests (search results, and certain public body pages). And it also brought focus on the actual problem – the peak memory use during a request. That’s really important, because Ruby’s memory manager never returns memory to the operating system… The Gb leaps in memory use were because of temporary memory used during certain requests, which the Ruby memory manager then never frees later.
- I made a bunch of functions culminating in allocated_string_size_around_gc. This was really useful alongside the “just add lots of print statements and fiddle” school of debugging. It’s not everyone’s favourite school, but if your test code can’t catch the problem, it’s one I often end up using (things get really involved rarely enough that it doesn’t seem worth setting up an interactive debugger). It led me to various peak memory savings, such as calling “text.gsub!” rather than “text = text.gsub” while removing email addresses and private information from FOI request responses, which helps quite a bit when dealing with multi-megabyte attachments.
- Finally, I used the overlooked debugging tool, and the one you should never rely on, being common sense. That is, common sense informed by days of careful use of all the other tools. In order to quickly show text extracts when searching, WDTK stores the extracted attachment text in the database. A few of these attachments are quite large, and led to 50Mb fields, often several of which were being loaded and processed in one page request. That this would cause a high peak of memory use only became obvious to me some time yesterday. I checked that that was the case, and this morning, I changed it to use the full text for indexing, but to at most keep 1Mb for use in snippets. So sometimes now you won’t get a good search extract for queries, but it is rare, and it will at least still return the right result.
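Two of the peak memory savings in the list above, redacting text in place and capping the text kept for snippets, can be sketched in plain Ruby. This is not WhatDoTheyKnow’s actual code: the method names, the simplified email pattern and the exact byte limit are assumptions for illustration, and how much memory gsub! really saves over gsub depends on the Ruby version.

```ruby
# Deliberately simple email pattern; an assumption for this sketch only.
EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Assumed cap, standing in for the "at most 1Mb" mentioned above.
SNIPPET_LIMIT_BYTES = 1_000_000

# Mask email addresses by mutating the string in place, so the caller does
# not end up holding both an old and a new copy of a huge attachment.
# gsub! returns nil when nothing matched, so we return the text explicitly.
def redact_emails!(text)
  text.gsub!(EMAIL_PATTERN, '[email address]')
  text
end

# Text kept for search extracts: at most `limit` bytes of the full text.
# byteslice may cut a multi-byte character in half, so scrub removes any
# broken trailing bytes.
def snippet_text(full_text, limit = SNIPPET_LIMIT_BYTES)
  return full_text if full_text.bytesize <= limit

  full_text.byteslice(0, limit).scrub('')
end
```

The full text would still go to the indexer; only the copy stored for showing extracts is cut down.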
I’ve more work to do, I think there are quite a few other quick wins, all of which are making the site faster too. I’m quite happy that WhatDoTheyKnow also has a bunch more test code as a result of all this.
On the other hand, what a disappointing disaster for open source languages beginning with P/R (as opposed to J). Yes, the help and tools were just about there to work it out, but would seem primitive if you’d used say Java’s Memory Analyzer. Indeed somebody over on StackOverflow suggested running your site in JRuby and using exactly that tool…
One of the special pieces of magic in TheyWorkForYou is its email alerts, sending you mail whenever an MP says a word you care about in Parliament. Lots of sites these days have RSS, and lots have search, but surprisingly few offer search based email alerts. My Mum trades shares on the Internet, setting it to automatically buy and sell at threshold values. But she doesn’t have an RSS reader. So, it’s important to have email alerts.
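The core of a search based email alert is small: keep a list of who cares about which word, and check each new item against it. Here is a minimal sketch in plain Ruby. The Alert and Speech structs and the alerts_to_send method are made up for this example and are not TheyWorkForYou’s real schema; a real site would run the stored queries through its search engine rather than a regular expression.

```ruby
# Hypothetical types for illustration only.
Alert  = Struct.new(:email, :keyword)
Speech = Struct.new(:speaker, :text)

# Returns a hash mapping each email address to the new speeches that mention
# a keyword it subscribed to (whole-word, case-insensitive match). A speech
# is listed once per address even if several of its alerts match.
def alerts_to_send(alerts, new_speeches)
  pending = Hash.new { |hash, email| hash[email] = [] }
  alerts.each do |alert|
    pattern = /\b#{Regexp.escape(alert.keyword)}\b/i
    new_speeches.each do |speech|
      pending[alert.email] |= [speech] if speech.text =~ pattern
    end
  end
  pending
end
```

Each entry in the returned hash would then become one email, listing everything new that matched for that person.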
So naturally, when we made WhatDoTheyKnow, search and search based email alerts were pretty high up the list, to help people find new, interesting Freedom of Information requests. To implement this, I started out using acts_as_solr, which is a Ruby on Rails plugin for Solr, which is a REST based layer on top of the search engine Lucene.
I found acts_as_solr all just that bit too complicated. Particularly, when a feature (such as spelling correction) was missing, there were too many layers and too much XML for me to work out how to fix it. And I had lots of nasty code to make indexing offline – something I needed, as I want to safely store emails when they arrive, but then do the risky indexing of PDFs and Word documents later.
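The “store safely now, index later” idea can be sketched without any search engine at all. The MailStore class below is a hypothetical in-memory stand-in for a database table, written only for this example; the point is that a failure while indexing one message never loses the stored message or blocks the others.

```ruby
# Hypothetical in-memory stand-in for a table of incoming emails.
class MailStore
  StoredMail = Struct.new(:id, :raw, :indexed, :error)

  def initialize
    @mails = []
  end

  # Step 1: store the raw message immediately; nothing risky happens here.
  def receive(raw)
    mail = StoredMail.new(@mails.size + 1, raw, false, nil)
    @mails << mail
    mail.id
  end

  # Step 2 (run later, e.g. from a scheduled job): pass each pending message
  # to the given block for indexing. If the block raises, the error is
  # recorded and the message stays pending for another attempt.
  def index_pending
    @mails.reject(&:indexed).each do |mail|
      begin
        yield mail.raw
        mail.indexed = true
        mail.error = nil
      rescue StandardError => e
        mail.error = e.message
      end
    end
  end

  # Ids of messages that have not been indexed successfully yet.
  def pending_ids
    @mails.reject(&:indexed).map(&:id)
  end
end
```

The block passed to index_pending is where the risky work, such as extracting text from PDFs and Word documents, would live.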
The last straw was when I found that acts_as_solr didn’t have collapsing (analogous to GROUP BY in SQL). So I decided to bite the bullet and implement my own acts_as_xapian. Luckily there were already Xapian Ruby bindings, and also the fabulous Xapian email list to help me out, and it only took a day or two to write it and deploy it on the live site.
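For anyone who hasn’t met collapsing before, this plain-Ruby sketch shows the idea: when several hits share a key, keep only the best one. It does not use Xapian or acts_as_xapian; the Hit struct and the collapse method are assumptions made up for illustration.

```ruby
# Hypothetical search hit; field names are assumptions for illustration.
Hit = Struct.new(:request_id, :title, :score)

# Keep only the best-scoring hit for each collapse key (the value the block
# returns), then list the survivors in descending score order.
def collapse(hits, &key)
  hits
    .group_by(&key)
    .map { |_key, group| group.max_by(&:score) }
    .sort_by { |hit| -hit.score }
end
```

Calling collapse(hits, &:request_id) would leave one hit per request, much as GROUP BY request_id leaves one row per group in SQL.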
If you’re using Rails and need full text search, I recommend you have a look at acts_as_xapian. It’s easy to use, and has a diverse set of features. You can watch a video of me talking about WhatDoTheyKnow and acts_as_xapian at the London Ruby User Group, last Monday.
On our servers we only install software from Debian packages, or our own software with install scripts from our own CVS. This at first seems a bit mad, especially to Ruby on Rails people who love their gems. But it’s a sane way of managing lots of servers (we’ve got 7 Debian servers, and 2 FreeBSD servers to run at the moment).
Of course, you could install packages on them from CPAN, from Ruby Gems, or by compiling them yourself and putting them in /usr/local. But you’d have to have another system for each packaging system to keep track of what you’d installed and what version, and to worry about security updates. And you’d lose some of the benefits of dependency checking.
Most of our servers are, inevitably, still running Debian Sarge (the latest and greatest when we started them a few years ago). We’re going to gradually upgrade them to Debian Etch, but it is going to take a while. In the fast moving world of Rails this isn’t particularly helpful, so you have to backport packages. I couldn’t find any, so have made some myself.
You can find packages for Rails 1.2.5-1 on Sarge in our Debian package repository. Yeah, still an old version for you people “living on the edge”, but it’s the one in Etch (the latest Debian stable), and is way better than 0.13.1-1 that we had before 🙂
This week has been quite bitty. I’ve been doing more work on the Freedom of Information site, and have been getting into the swing of Ruby on Rails. Once you’ve learnt its conventions, it is quite (but not super) nice.
As far as languages are concerned, Ruby seems identical in all interesting respects to Python. It’s like learning Spanish and Italian. Both are super languages. Ruby has nice conventions like exclamation marks at the end of function names to indicate they alter the object, rather than return the value (e.g. .reverse!). But then Python has a cleaner syntax for function parameters. It is swings and roundabouts.
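The exclamation mark convention is easy to see in a few lines. This is only a small sketch using core Array and String methods; the reverse_demo method and the sample place names are made up for the example.

```ruby
# Shows the difference between reverse (returns a copy) and reverse!
# (changes the object it is called on).
def reverse_demo
  stops = ['Aldershot', 'Farnborough', 'Fleet']

  backwards = stops.reverse    # a new array; stops keeps its order
  first_before = stops.first   # still "Aldershot"

  stops.reverse!               # reorders stops itself
  first_after = stops.first    # now "Fleet"

  # The return value can differ too: String#upcase! returns nil when the
  # string was already upper case, so don't chain on a bang method's result.
  nothing_changed = 'MP'.dup.upcase!

  { backwards: backwards, first_before: first_before,
    first_after: first_after, nothing_changed: nothing_changed }
end
```

Note that the convention is only a convention: plenty of methods without an exclamation mark, such as Array#push, also change the object.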
Rails has lots of ways of doing things which we already have our own ways of doing for other sites. The advantage of relearning them, is that other people know them too. So Louise was able to easily download and run the FOI site, and make some patches to it. Which would have been much harder if it was done like our other sites. Making development easier is vital – for a long time I’ve wanted a web-based cleverly forking web application development wiki. But while I dream about that, Rails packaging everything you need to run the app in a standard way in one directory that quite a few people know how to use, helps.
Other things… I’ve been helping Richard set up GroupsNearYou on our live servers, it should be ready for you to play with soon. It looks super nice, and is easy to use. I’ve had some work to do with recruitment. And catching up on general customer support email for TheyWorkForYou and PledgeBank. I’ve also been updating the systems administration documentation on our internal wiki, so others can work out how to run our servers.
The meeting day voting application (vote often!) that we’ve been mentioning everywhere all week is a new departure for mySociety. In a frantic bid to catch up with the cool kids, it’s our first deployed Ruby on Rails application. This happened because Louise Crow, who kindly volunteered to make it (thanks Louise!), felt like learning Rails. We used to have a policy of using any language, as long as it was open source and began with the letter P (Python/Perl/PHP…). This has now been extended to the letter R!
You can browse the source code in our CVS repository. One interesting thing about Rails applications is that they are structured things, a deployable directory tree. So are mySociety applications.
For example, take a look at PledgeBank’s directory. It’s a mini, well defined filesystem – the ‘web’ directory is the meat of the stuff, but note also ‘web-admin’ for the administrator tools. Include files are tucked away in ‘perllib’ and ‘phplib’, while script files nestle under ‘bin’. We keep configuration files (analogous to the Windows Registry, or /etc on Unix) under ‘conf’. Database schema files live in ‘db’.
And a Rails application is much the same. But much much much more detailed. Some of those are extra directories which we also have, but only when we deploy, not in CVS (for example, log files). All in all they are surprisingly similar structures, which shows we’re either both on the right lines, or both on the same false trail.
Like making Frankenstein’s monster, poor Louise and I had to graft these two beasts together just to deploy this small application. For example, we have a standard configuration file format which we read from Perl, Python and PHP. The deploy system does useful things with it like check all entries are present, and generate the file for any sandbox from a template. To get round this, there’s an evil script, possibly the first time PHP has been used to make YAML. (And please don’t look at the thing that makes symlinks.)
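To give a flavour of what such a glue script has to do, here is a sketch in Ruby rather than PHP. It is not the real script: the “KEY = value” input format, the DB_* setting names, the postgresql adapter and both method names are assumptions for illustration, and the real mySociety configuration format is different.

```ruby
require 'yaml'

# Parse lines like "KEY = value" into a hash, skipping blank lines and
# # comments. This input format is an assumption for the sketch.
def parse_shared_config(text)
  text.each_line.with_object({}) do |line, settings|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2).map(&:strip)
    settings[key] = value
  end
end

# Build the contents of a Rails-style database.yml for one environment.
# The DB_* keys and the adapter are hypothetical; fetch raises KeyError if a
# required setting is missing, rather than writing a half-empty file.
def database_yaml(settings, environment = 'production')
  {
    environment => {
      'adapter'  => 'postgresql',
      'database' => settings.fetch('DB_NAME'),
      'username' => settings.fetch('DB_USER'),
      'password' => settings.fetch('DB_PASS'),
      'host'     => settings.fetch('DB_HOST', 'localhost')
    }
  }.to_yaml
end
```

A deploy step could write the returned string to config/database.yml, so the shared configuration file stays the single place where settings are edited.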
We could have extended Rails to be able to read its configuration from our file format, but that would be a lot more work. And we could have discovered how to hack its log file system to write to the mySociety log file directory. But everything is so coupled, it doesn’t ever seem worth it. Any Rails apps we deploy will just have to be an even more confusing mass of directories, application trees inside application trees.