FixMyStreet has been around for nearly nine years, letting people report things and optionally include a photo; the upshot of which is we currently have a 143GB collection of photographs of potholes, graffiti, dog poo, and much more.
For almost all that time, attaching a photo has been through HTML’s standard file input form; it works, but that’s about all you can say for it – it’s quite ugly and unfriendly.
We have always wanted to improve this situation – we have a ticket in our ticketing system, Display thumbnail of photo before submitting it, that says it dates from 2012, and it was probably in our previous system even before that – but it never quite made it above other priorities, or when it was looked at, browser support just made it too tricky to consider.
Here’s a short animation of FixMyStreet’s new photo upload, which also allows you to upload multiple photos:
For the user, the only difference from the current interface is that the photo field has been moved higher up the form, so that photos can be uploading while you are filling out the rest of the form.
Personally, I think this benefit is the largest one, above the ability to add multiple photos at once, or the preview function. Some of our users are on slow connections – looking at the logs I see some uploads taking nearly a minute – so being able to put that process into the background hopefully speeds up the submission and makes the whole thing much nicer to use.
When creating a new report, it can sometimes happen that you fill in the form, include a photo, and submit, only for the server to reject your report for some reason not caught client-side. When that happens, the form needs to be shown again, with everything the user has already entered prefilled.
There are various reasons why this might happen; perhaps your browser doesn’t support the HTML5 required attribute (thanks Safari, though actually we do work around that); perhaps you’ve provided an incorrect password.
However, browsers don’t remember file inputs, and as we’ve seen, photo upload can take some time. From FixMyStreet’s beginnings, we recognised that re-uploading is a pain, so we’ve always had a mechanism whereby an uploaded photo would be stored server side, even if the form had errors, and only an ID for the photo was passed back to the browser so that the user could hopefully resubmit much more quickly.
This also helped with reports coming in from external sources like mobile phone apps or Flickr, which might come with a photo already attached but still need other information, such as location.
Of course there were edge cases and things to tidy up along the way, but if the form hadn’t taken into account the user experience of error edge cases from the start, or worse, had assumed all client checks were enough, then nine years down the line my job would have been a lot harder.
Anyway, long story short, adding photos to your FixMyStreet reports is now a smoother process, and you should try it out.
A few of mySociety’s developers are at DjangoCon Europe in Cardiff this week – do say hello As a contribution to the conference, what follows is a technical look (with bunny GIFs) into an issue we had recently with serving large amounts of data in one of our Django-based projects, MapIt, how it was dealt with, and some ideas and suggestions for using streaming HTTP responses in your own projects.
MapIt is a Django application and project for mapping geographical points or postcodes to administrative areas, that can be used standalone or within a Django project. Our UK installation powers many of our own and others’ projects; Global MapIt is an installation of the software that uses all the administrative and political boundaries from OpenStreetMap.
A few months ago, one of our servers fell over, due to running entirely out of memory.
Looking into what had caused this, it was a request for
/areas/O08, information on every “level 8” boundary in Global MapIt. This turned out to be just under 200,000 rows from one table of the database, along with associated data in other tables. Most uses of Global MapIt are for point lookups, returning only the few areas covering a particular latitude and longitude; it was rare for someone to ask for all the areas, but previously MapIt must have managed to respond within the server’s resources (indeed, the HTML version of that page had been requested okay earlier that day, though had taken a long time to generate).
resourcemodule, I manually ran through the steps of this particular view, running
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024after each step to see how much memory was being used. Starting off with only 50Mb, it ended up using 1875Mb (500Mb fetching and creating a lookup of associated identifiers for each area, 675Mb attaching those identifiers to their areas (this runs the query that fetches all the areas), 400Mb creating a dictionary of the areas for output, and 250Mb dumping the dictionary as JSON).
The associated identifiers were added in Python code because doing the join in the database (with e.g.
select_related) was far too slow, but I clearly needed a way to make this request using less memory. There’s no reason why this request should not be able to work, but it shouldn’t be loading everything into memory, only to then output it all to the client asking for it. We want to stream the data from the database to the client as JSON as it arrives; we want in some way to use Django’s StreamingHTTPResponse.
The first straightforward step was to sort the areas list in the database, not in code, as doing it in code meant all the results needed to be loaded into memory first. I then tweaked our JSONP middleware so that it could cope when given a StreamingHTTPResponse as well as an HTTPResponse. The next step was to use the json module’s
iterencodefunction to have it output a generator of the JSON data, rather than one giant dump of the encoded data. We’re still supporting Django 1.4 until it end-of-lifes, so I included workarounds in this for the possibility of StreamingHTTPResponse not being available (though then if you’re running an installation with lots of areas, you may be in trouble!).
But having a StreamingHTTPResponse is not enough if something in the process consumes the generator, and as we’re outputting a dictionary, when I pass that dictionary to the json’s
iterencode, it will suck everything into memory upon creation, only then iterating for the output – not much use! I need a way to have it be able to iterate over a dictionary…
The solution was to invent the iterdict, which is a subclass of dict that isn’t actually a dict, but only puts an iterable (of key/value tuples) on items and iteritems. This tricks python’s JSON module into being able to iterate over such a “dictionary”, producing dictionary output but not requiring the dict to be created in memory; just what we want.
I then made sure that the whole request workflow was lazy and evaluated nothing until it would reach the end of the chain and be streamed to the client. I also stored the associated identifiers on the area directly in another iterator, not via an intermediary of (in the end) unneeded objects that just take up more memory.
I could now look at the new memory usage. Starting at 50Mb again, it added 140Mb attaching the associated codes to the areas, and actually streaming the output took about 25Mb. That was it Whilst it took a while to start returning data, it also let the data stream to the client when the database was ready, rather than wait for all the data to be returned to Django first.
But I was not done. Doing the above then revealed a couple of bugs in Django itself. We have GZip middleware switched on, and it turned out that if your StreamingHTTPResponse contained any Unicode data, it would not work with any middleware that set Content-Encoding, such as GZip. I submitted a bug report and patch to Django, and my fix was incorporated into Django 1.8. A workaround in earlier Django versions is to run your iterator through
map(smart_bytes, content)before it is output (that’s six’s iterator version of map, for Python 2/3 compatibility).
Now GZip responses were working, I saw that the size of these responses was actually larger than not having the GZip middleware switched on?! I tracked this down to the constant flushing the middleware was doing, again submitted a bug report and patch to Django, which also made it into 1.8. The earlier version workaround is to have a patched local copy of the middleware.
Lastly, in all the above, I’ve ignored the HTML version of our JSON output. This contains just as many rows, is just as big an output, and could just as easily cripple our server. But sadly, Django templates do not act as generators, they read in all the data for output. So what MapIt does here is a bit of a hack – it has in its main template a “!!!DATA!!!” placeholder, and creates an iterator out of the template before/after that placeholder, and one compiled template for each row of the results.
Now Django 1.8 is out, the alternate Jinja2 templating system supports a
generate()function to render a template iteratively, which would be a cleaner way of dealing with the issue (though the templates would need to be translated to Jinja2, of course, and it would be more awkward to support less than 1.8). Alternatively, creating a generator version of Django’s Template.render() is Django ticket #13910, and it might be interesting to work on that at the Django sprint later this week.
Using a StreamingHTTPResponse is an easy way to output large amounts of data with Django, without taking up lots of memory, though I found it does involve a slightly different style of programming thinking. Make sure you have plenty of tests, as ever Streaming JSON was mostly straightforward, though needed some creative encouragement when wanting to output a dictionary; if you’re after HTML streaming and are using Django 1.8, you may want to investigate Jinja2 templates now that they’re directly supported.
[ I apologise in the above for every mistaken use of generator instead of iterator, or vice-versa; at least the code runs okay ]
In this post, we want to explain a bit more about why we spent time and effort on making them, when normally we advocate for mobile websites.
Plus, for our technical audience, we’ll explain some of the tools we used in the build.
The why bit
When we redesigned FixMyStreet last year, one of the goals was to provide a first class experience for people using the website on their mobile phone browsers. We’re pretty happy with the result.
We’re also believers in only building an app if it offers you something a mobile website can’t. There are lots of reasons for that: for a start, apps have to be installed, adding a hurdle at which people may just give up. They’re time-consuming to build, and you probably can’t cater for every type of phone out there.
Given that, why did we spend time on making a mobile app? Well, firstly, potholes aren’t always in places with a good mobile signal, so we wanted to provide a way to start reporting problems offline and then complete them later.
Secondly, when you’re on the move you often get interrupted, so you might well start reporting a problem and then become distracted. When that happens, reports in browsers may get lost, so we wanted an app that would save it as you went along, and allow you to come back to it later.
And the how bit (with added technical details, for the interested)
Having decided to build an app the next question is how to build it. The main decision is whether to use the native tools for each phone operating system, or to use one of the various toolkits that allow you to re-use code across multiple operating systems.
We quickly decided on using Phonegap, a cross platform toolkit, for several reasons: we’d already used Phonegap successfully in the past to build an app for Channel 4’s Great British Property Scandal (that won an award, so clearly something went right) and for Züri Wie Neu in Zurich, so it was an easy decision to use it again.
We decided to focus on apps for Android and iOS, as these are the two most popular operating systems. Even with this limitation, there is a lot of variety in the size and capability of devices that could be running the app – think iPads and tablets – but we decided to focus primarily on providing a good experience for people using phone-sized devices. This decision was partly informed by the resources we have at hand, but the main decider was that we mostly expect the app to be used on phones.
There was one big challenge: the functionality that allows you to take photos in-app. We just couldn’t get it to work with older versions of Android – and it’s still not really adequate. We just hope most people are updating their operating systems! Later versions of Android (and iOS) were considerably less frustrating, and perhaps an earlier decision to focus on these first would have led to a shorter development process.
On balance though? We’d still advocate a mobile-optimised browser site almost every time. But sometimes circumstances dictate – like they did for FixMyStreet – that you really need an app.
We’d give you the same advice, too, if you asked us. And we’d happily build you an app, or a mobile-friendly site, whichever was more suitable.
What are your plans for late April? If you’re a civic coder, a campaigner or activist from anywhere in the world, hold everything: we want to see you in Santiago, Chile, for the first international PoplusCon.
Poplus is a project which aims to bring together those working in the digital democracy arena – groups or individuals – so that we can share our code and thus operate more efficiently.
We’re right at the beginning of what we hope will grow into a worldwide initiative. If you’d like to get involved, now is the time.
Together with Poplus’ co-founders, Ciudadano Inteligente, we will be running a two-day conference in Santiago on the 29th and 30th of April. It is free to attend, and we can even provide travel grants for those who qualify.
Much of what we do here at mySociety relies on Open Data, so naturally we support Open Data Day. In case you haven’t come across this event before, here’s the low-down:
Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption open data policies by the world’s local, regional and national governments.
If you’re planning on being a part of Open Data Day, you may find some of mySociety’s feeds, tools and APIs useful. This post attempts to put them all in one place. (more…)
This month we released a new version of FixMyStreet. Amongst the new features, fixes, and thingamajigs were some small improvements added by two volunteers, Andrew and Andy.
Even though these are not core pieces of functionality — in fact, precisely because they are not — we want to draw your attention to why they were included, and why this is a Very Good Thing. And perhaps, if you’re a coder who wants to put something into an Open Source project but hasn’t quite found a way in, Andrew and Andy’s work will nudge you into becoming a contributor too.
One of the axioms of Open Source is that, because anyone can read the source code, in theory anyone can contribute to it. In practice, though, it’s not really as simple as that. Both ends of the “anyone can contribute” idea require effort:
- Before contributing to a project of any complexity (as we hope you want to do), there’s often a lot to learn, or figure out, before any work can even begin;
- Before accepting contributions to such a project (keen as we are to do so), there’s an overhead of testing, checking, and managing the incoming code.
The ugly real world
The basic issue here is that software is complex — no matter how well-written, tested and documented program code is, if the problem it’s solving is in the real world, it’s not going to be simple.
This is especially true of anything used by the public, because often you can only make things seem simple at the front (such as a clean web interface or “user journey”) by working hard behind the scenes with data structures and processes that handle the underlying complexity. It’s inevitably true of any projects which have been developed over time — programmers like to use the term “legacy code” to describe anything that wasn’t written then way they’d choose to write it now.
Often the problems that software is solving are not quite as obvious as they first appear. At mySociety we’ve got a wealth of experience and actual usage data that ultimately changes the way we build, and develop, our platforms. We understand the fields we work in well (technically, the nerds like to call these the “problem domain”), whether it’s governmental practice or civic user behaviour, and that’s often knowledge that’s not encapsulated anywhere in the program code.
Furthermore, any established platform must protect against the risk that new changes break old behaviour — something that regression testing is designed to catch. This is especially important on platforms like FixMyStreet or Alaveteli where the software is already running in multiple installations.
This is why we have a team of full-time, experienced, and (thanks to our rigorous recruitment process) talented programmers who can invest the time and effort to be familiar with all these things when they set to work coding.
But this builds up to an impediment: sensitivity to any of these issues is enough to make anyone think twice about simply forking our code and starting to hack on it for us.
How it sometimes works
In practice, then, how does anything get contributed? How come it doesn’t all get written by our own coders?
The answer is, of course, we do work with major contributors outside our own team (have a look at the activity on our github repos to see them) — but it always requires a period of support and on-line discussion both before and during the process. There’s also the business of testing, and sometimes politely pushing back on, pull requests (which is how code contributions are submitted). But the fact of the matter is that this is only possible for people who are willing to spend time familiarising themselves with the specific code, technologies, and practices that we’re using on that project. These tend to be hard-code devs, and — here’s the point — they’re always experienced Open-Sourcers: this will never be the first time they’ve worked on such a project.
Which is where the little features come in.
The joy of small
We noticed this problem — that contributing code to our projects is simply not easy for us or for contributors. Importantly, it’s not just us: it’s Open Source everywhere. But we can’t simply dismiss the opportunity for contribution. We want to encourage coders to do this, because we believe that Open Source is intrinsically a good thing.
We do two things to make it easier to contribute:
- We identify small features that a coder can approach without a full understanding of the code and the problem domain;
- We help people (like you!) get started by opening up a laptop at our weekly meetups.
The first of these seems obvious now: when we add issues (an idea for a new feature, or maybe a bugfix) to our github repos, if we think they’re candidates for manageable, isolated work, we tag them with the label: Suitable for volunteers (like this).
Often these turn out to be “nice-to-haves” that one of our full-time devs can’t be pulled off more pressing problems to add just now. (Case in point: Andrew added a date-picker to the FixMyStreet admin stats page, and three of our own staff had stumbled upon and applauded the difference it had made within a week of it going live).
It means it’s much easier for you to get involved, because often it’s a little, isolated piece of code. And it’s much more manageable for us, because the change you’ll be submitting is also isolated.
So if you’re looking for something to tackle, pick one of those issues, and let us know (just to check nobody else has baggsied it already). Fork the repo, cut the code, write the tests, submit a pull request!
But wait — if that last paragraph made you gulp, here’s the second thing we do: meetups. Of course, this is less helpful if you can’t make it to London on Wednesdays, but the concept is sound. Put simply, if there is a barrier to entry to diving in, and if one-on-one time with a dev, and some pizza, is what it takes to overcome that, it’s time well spent for you to come and see us.
Not 100% confident with git? Not sure when
db/schema.sqlgets used or how we like to handle migrations? No problem: we’re happy to guide you.
If this has struck a chord with you — you’d love to be an Open Source contributor one day, and you think mySociety projects make the world a better place — perhaps you should take a poke in our repos, or come along to a meetup. Start small, but do start.
Oh, and Andrew and Andy — thanks guys 😉
Photo by Matt Katzenberger (CC)
As you may know, TheyWorkForYou hasn’t displayed proceedings from the Scottish Parliament for a couple of years – but we’re glad to say that we’ve now fixed that. You can read debates from the main chamber from the Official Report and sign up to alerts from the Scottish Parliament here – just as you can for the UK Parliament and the Northern Ireland Assembly.
For those who are interested in the ‘whys’, in January of 2011, the Scottish Parliament changed the way that they published the Official Report on their website. This change broke our scraper and parser – that is, the pieces of software that fetch content and turn it into structured data.
mySociety is a small organisation with many priorities, and, because it wasn’t a simple fix, we weren’t able to allocate resources to it. So massive thanks are due to our developer Mark, who made the necessary changes to our code in his own free time.
You can help
There’s still more work to be done to get TheyWorkForYou’s data for Scotland to be as complete as it was before they changed their website, such as restoring written answers. If you think you have the expertise to help with that or any of the other issues for TheyWorkForYou Scotland, then we’d love to hear from you. And there’s still the Welsh Assembly to work on too!
Photo by Shelley Bernstein (CC)
Simple things are the most easily overlooked. Two examples: a magician taking a wand out of his pocket (see? so simple that maybe you’ve never thought about why it wasn’t on the table at the start), or the home page on www.fixmystreet.com.
The first thing FixMyStreet asks for is a location. That’s so simple most people don’t think about it; but it doesn’t need to be that way. In fact, a lot of services like this would begin with a login form (“who are you?”) or a problem form (“what’s the problem you want to report?”). Well, we do it this way because we’ve learned from years of experience, experiment and, yes, mistakes.
We start off by giving you, the user, an easy problem (“where are you?”) that doesn’t offer any barrier to entry. Obviously, we’re very generous as to how you can describe that location (although that’s a different topic for another blog post). The point is we’re not asking for accuracy, since as soon as we have the location we will show you a map, on which you can almost literally pinpoint the position of your problem (for example, a pothole). Pretty much everyone can get through that first stage — and this is important if we want people to use the service.
How important? Well, we know that when building a site like FixMyStreet, it’s easy to forget that nobody in the world really needs to report a pothole. They want to, certainly, but they don’t need to. If we make it hard for them, if we make it annoying, or difficult, or intrusive, then they’ll simply give up. Not only does that pothole not get reported, but those users probably won’t bother to try to use FixMyStreet ever again.
So, before you know it, by keeping it simple at the start, we’ve got your journey under way — you’re “in”, the site’s already helping you. It’s showing you a map (a pretty map, actually) of where your problem is. Of course we’ve made it as easy as possible for you to use that map. You see other problems, already reported so maybe you’ll notice that your pothole is already there and we won’t have wasted any of your time making you tell us about it. Meanwhile, behind the scenes, we now know which jurisdictions are responsible for the specific area, so the drop-down menu of categories you’re about to be invited to pick from will already be relevant for the council departments (for example) that your report will be going to.
And note that we still haven’t asked you who you are. We do need to know — we send your name and contact details to the council as part of your report — but you didn’t come to FixMyStreet to tell us who you are, you came first and foremost to report the problem. So we focus on the reporting, and when that is all done then, finally, we can do the identity checks.
Of course there’s a lot more to it than this, and it’s not just civic sites like ours that use such techniques (most modern e-commerce sites have realised the value of making it very easy to take your order before any other processing; many governmental websites have not). But we wanted to show you that if you want to build sites that people use, you should be as clever as a magician, and the secret to that is often keeping it simple — deceptively simple — on the outside.
Thanks to everyone who came to the inaugural mySociety Hack Night – and thanks too to our hosts, the Open Data Institute for such a great space to work in.
Topics ranged from community-building in post-conflict societies, to mountain rescue in Wales, via an extended front-end for WriteToThem which would put campaigns in context. It really showed what a lot of exciting ideas there are, just waiting for someone to launch into them.
We’ll be running these nights every Wednesday: we’re currently booking for the following dates, 6:00 -9:00 pm.
- 24th July – Open Data Institute
- 31st July – Open Data Institute
- 7th August – Open Data Institute
- 14th August – Mozilla London
- 21st August – Mozilla London
- 28th August – Mozilla London
- 4th Setember – Mozilla London
Places are restricted, so drop us a line on firstname.lastname@example.org if you’d like to be sure of getting in. All you need is a little coding experience and a laptop.
We’d also like to start a conversation in the comments below, so that like-minded folk can think about hacking together. If you’re looking for people to help you with an idea, or if you see something you like the look of, leave a note below and try to synchronise which nights you’ll be attending.
Photo by Being Focal (CC)
Today, we are using the phrase “Alaveteli upgrade” rather a lot – and not just because it’s such a great tongue-twister. It’s also a notable milestone for our open-source community.
Alaveteli is the software that underlies WhatDoTheyKnow, our Freedom of Information website. The code can also be deployed by people in other countries who wish to set up a similar site. If you’re a ‘front-end user’, someone who just uses WhatDoTheyKnow to file or read FOI requests, this upgrade will go unnoticed… assuming all goes well at our end, that is. But if you’re a developer who’d like to use the platform in your own country, it makes several things easier for you.
Alaveteli will now be using the Rails 3 series – the series we were previously relying on, 2, has become obsolete. One benefit is that we’re fully supported by the core Rails team for security patches. But, more significant to our aim of sharing our software with organisations around the world, it makes Alaveteli easier to use and easier to contribute to. It’s more straightforward to install, dependencies are up-to-date, code is clearer, and there’s good test coverage – all things that will really help developers get their sites up and running without a problem.
Rails cognoscenti will be aware that series 4.0 is imminent – and that we’ve only upgraded to 3.1 when 3.2 is available. We will be upgrading further in due course – it seemed sensible to progress in smaller steps. But meanwhile, we’re happy with this upgrade! The bulk of the work was done by Henare Degan and Matthew Landauer of the Open Australia Foundation, as volunteers – and we are immensely grateful to them. Thanks, guys.
Image credit: Sashi Manek (cc)