When a bit of government forwards or attaches emails using Outlook, they get sent using a special, strange Microsoft email format. Up until now, WhatDoTheyKnow couldn’t decode it. You’d just see a weird attachment on the response to your Freedom of Information request, and probably not be able to do anything with it.
He then told us about it, and I merged his changes into the main WhatDoTheyKnow code, tested them out on my laptop, then made them live. It all worked perfectly first time. Peter even added the new dependency on vpim to WhatDoTheyKnow conf/packages.
Now if you go to an Outlook attachment on WhatDoTheyKnow, such as this one, you’ll just see the files, and be able to download them, and view them as HTML as normal. They’ll also get indexed by the search (although I need to do a rebuild for that to work with old requests).
If you want to have a go making an improvement to a mySociety site, you can get the code for most of them from our github repositories. For some sites, there’s an INSTALL.txt file explaining how to get a development environment set up. Let us know if you do anything – even incremental improvements to installation instructions are really useful. And new, useful, features like Peter’s are even more so.
Members of the team running mySociety’s freedom of information website WhatDoTheyKnow.com also campaign for improvements to freedom of information law.
Volunteer John Cross has been drawing his MP’s attention to a major loophole in the UK’s Freedom of Information Act which means that a company wholly owned by one local authority is subject to the act but a company owned by two local authorities is not. John’s MP, Peter Bottomley, has been convinced that this anomaly in the law does not make sense and has submitted an Early Day Motion calling for the loophole to be closed. The EDM also highlights the fact that currently a company owned 95% or even 99.5% by a single public authority is not subject to the provisions of the act, as only companies owned 100% by a single authority are currently covered.
The text of the motion states:
“That this House notes that section 6 of the Freedom of Information Act 2000 with certain exceptions makes companies wholly owned by the Crown or by a single public authority subject to the Act; further notes that a company wholly owned by two or more public authorities or 95 per cent. owned by a single public authority will be outside the scope of the Freedom of Information Act 2000; and calls for the closure of this loophole and for companies owned 90 per cent. or more by any number of public authorities to be subject to the Freedom of Information Act 2000.”
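As a rough illustration of the gap the motion describes, the two rules can be written as a pair of checks. This is only a sketch: the function names are made up for this post, ownership is simplified to a list of percentage stakes held directly by public authorities, and Crown ownership, the Act's "certain exceptions" and indirect ownership through subsidiaries are all ignored.

```python
def covered_now(public_stakes):
    """Current rule as described in the motion: wholly owned by a single public authority."""
    return len(public_stakes) == 1 and public_stakes[0] == 100


def covered_under_proposal(public_stakes, threshold=90):
    """Proposed rule: public authorities together own at least `threshold` per cent."""
    return sum(public_stakes) >= threshold


# Percentage stakes held by public authorities (illustrative figures only).
examples = {
    "one authority, wholly owned": [100],
    "two authorities, half each": [50, 50],
    "one authority, 95 per cent": [95],
}

# Map each example to (covered now, covered under the proposal).
results = {
    name: (covered_now(stakes), covered_under_proposal(stakes))
    for name, stakes in examples.items()
}
```

Only the first example is covered today; all three would be covered if the motion's 90 per cent threshold were adopted.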
The motion is currently open for other MPs to sign up to and show their support. If you would like to help increase the number of public bodies that are covered by Freedom of Information legislation, please consider writing to your MP, asking them to add their name to the current signatories.
There are many situations where public authorities working together or even setting up jointly owned companies is commendable. Such arrangements can lead to savings through economies of scale and avoid duplication; we may see more such companies set up as a response to economic pressures. What is problematic though is the loss of accountability which currently occurs when public bodies come together and set up these companies without requiring them to follow the highest standards of openness, transparency and accountability.
Examples of Companies Which Would Be Made Subject to FOI if the Loophole was Closed:
- Connexions Nottinghamshire Limited – provides support services to young people and is jointly owned by Nottingham City Council and Nottinghamshire County Council.
- Coventry and Solihull Waste Disposal Company – owned two thirds by Coventry City Council and one third by Solihull MBC
- G-Mex Limited – Through its ownership of Destination Manchester Ltd and investment in Modesole Ltd, Manchester City Council has a 95% shareholding in G-Mex Limited
- Higher Education Statistics Agency (HESA) – this company is the official agency for the collection, analysis and dissemination of quantitative information about higher education.
- Manchester Airport PLC – the Manchester Airport Group is owned by the ten local authorities of Greater Manchester
- The Russell Group – is owned by, and represents, 20 of the UK’s largest universities. The company’s aims as set out in documents filed on incorporation included “to influence and make representations to stakeholders and legislators”
Many housing associations, purchasing consortia, representative bodies and urban development companies are among the organisations which would be required to operate in a more transparent manner should this loophole be closed.
Over the last weekend of November 2009 a group of 21 mySociety staff, volunteers and trustees went to a house outside of Bristol to wrestle with the question of what mySociety should build over the next 12 months. This was the fourth time we’ve done it, and these meetings have become a crucial part of our planning. This year, we were talking not just about what new features to add to our current sites, but also about the possibility of building an entirely new website for the first time in a couple of years. The discussions were lively and passionate because we know we have a lot to live up to: not only is our last major new site (WhatDoTheyKnow) likely to cross the 1 million unique visitors threshold this year, but we understood that there were people and organisations who weren’t there who would be counting on us to set the bar high.
A chunk of the weekend involved vetting the 227 project ideas that were proposed via our Call for Proposals. I’m going to write a separate post on our thoughts about that process, but if you look at the list below you may spot things that were submitted in that call.
One nice innovation that helped us whittle down our ideas from unmanageable to manageable numbers was a pairwise comparison game to help us prioritise ideas, built custom for the occasion by the wonderful and statistically talented Mark Longair. In other words, we used the technique that powers KittenWar.com to help decide our key strategic priorities for the next year: after all, if we don’t, who will?
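For the curious, here is roughly how a pairwise comparison game can turn lots of "this one or that one?" answers into a ranking. This is not Mark's code, and we are not claiming this is the scoring he used: it is a minimal sketch that simply ranks each idea by the fraction of its head-to-head comparisons it won, with made-up function and idea names.

```python
from collections import defaultdict


def rank_by_pairwise_votes(votes):
    """Rank ideas by the fraction of head-to-head comparisons they won.

    `votes` is a list of (winner, loser) tuples, one per comparison shown.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Highest win rate first; ties are broken alphabetically so the order is stable.
    return sorted(
        appearances,
        key=lambda idea: (-wins[idea] / appearances[idea], idea),
    )


# Hypothetical votes between three ideas.
votes = [
    ("sitemap", "video clips"),
    ("sitemap", "template requests"),
    ("template requests", "video clips"),
]
ranking = rank_by_pairwise_votes(votes)
```

A plain win rate is easy to explain, but it treats a win over a weak idea the same as a win over a strong one; more careful schemes weight comparisons by the strength of the opponent.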
By the end of the weekend we had not battened everything down – there are too many uncertainties around how much time we will have, and some key ideas that need more speccing. However, we were able to put various things into different buckets, marked according to size and degree of certainty. So here goes:
1. Things which were decided at the last retreat, which we are definitely building, and which (mostly) need doing before next year’s stuff starts getting built
- A top level page for each bill on TheyWorkForYou
- Future business (ie the calendar) for events in the House of Commons, including a full set of alerting options.
- Video clips on MP pages on TheyWorkForYou
- Epically ambitious election data gathering and quiz building with the lovely volunteers at DemocracyClub
2. Small new things that we are very probably doing because there was lots of consensus
- Publish a standard that councils can use to post problems like potholes in their databases to FixMyStreet and other similar sites.
- Template requests in WhatDoTheyKnow so that users are strongly encouraged to put in requests that are well structured.
- After the next general election, email new MPs with various bits of info of interest to them including their new login to HearFromYourMP, their page on TheyWorkForYou, explanation of how WriteToThem protects them from spam and abuse, a double check that their contact details are correct, and an introduction to the fact that we record their correspondence responsiveness and voting records.
- Add to WhatDoTheyKnow descriptions about what kind of public authority a specific entity is (ie ‘school’, ‘council’) and the information they are likely to hold if FOIed.
- Show divisions (parliamentary votes) properly on debate pages on TheyWorkForYou, ie show the results of a vote on the same page as the debate where the issue was discussed, with full party breakdowns on each division.
- Add “How to benefit from this site” page on TheyWorkForYou, inspired by OpenCongress.org
- Help Google index TheyWorkForYou faster by creating a sitemap.xml file that is dynamically updated.
- Using the data we expect to have from DemocracyClub’s volunteers, send a press release about every new MP to all relevant local newspapers
- Incorporate a council GeoRSS problem feed into FMS
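For the sitemap item in the list above, the file itself is simple to generate. A minimal sketch, assuming the site can supply a list of page URLs with last-modified dates; the URLs and the function name are placeholders rather than anything from TheyWorkForYou, and a real version would be regenerated whenever pages change.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, for illustration only.
sitemap_xml = build_sitemap([
    ("https://www.example.org/debates/2009-11-30", "2009-11-30"),
    ("https://www.example.org/mp/example-member", "2009-12-01"),
])
```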
3. Slightly more time consuming things we are very probably doing because there was lots of consensus
- 1 day per month developer time that customer support guru Debbie Kerr gets to allocate as she sees fit.
- Premium account feature on WhatDoTheyKnow to hide requests so that journalists and bloggers can still get scoops and then share their correspondence later.
- Add Select Committees to TheyWorkForYou, including email alerts on calls for evidence.
- Take professional advice on how to handle PR around the election
4. Much more time consuming things and things around which there is less consensus. NB – We do not currently have the resources to do everything on this list next year – it is an ambitious target list.
- Primary New site: TBA in a new post
- Add a new queue feature to WhatDoTheyKnow so that users can write requests, then table them for comments from other users and expert volunteers before they are sent to the public authority
- Relaunch our Volunteer tasks page on our sites, keep it populated with new tasks, specifically allocate resources to handhold potential volunteers. Allocate time to see if any of the ideas that we didn’t build could be parcelled into volunteer tasks.
- Secondary New site (if we have a lot more time than we expect): Exploit extraordinary richness of Audit Commission local government target data in a TheyWorkForYou-like fashion.
- FixMyStreet to become international with a) maps for most of the world b) easy to follow instructions explaining how to supply mySociety with the data required to enable us to turn on FixMyStreet in non UK countries or areas. This data would include gettext powered text translation files, shapefiles of administrative boundaries, and lists of contact data.
- Add votes and proceedings to TheyWorkForYou (where they reveal statutory instrument titles that are not debated but where the law gets changed anyway)
- Carry out usability testing on TheyWorkForYou with the help of volunteer Joe Lanman – then implement changes recommended during a development process taking up to 10 days.
- Add to TheyWorkForYou questions that have been tabled in the house of commons but which haven’t been answered yet.
- Add a new interface for just councils so that they can say if a problem on FixMyStreet has changed status.
Phew. And that’s not even counting the projects we hope to help with in Central and Eastern Europe, our substantial commercial work, or the primary new site idea, which will be blogged in Part 2.
Today we have a strange story about a department that appears to think that it has a duty not to release information under FOI if it makes people angry.
It all starts in January 2009, when the Department for Children, Schools and Families (DCSF) appointed an expert by the name of Graham Badman to conduct a review of elective home education in England. It probably goes without saying that this is an issue far from our concerns, and an issue that mySociety has no views on – what makes us interested is the process that followed.
Shortly after the publication of the report, Elaine Walton, a user of mySociety’s freedom of information website WhatDoTheyKnow.com, requested copies of communications between the Department for Children, Schools and Families and Nektus Ltd, the company through which it appears Mr Badman was paid for his work.
According to email replies to Ms Walton, the DCSF located two relevant invoices which show how much money was paid, but refused to disclose them. Strangely, though, they were not refused on grounds of commercial confidentiality, but rather on something more unusual. Here are the exemptions they cited:
- Section 40 – Personal Information
- Section 38 – Health and safety
Health and Safety? A little investigation reveals more.
When Ms Walton appealed against this decision, an internal review was carried out within the DCSF. The internal review’s findings stated that Mr Badman was likely to become a victim of harassment if certain personal details were made public, hence a health and safety concern, and hence no publication of these invoices. Fair enough – nobody would be in favour of revealing private, sensitive information that would endanger anyone’s life or family, especially in the presence of a known threat. But take a look at this:
“That the Department had initially been drafting a response that included the release of invoices with only personal data redacted. But before the draft was complete it was apparent that there was a campaign of harassment and vilification against Graham Badman and other individuals/organisations that had contributed to the Report. In the light of this, at the weekly review meeting of FOI cases, it was considered that the balance of public interest might have shifted towards withholding.”
What is very curious here is the admission that the department had been thinking of releasing the invoices with personal data hidden (ie no home address, bank details etc). But then because of a campaign of harassment, it was decided that they wouldn’t publish anything at all. So not just no personal information, but no dates, no amounts of money, nothing.
What is so unease-making about this FOI decision is that it appears to be saying that departments may conceal information on how much public money has been spent on something because releasing that information will make some angry people even angrier. Surely this can’t be right – if it were, every budget would be conducted in complete secrecy. We would encourage the Information Commissioner’s Office to take a look.
mySociety’s Freedom of Information website WhatDoTheyKnow is designed to appear simple and straightforward to users. That appearance belies the fact that behind the scenes a significant amount of effort goes into making sure both those making freedom of information requests and those answering them have a positive experience of the site. While the site is almost entirely automated sometimes human involvement is necessary. This article highlights those key “edge cases” which are dealt with by the staff and volunteers who make up the WhatDoTheyKnow team.
In the last year 15,233 freedom of information requests have been made via WhatDoTheyKnow.
444 messages on 360 requests (2.3%) had to be manually placed on the correct request as a result of authorities not sending replies to the email address given. The errors are introduced as authorities apparently manually transcribe email addresses from incoming email into correspondence management systems. There have been suggestions some may even print out emails and scan them into such systems. WhatDoTheyKnow’s code has been improved in light of experience: common errors are now detected automatically and in many cases the system suggests which request the message was intended for.
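To give a flavour of how a mistyped reply address can be matched back to a request, here is a small sketch using fuzzy string matching from Python's standard library. It is not WhatDoTheyKnow's actual code (which is written in Ruby), and the address format, function name and similarity cutoff are all assumptions made for the example.

```python
import difflib


def suggest_request_address(received_address, known_addresses, cutoff=0.8):
    """Return the known address most similar to the one received, or None."""
    matches = difflib.get_close_matches(
        received_address.lower(), known_addresses, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# Hypothetical per-request reply addresses; the real format is not shown here.
known = [
    "request-1042-ab12cd@example.org",
    "request-2087-ef34gh@example.org",
]

# A transcription slip: the digit "1" typed as a lowercase "l".
suggestion = suggest_request_address("request-l042-ab12cd@example.org", known)
```

A suggestion like this would still want a human to confirm it before the message is filed, since two request addresses can differ by only a few characters.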
In terms of outgoing messages just 52 requests (0.3%) over the course of the year were marked as receiving an error message in response and users marked 94 (0.6%) as requiring administrator attention. These are generally either transient errors which simply require a message to be resent or prompt us to check and update the contact details we hold for a particular organisation. Regularly there are problems with authorities’ spam filters and we have to encourage them to change the way their filters are set up to allow messages from WhatDoTheyKnow.com through.
119 (0.8%) requests were at some point marked as “Handled by Post”. In many of these cases users eventually persuaded authorities to release the information in electronic form. Where information is supplied outside the site users can add annotations describing the information released, they can link to copies of the data they have posted online, or as has been done in respect of 14 requests (0.1% of the total, 11% of those handled by post) they can supply the information to WhatDoTheyKnow to upload manually. When the site was being designed there was a worry that authorities would reply to many requests by post. This has not occurred, in part perhaps because the Freedom of Information Act contains a provision (section 11) requiring the requestor’s preferred means of communication to be used where it is reasonable. A requestor using an @whatdotheyknow email address is clearly expressing a preference for a reply to be made electronically via the site.
One of the major challenges facing the site is keeping it operating in the face of the UK’s libel laws. Unlike in other countries, such as the US, we cannot publish statements on our users’ behalf without taking the risk of being sued for libel ourselves. Even simply republishing FOI responses from public authorities is not without risk in the UK. While we don’t actively police the site, a lot of administrator time is taken up dealing with cases where potentially libellous or defamatory comments have been brought to our attention. Cases can be very complicated and involve a great deal of correspondence. mySociety is lucky to have the services of a specialist internet and technology barrister with expertise in libel who provides his services free of charge. We try to act in such a way as to maximise transparency while ensuring that the existence of WhatDoTheyKnow and mySociety is not threatened by legal risks.
In the last year there have been only seven significant cases where requests have been hidden from public view on the site due to concerns relating to potential libel and defamation. Three of those cases have involved groups of twenty or so requests made by the same one or two users. While the actual number of requests we have had to hide is around 70 (0.4% of the total), even this small fraction overstates the situation due to the repetition of the same potentially libellous accusations and comments in different requests. In all cases we have kept as much information up on the site as possible. Our policy with respect to all requests to remove information from the site is that we only take down information in exceptional circumstances; generally only when the law requires us to do so.
Sometimes people accidentally post personal information to the site; for example they make a request which is not a Freedom of Information request but a subject access request under the Data Protection Act. We are happy to remove such requests. On occasion we get requests from both our users and public sector employees asking us to remove their names from the site. As we are trying to build up a FOI archive we are very reluctant to remove information from the site; our policy is only to remove names in exceptional circumstances. Often information, such as an out of office reply, which a public body or civil servant considers irrelevant and asks to be removed is in fact critical to the correspondence thread and timeline of a response.
Copyright and Control of Information Released
The fact information is subject to copyright and restrictions on re-use does not exempt it from disclosure under the Freedom of Information Act (though there is a closely related exemption relating to “commercial interest”). Occasionally public bodies will offer to reply to a request, but in order to deter wider dissemination of the material they will refuse to reply via WhatDoTheyKnow.com. Southampton University have released information in protected PDF documents and the House of Commons has refused to release information via WhatDoTheyKnow.com which it has said it would be prepared to send to an individual directly.
Maintaining and Expanding The List of Authorities
WhatDoTheyKnow lists around 3,000 public authorities, and there is a regular turnover of changes in contact details. Our coverage, while large, is not comprehensive so we have requests to add bodies such as parish councils, schools, and doctors’ surgeries which we have not yet attempted to add in a systematic manner based on official sources of information.
We have also had to carefully consider how we handle the various situations where an authority becomes defunct and its responsibilities are taken over by another body, for example as a result of reorganisations of local government and the creation and merging of government departments.
Providing Advice and Assistance
The team at WhatDoTheyKnow.com often provide advice to users. We encourage users to keep their requests focused so as to reduce the chance of any problems due to libel or requests being classed as vexatious. On occasion we suggest appropriate authorities for users to direct requests to, provide advice to those unhappy with the response to their request, and answer a broad range of other queries as they arise, such as whether particular bodies are subject to the act or not. Increasingly we link to authorities’ publication schemes which are intended to let people know what information an authority has and how it can be accessed.
Lastly, like all websites which allow people to post content online WhatDoTheyKnow.com occasionally suffers from spam in various forms. Most is dealt with automatically but some has to be removed by hand. With spam, like the other aspects of running the site, the site’s code and processes are constantly being developed and improved to reduce the fraction of cases requiring any manual intervention.
This article was prompted in part by a team in New Zealand considering launching their own version of the site asking us what’s involved.
Statistics were recently released on the performance of UK central government departments with respect to their handling of freedom of information requests. The latest figures are for the second quarter of 2009. We have been able to use these to calculate the fraction of all requests which are made via mySociety’s freedom of information website WhatDoTheyKnow.com.
- 13.1% of all FOI requests to “Departments of State” in the second quarter of 2009 were made via WhatDoTheyKnow.com. In absolute terms this was 753 out of 5769 requests; this is up from 8.5% in the first quarter of 2009.
- 32.3% of FOI requests to the Home Office (which includes the UKBA and the IPS) were made via WhatDoTheyKnow in the second quarter of 2009. In absolute terms this was 206 out of 638 requests.
- The latest figures show that in twelve of the UK’s twenty-one Departments of State more than 10% of FOI requests were made via WhatDoTheyKnow.
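The percentages above are simple ratios, and easy to recompute from the absolute numbers quoted. A quick sketch (the function name is just for illustration):

```python
def percentage(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Requests made via WhatDoTheyKnow out of all requests, second quarter of 2009.
departments_of_state = percentage(753, 5769)
home_office = percentage(206, 638)
```

Both values agree with the 13.1% and 32.3% figures given in the list.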
What these statistics mean is that an ever increasing fraction of the information released in response to freedom of information requests is being archived and made publicly available by WhatDoTheyKnow.com. Hopefully this will reduce the number of duplicate requests being submitted and ensure the information released is made available to the widest possible audience, which in turn should increase the chances it is acted on.
Only forty-three central government bodies have their freedom of information performance monitored centrally. This is a tiny fraction of the three thousand or so bodies currently listed by WhatDoTheyKnow.
On Saturday John Cross and Richard Taylor, two volunteers who work on mySociety’s freedom of information website WhatDoTheyKnow.com, gave a workshop on FOI to a meeting of activists from Republic, an organisation which campaigns for an elected head of state in the UK.
mySociety and WhatDoTheyKnow are non-partisan and don’t get involved in campaigning except in specific areas relating to openness and transparency. That said, members of the WhatDoTheyKnow team are happy to consider invitations from any groups wishing to hold a workshop discussing freedom of information.
Many of those present at Saturday’s event were active campaigners on a wide range of subjects ranging from human rights to fair trade as well as having an interest in constitutional reform. The FOI workshop was oversubscribed with the majority of those present at the event deciding to attend the session. Unlike a previous workshop held at OpenTech where most attendees had made an FOI request themselves prior to the event, at this workshop all but one had not done so.
The Royals and FOI
Given the audience, the status of the royals with respect to FOI was particularly pertinent. The FOI act exempts information if it relates to: “communications with Her Majesty, with other members of the Royal Family or with the Royal Household, or the conferring by the Crown of any honour or dignity”. This exemption does not apply though if it is determined that it is in the public interest for the information to be released. The requirement for this public interest test is under threat as the Prime Minister has been moving to strengthen the restrictions on releasing information related to the Royal family. On the 10th of June 2009 in a speech to Parliament on Constitutional Renewal Gordon Brown said:
…we have considered the need to strengthen protection for particularly sensitive material, and there will be protection of royal family and Cabinet papers as part of strictly limited exemptions.
Following that speech BBC journalist Martin Rosenbaum obtained a statement from the Ministry of Justice clarifying that in practice what Gordon Brown’s words meant was:
… the relevant exemption in the Freedom of Information Act will be made absolute for information relating to communications with the Royal Household that is less than 20 years’ old.
In FOI jargon an “absolute exemption” is one not subject to a public interest test.
Even with the law as it stands it is not easy to obtain information on how the royals are influencing, or attempting to influence, government. For example John Cross has asked the Ministry of Justice to supply him with copies of correspondence they had received from the Queen and Prince of Wales. They rejected his request on the grounds that the public interest in non-disclosure exceeded the public interest in disclosure; as well as suggesting exemptions relating to “information provided in confidence” and “personal information” also applied.
The Royal Household’s position on FOI
The Royal Household is not subject to the freedom of information act; though it has made a statement on the subject saying:
Despite its exemption from the FOI Acts, the Royal Household’s policy is to provide information as freely as possible in other areas, and to account openly for its use of public money.
WhatDoTheyKnow’s policy is to add to the site organisations which have indicated they are willing to voluntarily comply with the act. While we list The Royal Household, at the time of writing no-one has yet used the facility to request information.
Using WhatDoTheyKnow for Campaigning
While we stress the importance of keeping freedom of information requests focused, FOI is a powerful tool for campaigners. We were asked if it would be possible for a group like Republic to set up an account on WhatDoTheyKnow for their campaign. The answer to this is: “Yes! – WhatDoTheyKnow wants to encourage groups to use the site”. The information commissioner has confirmed that it is acceptable to use the name of a “corporate body” when making a FOI request; that’s a broad term which encompasses many organisations, groups and charities.
Republic themselves use FOI extensively and often generate major national news stories as a result of responses to their requests. They want to be able to either offer journalists exclusive stories or write a press release based on information released. They can’t do this if the story gets out first via WhatDoTheyKnow so would be interested in an ability to make requests initially in private. mySociety and WhatDoTheyKnow have been considering an option for journalists to be able to make hidden requests via the site. Such a feature could potentially generate an income stream for the site as well as encourage a greater proportion of FOI requests to be made via it. Once the article had been published then the FOI correspondence could be opened up to the public providing access to the source material backing up the story.
As well as meeting those who use, or might want to use, the site to make requests WhatDoTheyKnow also wants to engage positively with public authorities; we see them as important users of our service too. Developer Francis Irving represented the site at the FOI Live conference for information professionals in June and will be speaking at the Freedom of Information Scotland conference in December.
Last week a user of mySociety’s Freedom of Information website WhatDoTheyKnow.com made a request for the release of the results of research into pollutants and urban greenspace in London which had been carried out by The Forestry Commission. Despite this work having been led by the government department responsible for the UK’s woodlands, carried out in collaboration with UK universities, and largely funded by public money distributed via the Engineering and Physical Sciences Research Council, the results of the research were not freely accessible. The user was referred to an academic paper entitled An integrated tool to assess the role of new planting in PM10 capture and the human health benefits: A case study in London which has been published in the October 2009 edition of Elsevier Ltd’s Environmental Pollution journal. The publishing company are currently offering a PDF version of the publication for $31.50 via their website.
Exemptions Applicable to Research
In terms of the freedom of information act there are a number of provisions which can be used to exempt the output of publicly funded scientific research:
- Section 22 of the act excludes “Information intended for future publication”; a large fraction of research culminates in the publication of an academic paper so comes into this category.
- Section 21 excludes “Information accessible to applicant by other means.” This means that once research work has been published a requestor can merely be directed to the publication. Section 21(2)(a) of the act makes clear “information may be [considered] reasonably accessible to the applicant even though it is accessible only on payment.”
With the above exemptions in mind it might well be possible to phrase requests in such a way that they don’t apply. For example I have had some limited success in relation to a request for a research protocol.
First Come First Served?
Our user was offered a hard copy of the publication; the reason this request was drawn to our attention was that the team was contacted to help the two parties to get in touch directly. I suspect the reason that an electronic copy of the document was not supplied via WhatDoTheyKnow may have been related to a concern over breach of copyright on the research results, which has probably been transferred or licensed to the publisher. While one individual may have obtained a copy of the information, it is still not accessible to everyone. Tony Hutchings, the Forestry Commission’s Head of Land Regeneration and Urban Greenspace, who led the research told me: “We have prints of the paper which we could supply you with”. How many printed copies he has to distribute and what happens when he runs out is not clear.
Open Access Publishing
Ideally the results of publicly funded scientific research ought to be published in an unrestricted format in open access journals. The UK government is moving towards such a stance but at a painfully slow pace. I asked the author of the research why he had taken the decision not to publish in a more accessible journal. He responded by saying:
The Research Councils (as do many funders from both private industry and public bodies) assess the quality of the research undertaken by the impact factors of the papers produced. … To my knowledge there are unfortunately few open access journals with high impact factors.
The EPSRC, who funded his research, have a Policy on Access to Research Outputs which states: “knowledge derived from publicly-funded research must be made available and accessible for public use”. When I asked them for a comment on this particular case, Dr Sue Smart, their Head of Performance and Evaluation, responded saying: “Tony Hutchings is mistaken in his assertion that we use journal impact factors in assessing the quality of research”, but she also ruled that his offer of a paper copy of the research article was: “in keeping with the principles of the RCUK (Research Councils UK) position statement [on access to research outputs].”
Like the other volunteers who help out with WhatDoTheyKnow.com I use the site for my own activism and campaign independently for more openness and transparency in a range of areas. I have written an extended article on my own website on the subject of open access publishing where I have included more details of the responses from the research council and researcher quoted above.
There’s round about 8Gb of unfettered Government data in the core database, plus a whole bunch more for indexing and caching. For comparison, TheyWorkForYou (which now goes back to 1935) has 12Gb. And it’s catching up on traffic also – WhatDoTheyKnow has about half as many visitors as TheyWorkForYou.
Unfortunately, this newfound traffic has led to performance problems. You might have seen errors when using WhatDoTheyKnow in the last week or two. This post is firstly an apology for that. Thank you for your patience. Hopefully it is fixed now – do let us know if you still get problems. And secondly it is some techy stuff about debugging such problems in Ruby on Rails…
When WhatDoTheyKnow started failing, we did the obvious things to start with – moving the database to a separate server, and moving some other services off the same server, to give WDTK more room to breathe. It still kept breaking.
None of my server monitoring tools shed any very clear light as to the problem. I upgraded to the latest version of Passenger, the best Rails deployment tool I’ve seen yet. It’s pretty good, but still not mature enough for my liking. I was still getting the same problems with it, but reporting tools like passenger-memory-stats were really helpful.
Eventually I worked out that it was to do with memory use of the Rails processes. Individual ones would leap up to 1Gb, and never drop back down. If several did, the server (with 4Gb of RAM) would start swapping and grind to a halt. The world of Ruby and Rails memory monitoring software is patchwork at best, and in the end I found the simplest tools the most useful. Here’s some:
- I found some Rails processes were getting jammed, and not dying even when I restarted Apache. I think in the end this was due to the Passenger spawning method, and our use of the Xapian Ruby module. Running Passenger in RailsSpawnMethod conservative mode made things much more robust.
- Monit, which in a previous life had a job holding up vital structural pillars of buildings with duct tape, makes you feel dirty. Actually it is really useful. Given I couldn’t quickly fix the problem, Monit let me at least reduce the suffering for people trying to use the site meanwhile. Here’s the rule I used, which gives Apache a kick every time server memory use is too high. It was firing every 5 or 10 minutes…
    check system localhost
        if memory > 3500 MB then exec "/usr/sbin/apache2ctl graceful"
- I found memory_profiler on a blog. It helps you find the kind of memory leak where you unintentionally continue to reference an object you don’t use any more. With a specialist subject of string objects. This led to a fix to do with declaring static arrays in classes vs. modules, which I still don’t really understand. But it wasn’t the cause of the big 1Gb memory munching, as there were no large enough leaks of this sort.
- The record_memory function in WDTK’s application controller came from another blog. It’s handy as it shows you how much each request increases the memory used by the Ruby process. With caveats, this was the best way for me to identify the most damaging requests (search results, and certain public body pages). And it also brought focus on the actual problem – the peak memory use during a request. That’s really important, because Ruby’s memory manager never returns memory to the operating system… The Gb leaps in memory use were because of temporary memory used during certain requests, which the Ruby memory manager then never frees later.
- I made a bunch of functions culminating in allocated_string_size_around_gc. This was really useful alongside the “just add lots of print statements and fiddle” school of debugging. Not everyone’s favourite school, but if your test code can’t catch it, one I often end up using (it gets really involved rarely enough that it doesn’t seem worth setting up an interactive debugger). It led me to various peak memory savings, such as calling “text.gsub!” rather than “text = text.gsub” while removing email addresses and private information from FOI request responses, which helps quite a bit when dealing with multi-megabyte attachments.
- Finally, I used the overlooked debugging tool, and the one you should never rely on: common sense. That is, common sense informed by days of careful use of all the other tools. In order to quickly show text extracts when searching, WDTK stores the extracted attachment text in the database. A few of these attachments are quite large, and led to 50Mb fields, often several of which were being loaded and processed in one page request. That this would cause a high peak of memory use became obvious to me some time yesterday. I checked that that was the case, and this morning I changed it to use the full text for indexing, but to keep at most 1Mb for use in snippets. So sometimes now you won’t get a good search extract for queries, but it is rare, and it will at least still return the right result.
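The snippet change in the last item can be sketched roughly as follows. This is a minimal illustration, not WhatDoTheyKnow’s actual code: the names SNIPPET_TEXT_LIMIT and text_for_snippets are hypothetical, and measuring the cap in bytes is an assumption. The full text would still go to the indexer; only the stored copy used for search extracts is trimmed.

```ruby
# Hypothetical names; an illustration of capping stored snippet text.
# Assumes Ruby 2.1 or later (String#scrub).
SNIPPET_TEXT_LIMIT = 1024 * 1024 # about 1 MB, measured in bytes

# Returns the text to keep for search extracts: at most `limit` bytes.
# A multi-byte character cut in half by the byte slice is dropped, so
# the result is still validly encoded. Note that scrub would also drop
# any other invalid bytes already present in an over-long text.
def text_for_snippets(full_text, limit = SNIPPET_TEXT_LIMIT)
  return full_text if full_text.bytesize <= limit
  full_text.byteslice(0, limit).scrub("")
end
```

The trade-off is the one described above: a match that falls beyond the cap still finds the right request, but there may be no good extract to show for it.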
I’ve more work to do. I think there are quite a few other quick wins, all of which are making the site faster too. I’m quite happy that WhatDoTheyKnow also has a bunch more test code as a result of all this.
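For anyone wanting to try the per-request measurement idea from the list above, here is a minimal sketch. It is not the record_memory function itself: rss_kb and memory_delta_kb are hypothetical names, and reading /proc/self/status assumes a Linux server.

```ruby
# Hypothetical helpers; a sketch of measuring how much a piece of work
# grows the process's resident memory.

# Resident set size of the current process in kilobytes.
# Assumes Linux; returns 0 if the VmRSS line is not found.
def rss_kb
  status = File.read("/proc/self/status")
  status[/^VmRSS:\s+(\d+)\s+kB/, 1].to_i
end

# Runs the block and returns the change in memory reported by `reader`
# (any callable that returns a size in KB) between start and finish.
def memory_delta_kb(reader = method(:rss_kb))
  before = reader.call
  yield
  reader.call - before
end
```

In a Rails application something like this could be called from an around filter in the application controller, logging the delta next to the request path. Because Ruby tends not to hand memory back to the operating system, the delta mostly reflects new peaks, so a request that looks cheap may simply be reusing memory grabbed by an earlier one. That is one of the caveats to bear in mind.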
On the other hand, what a disappointing disaster for open source languages beginning with P/R (as opposed to J). Yes, the help and tools were just about there to work it out, but would seem primitive if you’d used say Java’s Memory Analyzer. Indeed somebody over on StackOverflow suggested running your site in JRuby and using exactly that tool…
Note: This post is a work in progress, I need your help to improve it, especially with knowledge of non-English sites
I was recently in Washington DC catching up with mySociety’s soul-mates at the Sunlight Foundation. As we talked about what was going on in the field of internet-enabled transparency, it became clear to me that there are now more identifiable categories of transparency website than there used to be.
Identifying and categorising these types of site turns out to be surprisingly useful. First, it can help people ask “Why don’t we have anyone doing that in our country?” Second, it can help mySociety to make sure that when we’re planning ahead we don’t fail to consider certain options that may currently be off our radar. Also, it gives me an excuse to tell you about some sites that you may not have seen before.
Anyway, enough preamble. Here they are as I see them – please give me more suggestions as you find them. As you can see there’s a lot more activity in some fields than others.
1. Transparency blogs & newspapers – At the technically simplest, but most manual labour-intensive end of the scale are sites, commercial and volunteer driven, whose owners use transparency to help them to write stories. Given almost every political blog does this a bit, it can be hard to name specific examples, but I will note that Heather Brooke is the UK’s pre-eminent FOI-toting journalist/blogger, and we’ve just opened a blog for our awesome volunteers on WhatDoTheyKnow to show their FOI skills to an as-yet unsuspecting public.
2. What Politicians do in their parliaments – These sites primarily include lists of politicians, and information about their primary activities in their assemblies, such as voting or speaking. This encompasses mySociety’s TheyWorkForYou.com, Rob McKinnon’s one man labour of love TheyWorkForYou in NZ, Italy’s uber-deep OpenPolis.it (6 layers of government, anyone?), Germany’s almost-un-typable Abgeordnetenwatch, Romania’s writ-wielding IPP.ro, Josh Tauberer’s GovTrack.us, plus the bonny bouncing babies OpenAustralia and Kildare Street (Ireland). Of special note here are Mzalendo (Kenya) who, unlike everyone else, can’t rely on access to a parliamentary website to scrape raw data from, and Julian Todd’s UNDemocracy (International), which has to fight incredible technical barriers to get the information out.
3. Databases of questions and answers posed to politicians – These sites let people post questions to politicians, and then publish the questions and answers. The Germans running Abgeordnetenwatch (Parliament Watch) seem to have had considerable success here, with newspapers citing what politicians say on their site. Yoosk has some politicians in the UK on it, too.
4. Money in politics – This comes in two forms, money given to candidates (MAPlight), and money bunged by politicians to their favourite causes (Earmark watch). In the UK, as far as I know, the Electoral Commission’s database remains currently unscraped, perhaps because the data is so ungranular.
6. Websites containing bills going through parliament, or the law as voted on – This includes the increasingly substantial OpenCongress in the US which saw major traffic during the Health Care debates, and the UK government’s own Acts database and Statute Law Database. Much of the legal database field, however, remains essentially private.
7. Services that create transparency as a side effect of delivering services – Our own sites lead the way here: FixMyStreet’s public problem reports and WhatDoTheyKnow’s FOI archive are both created by people who aren’t primarily using the site to enrich it – they’re using it to get some other service.
8. Election websites – These come in many forms, but what they have in common is their desire to shed light on the positions and histories of candidates, whether incumbents or newcomers. The biggest beast here is Stemwijzer (Netherlands), probably in relative terms the most used transparency or democracy site ever. However, these sites are popular in several places: the big but highly labour-intensive VoteSmart (US), Smartvote.ch (Switzerland), plus others. mySociety is shortly to start recruiting constituency volunteers to help with our take on this problem; keep an eye on this blog if you want to know more.
9. Political document archives - This is a new category, now occupied by Sunlight’s Partytime archive for invitations to political events, and TheStraightChoice, Julian Todd and Richard Pope’s wonderful new initiative for archiving election leaflets and other paper propaganda.
10. Bulk data - Online transparency pioneer Carl Malamud doesn’t do sites, he does data. Big globs zipped up and made publicly available for coders and researchers to download and process. The US government has now stepped into this field itself with Data.gov, doubtless soon to be followed by data.gov.uk.
Please don’t shoot me if I’ve missed anything here, the world is a big place. But I thought that was a useful and interesting exercise, and I hope you’ll both find it useful, and help me improve it too. Comment away.