Private data, containing personal details of the general public, is accidentally released by public authorities at least once a fortnight, say mySociety.
The volunteer team behind WhatDoTheyKnow, mySociety’s freedom of information website, have dealt with 154 accidental data leaks made by bodies such as councils, government departments and other public authorities since 2009, and these are likely to represent only the tip of the iceberg.
On the basis of this evidence, we are again issuing an urgent call for public authorities everywhere to tighten up their procedures.
How WhatDoTheyKnow works
Under the Freedom of Information Act, anyone in the UK may request information from a public body.
WhatDoTheyKnow makes the process of filing an FOI request very easy: users can do so online. The site publishes the requests and their responses, creating a public archive of information.
Public authorities operate under a code of conduct that requires personal information to be removed or anonymised before data is released: for example, while the answer to a request for the number of people on a council housing waiting list may be calculated from a list including names, addresses and the reason for housing need, the information provided should not include those details.
Accidental data releases become particularly problematic when the data requested concerns the details of potentially vulnerable people.
Hidden data is not always hidden
When users request information through WhatDoTheyKnow, it’s often provided in the form of an Excel spreadsheet. But unfortunately, private data is sometimes included on those spreadsheets, usually because the staff member who provides it doesn’t understand how to anonymise it effectively.
For example, data which is in hidden tabs, or pivot tables, can be revealed by anyone who has basic spreadsheet knowledge, with just a couple of clicks.
By its very nature, data held by our public authorities can be extremely sensitive: imagine, for example, lists of people on a child protection register, lists of people who receive benefits, or, as happened back in 2012, a list of all council housing applicants, including each person’s name and sexuality.
Our latest warning is triggered by an incident earlier this month, in which Northamptonshire County Council accidentally published data on over 1,400 children, including their names, addresses, religion and SEN status. Thanks to the exceptionally fast work of both the requester and the WhatDoTheyKnow volunteers, it was removed within just a few hours of publication, and the incident has been reported to the Information Commissioner’s Office. Concerned residents should contact the ICO or the council itself.
Advice for FOI officers
Back in June 2013, we set out the advice that we think every FOI officer should know. That advice still stands:
- Don’t release Excel pivot tables created from spreadsheets containing personal information, as the source data is likely to be still present in the Excel file.
- Ensure those within an organisation who are responsible for anonymising data for release have the technical competence to fulfil their roles.
- Check the file sizes. If a file is a lot bigger than it ought to be, it could be that there are thousands of rows of data still present in it that you don’t want to release.
- Consider preparing information in a plain text format, e.g. CSV, so you can review the contents of the file before release.
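The checks above can be partly automated. The following is a minimal sketch in Python, assuming the file is in the newer .xlsx format (a zip archive of XML parts); it does not handle the older binary .xls format. The function name audit_xlsx and the shape of its result are illustrative choices, not part of any WhatDoTheyKnow tooling. It lists hidden sheets, pivot table caches, links to external workbooks and named ranges, all of which are places where source data can linger.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML workbook part inside an .xlsx file.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def audit_xlsx(source):
    """Return warning signs found in an .xlsx file (hypothetical helper).

    `source` may be a file path or a binary file-like object.
    """
    findings = {
        "hidden_sheets": [],
        "pivot_caches": [],
        "external_links": [],
        "defined_names": [],
    }
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # Pivot tables keep a cached copy of their source rows in these parts.
        findings["pivot_caches"] = [n for n in names if n.startswith("xl/pivotCache/")]
        # Links to other workbooks can carry cached values from those workbooks.
        findings["external_links"] = [n for n in names if n.startswith("xl/externalLinks/")]

        root = ET.fromstring(zf.read("xl/workbook.xml"))
        for sheet in root.findall("m:sheets/m:sheet", NS):
            # state is "hidden" or "veryHidden"; when absent the sheet is visible.
            if sheet.get("state", "visible") != "visible":
                findings["hidden_sheets"].append(sheet.get("name"))
        for defined in root.findall("m:definedNames/m:definedName", NS):
            findings["defined_names"].append(defined.get("name"))
    return findings
```

An empty result does not prove a file is safe to release; it only rules out the most common places where unwanted data hides.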
Part of a larger picture
Not every FOI request is made through WhatDoTheyKnow—many people will send their requests directly to the public authority. Moreover, we can only react to the breaches that we are aware of: there are, in all probability, far more which remain undiscovered.
But because of WhatDoTheyKnow’s policy of making information accessible to all, by publishing it on the site, it’s now possible to see what an endemic problem this kind of treatment of personal data is.
When we come across incidents like these, we act very rapidly to remove the personal information. We then inform the public authority who provided the response. We encourage them to self-report to the Information Commissioner’s Office, and where the data loss is very serious, we may make an additional report ourselves.
This is a problem we have been warning about for some time. Islington Council were fined £70,000 for a similar incident in 2012. In light of this fresh incident we again urge all public authorities to take care when preparing data for release.
As with the Islington incident, the information was in parts of an Excel spreadsheet that were not immediately visible. It was automatically published on 14th November when Hackney Council sent it in response to a Freedom of Information request, as part of the normal operation of the WhatDoTheyKnow website. All requests sent via the website make it clear that this will happen.
This particular breach involved a new kind of hidden information we hadn’t seen before – the released spreadsheet had previously been linked to another spreadsheet containing the private information, and the private information had been cached in the “Named Range” data in the released spreadsheet.
Although it was not straightforward to access the information directly using Excel, it was directly visible using other Windows programs such as Notepad. It had also been indexed by Google and some of it was displayed in their search previews.
The breach was first hit upon by one of the data subjects searching for their own name. When they contacted us on 25th November to ask about this, one of our volunteers, Richard, realised what had happened. He immediately hid the information from public view and notified the council.
We did not receive any substantive response from the council and therefore contacted them again on 3rd December. The council had investigated the original report but not understood the problem, and were in fact preparing to send a new copy of the information to the WhatDoTheyKnow site, which would have caused the breach to be repeated.
We reiterated what we had found and advised them to consult with IT experts within their organisation. The next day, 4th December, we sent them a further notification of what had happened, copying the Information Commissioner’s Office (ICO). As far as we are aware, this was the first time the ICO was informed of the breach.
From our point of view it is very disappointing that these incidents are still happening. Freedom of Information requests made via WhatDoTheyKnow are a small fraction of all requests, so it is very likely that this kind of error happens many more times in private responses to requesters, without the public authority ever becoming aware.
Our earlier blog post has several tips for avoiding this problem. These tips include using CSV format to release spreadsheets, and checking that file sizes are consistent with the intended release. Either of these approaches would have averted this particular breach.
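As a rough illustration of the file size check, here is a minimal Python sketch. It assumes the officer has the rows they intend to release available as a list, and it compares the size of the file about to be sent with the size those rows would take up as CSV. The function names and the thresholds (ratio and slack) are arbitrary choices made for this example, and would need tuning for real files.

```python
import csv
import io


def csv_size_bytes(rows):
    """Size in bytes of `rows` when written out as UTF-8 CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def looks_oversized(file_size, rows, ratio=20, slack=50_000):
    """True if a file is far larger than the data meant to be in it.

    `ratio` and `slack` are illustrative thresholds only: spreadsheet
    formats carry formatting overhead, so some excess is normal.
    """
    return file_size > csv_size_bytes(rows) * ratio + slack
```

For instance, a three-row summary table that arrives as a 2 MB workbook would be flagged for a closer look, while a 30 KB workbook would not.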
We would also urge the ICO to do as much as possible to educate authorities about this issue.
Simply Secure is a new organisation, dedicated to finding ways to improve online security – in ways so accessible and useful that there will be no barrier to their use.
It will bring together developers, UX experts, researchers, designers and, crucially, end users. The plan is to ensure the availability of security and privacy tools that aren’t just robust – they’ll be actively pleasing to use.
Now, you may be thinking that online privacy and security aren’t the most fascinating subjects – but this month, the chances are that you’ve actually been discussing them down the pub or with your Facebook friends.
Remember the iCloud story, where celebrities’ personal photographs were taken from supposedly secure cloud storage and put online? Yes, that. If you uttered an opinion about how those celebrities could have kept their images more safely, you’ve been nattering about online security.
Simply Secure is founded on the belief that we’d all like privacy and security online, but that up until now, solutions have been too cumbersome and not user-centred enough. When implementing them becomes a hassle, even technically-literate people will choose usability over security.
How we helped
Simply Secure knew what their proposition was: now we needed to package this up into a brand for them. Crucially, it needed to transmit a playful yet serious message to launch the organisation to the world – within just four weeks.
Our designer Martin developed all the necessary branding and illustration. He created a look and feel that would be carried across not just Simply Secure’s website, but into the real world, on stickers and decoration for the launch event.
Meanwhile, mySociety Senior Consultant Mike helped with content, page layout and structure, all optimised to speak directly to key audience groups.
Down at the coding end of things, our developer Liz ensured that we handed over a project that could be maintained with little to no cost or effort, and extended as the organisation’s purpose evolves.
“mySociety are brilliant to work with. They did in a month what I’ve seen others do in six, and they did it better” – Sara “Scout” Sinclair Brody, Simply Secure
What did the client think? In their own words: “We approached [mySociety] with a rush job to build a site for a complex and new effort.
“They were able to distill meaning from our shaky and stippled examples, and create something that demonstrated skill not only as designers and web architects, but as people able to grasp nuanced and complicated concepts and turn those into workable, representative interfaces”.
Always good to hear!
People who know mySociety’s work might have noticed that we don’t typically work on purely content-driven sites. Generally we opt to focus on making interactions simple, and data engaging, so why did we go ahead with the Simply Secure project?
Well, there were a couple of factors. Firstly, we genuinely think that this will become an invaluable service for every user of the internet, and as an organisation which puts usability above all else, we wanted to be involved.
Second, we believe in the people behind the project. Some of them are friends of mySociety’s, going back some time, and we feel pretty confident that any project they’re involved in will do good things, resulting in a more secure internet for everyone.
Take a look
Simply Secure launches today. We’ll be checking back in a couple of months to report on how it’s going.
You may have heard that a widespread security problem – ‘Heartbleed’ – has been found that affects a large proportion of all websites on the Internet.
Here is one of the many explanations about the nature of the problem.
Members of the mySociety team have reviewed our potential exposure to the vulnerability.
We have no indication that our sites have been attacked, or that any information has been stolen, but the nature of the vulnerability would make an attack difficult to detect, and we prefer to be reasonably cautious.
What does this mean for you? The advice from around the web has been for people to change passwords, especially on sites they use that contain a lot of very important information (e.g. your email account).
We think the risk that passwords have been compromised is low, but as changing passwords occasionally is always a good idea anyway, now might be a good time.
For those of you interested in the technical detail of our response, we have:
- Upgraded the SSL software
- Installed new SSL certificates based on a new private key
- Revoked the old SSL certificates
- Replaced the secrets used for security purposes in the affected sites
- Removed active sessions on affected sites, so that users will need to log in again
- Required that users with administrative access to affected sites reset their passwords
- Required that staff users reset their passwords
- Notified affected commercial clients so that they can take appropriate action
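For readers who want to check their own servers, here is a minimal Python sketch of the first step. It assumes the SSL software in question is OpenSSL, where Heartbleed (CVE-2014-0160) affected releases 1.0.1 to 1.0.1f and was fixed in 1.0.1g. The function name is made up for this example, and the check deliberately ignores the 1.0.2 beta releases, which were also affected.

```python
import re


def is_heartbleed_affected(version_text):
    """Report whether an OpenSSL version string falls in 1.0.1 to 1.0.1f.

    Hypothetical helper: accepts strings such as "OpenSSL 1.0.1e 11 Feb 2013".
    The affected 1.0.2 beta releases are not covered by this simple check.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version_text)
    if not match:
        raise ValueError("no version number found in %r" % version_text)
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    letter = match.group(4)  # "" for the plain 1.0.1 release
    # An empty string sorts before every letter, so "" through "f" are affected.
    return (major, minor, patch) == (1, 0, 1) and letter < "g"
```

Passing ssl.OPENSSL_VERSION from Python’s standard library gives the version Python itself is linked against, which may differ from the one a web server uses. A safe version number is only the start: the remaining steps in the list still apply if a vulnerable version was ever exposed.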
The local press in Islington has just reported the accidental release of quite a bit of sensitive personal data by Islington council.
One of our volunteers, Helen, was responsible for spotting that Islington had made this mistake, and so we feel it is appropriate to set out a summary of what happened, to inform journalists and citizens who may be interested.
On 27th May a user of our WhatDoTheyKnow website raised an FOI request to Islington Borough Council. On the 26th June the council responded to the FOI request by sending three Excel workbooks. Unfortunately, these contained a considerable amount of accidentally released, private data about Islington residents. In one file the personal data was contained within a normal spreadsheet; in the two other workbooks the personal data was contained on four hidden sheets.
All requests and responses sent via WhatDoTheyKnow are automatically published online without any human intervention – this is the key feature that makes this site both valuable and popular. So these Excel workbooks went instantly onto the public web, where they seem to have attracted little attention – our logs suggest 7 downloads in total.
Shortly after sending out these files, someone within the council tried to delete the first email using Microsoft Outlook’s ‘recall’ feature. As most readers are probably aware, normal emails sent across the internet cannot be remotely removed using the recall function, so this first mail, containing sensitive information both in plain sight and in (trivially) hidden forms, remained online.
Unfortunately, this wasn’t the only mistake on the 26th June. A short while later, the council sent a ‘replacement’ FOI response that still contained a large amount of personal information, this time in the form of hidden Excel tabs. As you can see from this page on the Microsoft site, uncovering such tabs takes seconds, and only basic computer skills.
At no point on or after the 26th June did we receive any notification from Islington (or anyone else) that problematic information had been released not once, but twice, even though all mails sent via WhatDoTheyKnow make it clear that replies are published automatically online. Had we been told, we would have been able to remove the information quickly.
It was only by sheer good fortune that our volunteer Helen happened to stumble across these documents some weeks later, and she handled the situation wonderfully, immediately hiding the data, asking Google to clear their cache, and alerting the rest of mySociety to the situation. This happened on the 14th July, a Saturday, and over the weekend mySociety staff, volunteers and trustees swung into action to formulate a plan.
The next working day, Monday 16th July, we alerted both Islington and the ICO about what had happened with an extremely detailed timeline.
The personal data released by Islington Borough Council relates to 2,376 individuals/families who have made applications for council housing or are council tenants, and includes everything from name to sexuality. It is for the ICO, not mySociety, to evaluate what sort of harm may have resulted from this release, but we felt it was important to be clear about the details of this incident.
Today we have a strange story about a department that appears to think that it has a duty not to release information under FOI if it makes people angry.
It all starts in January 2009, when the Department for Children, Schools and Families (DCSF) appointed an expert by the name of Graham Badman to conduct a review of elective home education in England. It probably goes without saying that this is an issue far from our concerns, and an issue that mySociety has no views on – what makes us interested is the process that followed.
Shortly after the publication of the report, Elaine Walton, a user of mySociety’s freedom of information website WhatDoTheyKnow.com, requested copies of communications between the Department for Children, Schools and Families and Nektus Ltd, the company through which it appears Mr Badman was paid for his work.
According to email replies to Ms Walton, the DCSF located two relevant invoices which show how much money was paid, but refused to disclose them. Strangely, though, they were not refused on grounds of commercial confidentiality, but rather on something more unusual. Here are the exemptions they cited:
- Section 40 – Personal Information
- Section 38 – Health and safety
Health and Safety? A little investigation reveals more.
When Ms Walton appealed against this decision, an internal review was carried out within the DCSF. The internal review’s findings stated that Mr Badman was likely to become a victim of harassment if certain personal details were made public, hence a health and safety concern, and hence no publication of these invoices. Fair enough – nobody would be in favour of revealing private, sensitive information that would endanger anyone’s life or family, especially in the presence of a known threat. But take a look at this:
“That the Department had initially been drafting a response that included the release of invoices with only personal data redacted. But before the draft was complete it was apparent that there was a campaign of harassment and vilification against Graham Badman and other individuals/organisations that had contributed to the Report. In the light of this, at the weekly review meeting of FOI cases, it was considered that the balance of public interest might have shifted towards withholding.”
What is very curious here is the admission that the department had been thinking of releasing the invoices with personal data hidden (i.e. no home address, bank details etc). But then, because of a campaign of harassment, it was decided that they wouldn’t publish anything at all. So not just no personal information, but no dates, no amounts of money, nothing.
What is so unease-making about this FOI decision is that it appears to be saying that departments may conceal information on how much public money has been spent on something because releasing that information will make some angry people even angrier. Surely this can’t be right – if it were, every budget would be conducted in complete secrecy. We would encourage the Information Commissioner’s Office to take a look.
mySociety’s Freedom of Information website WhatDoTheyKnow is designed to appear simple and straightforward to users. That appearance belies the fact that behind the scenes a significant amount of effort goes into making sure both those making freedom of information requests and those answering them have a positive experience of the site. While the site is almost entirely automated, sometimes human involvement is necessary. This article highlights those key “edge cases” which are dealt with by the staff and volunteers who make up the WhatDoTheyKnow team.
In the last year 15,233 freedom of information requests have been made via WhatDoTheyKnow.
444 messages on 360 requests (2.3%) had to be manually placed on the correct request because authorities did not send their replies to the email address given. The errors appear to be introduced when authorities manually transcribe email addresses from incoming email into correspondence management systems. There have been suggestions that some may even print out emails and scan them into such systems. WhatDoTheyKnow’s code has been improved in light of experience: common errors are now detected automatically, and in many cases the system suggests which request the message was intended for.
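To give a flavour of how a mistyped reply address might be matched back to a request, here is a minimal Python sketch using fuzzy string matching from the standard library. It is not the code WhatDoTheyKnow actually runs: the function name, the address format and the similarity cutoff are all assumptions made for this illustration.

```python
import difflib


def suggest_request(received_address, known_addresses, cutoff=0.85):
    """Suggest which request a mistyped reply address was meant for.

    `known_addresses` maps each genuine reply address to a request id.
    Returns the id of the closest match, or None if nothing is similar
    enough. The cutoff is an illustrative value, not a tuned one.
    """
    cleaned = received_address.strip().lower()
    candidates = difflib.get_close_matches(
        cleaned, list(known_addresses), n=1, cutoff=cutoff
    )
    return known_addresses[candidates[0]] if candidates else None
```

With addresses such as request-101-ab12cd34@example.org on file (a made-up format), a reply sent to request-101-ab12cd3@example.org, with one character dropped, would be suggested for request 101. A human would still confirm the suggestion before the message is filed.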
In terms of outgoing messages, just 52 (0.3%) requests over the course of the year were marked as receiving an error message in response, and users marked 94 (0.6%) as requiring administrator attention. These are generally either transient errors, which simply require a message to be resent, or errors that prompt us to check and update the contact details we hold for a particular organisation. There are regularly problems with authorities’ spam filters, and we have to encourage them to change the way their filters are set up to allow messages from WhatDoTheyKnow.com through.
119 (0.8%) requests were at some point marked as “Handled by Post”. In many of these cases users eventually persuaded authorities to release the information in electronic form. Where information is supplied outside the site, users can add annotations describing the information released, they can link to copies of the data they have posted online, or, as has been done in respect of 14 requests (0.1% of the total, 11% of those handled by post), they can supply the information to WhatDoTheyKnow to upload manually. When the site was being designed there was a worry that authorities would reply to many requests by post. This has not occurred, in part perhaps because the Freedom of Information Act contains a provision (section 11) requiring the requestor’s preferred means of communication to be used where it is reasonable. A requestor using an @whatdotheyknow email address is clearly expressing a preference for a reply to be made electronically via the site.
One of the major challenges facing the site is keeping it operating in the face of the UK’s libel laws. Unlike in other countries, such as the US, we cannot publish statements on our users’ behalf without taking the risk of being sued for libel ourselves. Even simply republishing FOI responses from public authorities is not without risk in the UK. While we don’t actively police the site, a lot of administrator time is taken up dealing with cases where potentially libellous or defamatory comments have been brought to our attention. Cases can be very complicated and involve a great deal of correspondence. mySociety is lucky to have the services of a specialist internet and technology barrister with expertise in libel who provides his services free of charge. We try to act in such a way as to maximise transparency while ensuring that the existence of WhatDoTheyKnow and mySociety is not threatened by legal risks.
In the last year there have been only seven significant cases where requests have been hidden from public view on the site due to concerns relating to potential libel and defamation. Three of those cases have involved groups of twenty or so requests made by the same one or two users. While the actual number of requests we have had to hide is around 70 (0.4% of the total), even this small fraction overstates the situation due to the repetition of the same potentially libellous accusations and comments in different requests. In all cases we have kept as much information up on the site as possible. Our policy with respect to all requests to remove information from the site is that we only take down information in exceptional circumstances; generally only when the law requires us to do so.
Sometimes people accidentally post personal information to the site; for example, they make a request which is not a Freedom of Information request but a subject access request under the Data Protection Act. We are happy to remove such requests. On occasion we get requests from both our users and public sector employees asking us to remove their names from the site. As we are trying to build up an FOI archive, we are very reluctant to remove information from the site; our policy is only to remove names in exceptional circumstances. Often information, such as an out of office reply, which a public body or civil servant considers irrelevant and asks to be removed is in fact critical to the correspondence thread and timeline of a response.
Copyright and Control of Information Released
The fact information is subject to copyright and restrictions on re-use does not exempt it from disclosure under the Freedom of Information Act (though there is a closely related exemption relating to “commercial interest”). Occasionally public bodies will offer to reply to a request, but in order to deter wider dissemination of the material they will refuse to reply via WhatDoTheyKnow.com. Southampton University have released information in protected PDF documents and the House of Commons has refused to release information via WhatDoTheyKnow.com which it has said it would be prepared to send to an individual directly.
Maintaining and Expanding The List of Authorities
WhatDoTheyKnow lists around 3,000 public authorities, and there is a regular turnover of changes in contact details. Our coverage, while large, is not comprehensive, so we receive requests to add bodies such as parish councils, schools and doctors’ surgeries, which we have not yet attempted to add in a systematic manner based on official sources of information.
We have also had to consider carefully how to handle the various situations where an authority becomes defunct and its responsibilities are taken over by another body, for example as a result of reorganisations of local government or the creation and merging of government departments.
Providing Advice and Assistance
The team at WhatDoTheyKnow.com often provide advice to users. We encourage users to keep their requests focused so as to reduce the chance of any problems due to libel or requests being classed as vexatious. On occasion we suggest appropriate authorities for users to direct requests to, provide advice to those unhappy with the response to their request, and answer a broad range of other queries as they arise, such as whether particular bodies are subject to the act or not. Increasingly we link to authorities’ publication schemes, which are intended to let people know what information an authority has and how it can be accessed.
Lastly, like all websites which allow people to post content online WhatDoTheyKnow.com occasionally suffers from spam in various forms. Most is dealt with automatically but some has to be removed by hand. With spam, like the other aspects of running the site, the site’s code and processes are constantly being developed and improved to reduce the fraction of cases requiring any manual intervention.
This article was prompted in part by a team in New Zealand, who are considering launching their own version of the site, asking us what’s involved.
Update: The Telegraph posted a retraction yesterday.
You may have seen coverage on various websites saying that a civil servant was sacked after posting a comment on TheyWorkForYou.
We’ve no idea what this story is about, but we’re pretty certain it has nothing whatsoever to do with TheyWorkForYou. No journalist bothered to contact us before running the story.
- There is no comment on TheyWorkForYou containing the text quoted in that article, nor anything like it, nor has there ever been. Nor in fact (as we’ve checked), on HearFromYourMP, WriteToThem, or WhatDoTheyKnow.
- Only one comment has been left on any contribution by Hazel Blears in 2009, and it’s definitely not related to this.
- 27 comments were left on 13th May, the date the comment was apparently posted; we’ve read them all and they’re all nothing to do with this.
So frankly, we’ve no idea what’s going on.
What we do know is that the implication that mySociety would merrily hand over sensitive personal data that ends up getting someone sacked, without fighting tooth and nail for their privacy every inch of the way, is a complete misinterpretation of the way we work and the things we hold most dear. No-one has ever contacted us to ask us to hand over such data, nor have we ever done so.
We think what might have happened is a simple mis-remembering of the website that contained the problematic comment. We’re hoping to get in touch with Lisa Greenwood so we can get full details before asking the various media companies that have run with this for a correction.
I’m enjoying the weather at the moment, seems to be sunnier than the summer, but cool with an atmospheric autumnal taste in the air.
mySociety is changing as ever, leaping forward in our race to try and make it easier for normal people to influence, improve or replace functions of government. More on this as it happens.
Meanwhile, I’ve been continuing to hack away at WhatDoTheyKnow. A little while ago Google decided to deep index all our pages – causing specific problems (I had to tell it to stop crawling the 117th page of similar requests to another request), and also ones from the extra attention. There have been quite a few problems to resolve with authority spam filters (see this FOI officer using the annotation function), and with subtle and detailed privacy issues (when does a comment become personal? if you made something public a while ago, and it is now a shared public resource, can you modify it or take it down?).
Right, I’ve got to go and fix a bug to do with the Facebook PledgeBank app. It’s to do with infinite session keys, and how we send messages when a pledge has completed. Facebook seem to change their API without caring much that applications have to be altered to be compatible with it. This is OK if the Facebook application is your core job, but a pain when you just want your Facebook code to keep running as it did forever.
(the autumn photo thanks to Nico Cavallotto)
Much of my August seems to have been absorbed with maintenance tasks.
For example, Chris and I spent a few days tightening up WriteToThem’s privacy. I made sure the privacy statement correctly describes what happens with backup files and failed messages. I reduced the timeouts on how long we keep the body of failed messages. I made sure we delete old backup files of the WriteToThem database. I wrote scripts to run periodically to check that no bugs in our queueing daemon can accidentally mean we keep the body of messages for longer than we say. I added a cron job to delete Apache log files older than a month for all our sites. As AOL know to their cost, the only really private data is deleted data.
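For anyone wanting to do something similar, here is a minimal Python sketch of the log clean-up job. It is not the script we actually run: the function name, the *.log* filename pattern and the 30-day default are assumptions made for the example, and the directory to clean would depend on how the web server is set up.

```python
import time
from pathlib import Path


def delete_old_logs(log_dir, max_age_days=30, now=None, dry_run=False):
    """Delete log files in `log_dir` last modified over `max_age_days` ago.

    Returns the names of the files removed (or that would be removed
    when `dry_run` is true). Only files matching *.log* directly inside
    `log_dir` are considered; subdirectories are left alone.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for path in sorted(Path(log_dir).glob("*.log*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return removed
```

Run daily from cron, with dry_run=True for the first few days to see what it would remove, this keeps the promise in a privacy statement from depending on anyone remembering to tidy up.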
Earlier in the month, I handled some WriteToThem support email for the first time in ages. We get a couple of hundred messages a week, which Matthew mainly slogs through. It’s good for morale to do it, as we get quite a lot of praise mail. It is also hard work, as you realise how complicated even our simple site and the Internet are, and it leads to fixing bugs and improving text on the site. I made a few improvements to our administration tools, and things like the auto-responder if people reply to the questionnaire, to try and reduce the amount of support email, and make it easier to handle.
I did some more work on the geographically cascading pledges (like this prototype one), but I’m still not happy with them. In the end, I realised that it is the structure of the wording of the pledge that is the key problem. Our format of “I will A but only if N others will B” just isn’t easily adapted to get across that the pledge applies separately in different geographic areas. Working out how to fix that is one of the things we’ll brainstorm about in the Lake District (see below).
The last couple of days I’ve been configuring one of our new servers, which is called Balti, and getting the PledgeBank test harness working on it. Until now, it has only been run on my laptop. This is partly heading towards making a proper test harness for the ePetitions site, running on a server, so that we can properly test that nothing is broken before deploying a new version.
Matthew has wrapped up the TheyWorkForYou API now, and is working on Neighbourhood Fixit next. Chris has been doing lots more performance work for the e
Tom’s in Berlin at the moment, he gave a talk last night, and I think has been to see some people from Politik Digital. As we’ve been discussing on the mySociety email list, there’s an EU grant we’re likely to apply for in collaboration with them.
On Friday, we’re all going to the Lake District for a week, with some of the trustees and volunteers joining us intermittently. We very conveniently and cheaply all work from home, so it’s good and necessary to meet up for a more sustained period of time at least once a year. Last year we were in Wales.