mySociety’s Transparency team has developed a new tool, the Excel Analyser, which helps reduce the potential harms associated with accidental releases of large amounts of personal information.
The Excel Analyser scans spreadsheets before they are published on WhatDoTheyKnow, looking for the types of metadata that are often the cause of large data breaches: pivot cache data, hidden sheets, hidden columns and rows, named ranges, and cached data from external links or data models.
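To give a flavour of what this kind of scan involves, here is a minimal sketch in Python. It is not the Excel Analyser's code. It relies on the fact that an .xlsx file is a zip archive of XML parts, and it assumes the conventional part paths that Excel writes (`xl/pivotCache/`, `xl/externalLinks/`, `xl/model/`). The function name `scan_xlsx` and the keys of the returned dictionary are made up for this example, and hidden rows and columns are left out for brevity.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML schema used in xl/workbook.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def scan_xlsx(data: bytes) -> dict:
    """Return a rough inventory of hidden-data carriers in an .xlsx file.

    Illustrative only: assumes Excel's usual part paths inside the archive.
    """
    findings = {
        "hidden_sheets": [],
        "defined_names": [],
        "pivot_cache_parts": [],
        "external_link_parts": [],
        "data_model_parts": [],
    }
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        # Cached copies of source data live in their own parts of the archive.
        findings["pivot_cache_parts"] = [
            n for n in names if n.startswith("xl/pivotCache/")
        ]
        findings["external_link_parts"] = [
            n for n in names if n.startswith("xl/externalLinks/")
        ]
        findings["data_model_parts"] = [
            n for n in names if n.startswith("xl/model/")
        ]
        if "xl/workbook.xml" in names:
            root = ET.fromstring(zf.read("xl/workbook.xml"))
            # Sheets are visible unless state is "hidden" or "veryHidden".
            for sheet in root.findall("m:sheets/m:sheet", NS):
                if sheet.get("state") in ("hidden", "veryHidden"):
                    findings["hidden_sheets"].append(sheet.get("name"))
            # Named ranges are stored as definedName elements.
            for defined in root.findall("m:definedNames/m:definedName", NS):
                findings["defined_names"].append(defined.get("name"))
    return findings
```

Calling `scan_xlsx(open("response.xlsx", "rb").read())` on a hypothetical file would return lists of whatever was found in each category, with empty lists for a clean workbook.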
If problematic metadata types and combinations are detected, the file is automatically prevented from being published on WhatDoTheyKnow.
This helps to reduce the risk that sensitive information is accidentally published online, and limits the harm that such releases can cause. The WhatDoTheyKnow team is alerted when a file has been blocked, which allows them to quickly delete any problematic material and inform the relevant authority that there has been a breach.
In cases where it’s unclear if a data breach has occurred, the authority is alerted that hidden data has been detected in their response, and given the opportunity to send a replacement file if necessary.
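The decision described above has three broad outcomes: publish as normal, block the file, or alert the authority when the picture is unclear. A rough sketch of that kind of triage might look like the following. The rules here are invented for illustration, since the actual types and combinations that trigger a block are not spelled out; the function name `triage`, the category keys and the outcome labels are all hypothetical.

```python
def triage(findings: dict) -> str:
    """Map scan findings to 'publish', 'alert_authority' or 'block'.

    Hypothetical policy: the real rules are not public in this post.
    """
    # Assumption: cached copies of source data are treated as high risk,
    # because they can hold full datasets that are invisible in the sheet.
    high_risk = ("pivot_cache_parts", "external_link_parts", "data_model_parts")
    # Assumption: hidden sheets and named ranges are ambiguous on their own.
    unclear = ("hidden_sheets", "defined_names")

    if any(findings.get(key) for key in high_risk):
        return "block"
    if any(findings.get(key) for key in unclear):
        return "alert_authority"
    return "publish"
```

Keeping the policy in one small function like this would make it easy to adjust the rules as new breach patterns come to light, without touching the scanning code.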
As well as being scanned by the Excel Analyser, potentially problematic files are run through additional scripts that use Microsoft’s Presidio Analyzer tool to detect personally identifiable information within the hidden data itself. This enables the team to assess and address potential data breaches without needing to download or directly access the files themselves.
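As an illustration of how personal data can be assessed without anyone reading it, the sketch below counts the types of entity that Presidio detects and returns only those counts, never the underlying text. It is an assumption about the general approach, not the team's actual scripts. `AnalyzerEngine` and its `analyze(text=..., language=...)` method are part of Presidio Analyzer; the helper names `make_presidio_analyzer` and `summarise_pii` and the `min_score` threshold are invented for this example. Presidio needs the `presidio-analyzer` package and a spaCy language model to be installed.

```python
from collections import Counter


def make_presidio_analyzer():
    """Build Presidio's analyzer (needs presidio-analyzer and a spaCy model)."""
    from presidio_analyzer import AnalyzerEngine

    return AnalyzerEngine()


def summarise_pii(texts, analyzer, min_score=0.5):
    """Count detected entity types across texts without returning the text.

    `analyzer` is anything with Presidio's analyze(text=..., language=...)
    interface; `min_score` is an illustrative confidence cut-off.
    """
    counts = Counter()
    for text in texts:
        for result in analyzer.analyze(text=text, language="en"):
            # Each result carries an entity type and a confidence score.
            if result.score >= min_score:
                counts[result.entity_type] += 1
    return dict(counts)
```

In use, the hidden cell values extracted from a blocked file would be passed in as `texts`, for example `summarise_pii(hidden_values, make_presidio_analyzer())`, and the reviewer would see only a tally of entity types.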
By communicating with authorities in this way, the team ultimately hopes to reduce the number of data breaches involving Excel. In almost all cases, authorities could have detected the relevant data and removed it before release, using Excel’s built-in Document Inspector tool.
—
Image: Simon Lee