mySociety and SpendNetwork have been working on a project for the UK Government Digital Service (GDS) Global Digital Marketplace Programme and the Prosperity Fund Global Anti-Corruption programme, led by the Foreign & Commonwealth Office (FCO), around beneficial ownership in public procurement. This is one of a series of posts about that work.
As part of this project we reviewed the open source tools that are available for working with beneficial ownership data. There is a tooling ecosystem around the Beneficial Ownership Data Standard (BODS), but it is not yet as well developed as the one around the Open Contracting Data Standard (OCDS), the equivalent standard for contracting information.
There are some open source tools and analyses developed by civil society that aim to support users in understanding the relationships between companies and individuals, and related tools in the commercial sector for supporting anti-money laundering processes.
Across all tools, Python is a reasonably well-established language choice (with some civil society tools developed in Ruby), and network or graph visualisation components such as Neo4j are common. We will discuss this in the section below on analysis tools.
OpenOwnership is an organisation with the goal of making beneficial ownership data more widely available through technical development, partnerships and research. They are the key developers of the BODS data standard and host a global open registry of beneficial ownership data.
The goal of the OpenOwnership Register is to create an “open global beneficial ownership register” that is useful across different jurisdictions and industries. This is an open source digital service which can:
- Incorporate data from existing open registers published by countries
- Allow cross-jurisdiction searches through a single interface/dataset
- Become more useful as more open registers are published
This works in tandem with the promotion of the BODS format. Releases made in BODS are easier to incorporate into the register, and being able to make use of and contribute to a central register is an incentive to publish in a compatible format.
The register currently contains data from every open, countrywide beneficial ownership register (UK’s Persons of Significant Control Register, Slovakia’s Public Sector Partners Register, Ukraine’s Consolidated State Registry, and the Danish Central Business Register) and the data from the EITI’s 2013-15 pilots.
Although the register applies additional deduplication to the source data (merging people with identical names, addresses and dates of birth, and companies with matching identifiers), the limitations of the source data still apply, and the size of the register means that many similar entities remain unreconciled.
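The exact-match merging described above can be sketched in a few lines of Python. This is an illustration only, not the register's actual code: the record layout and the `person_key` and `merge_identical_people` helpers are hypothetical, and the whitespace and case normalisation is an added assumption.

```python
from collections import defaultdict


def person_key(person):
    """Build an exact-match key from name, address and date of birth.

    Trimming whitespace and lowercasing is an assumption made for this sketch.
    """
    return (
        person["name"].strip().lower(),
        person["address"].strip().lower(),
        person["birth_date"],
    )


def merge_identical_people(people):
    """Group the ids of person records that share an identical key."""
    groups = defaultdict(list)
    for person in people:
        groups[person_key(person)].append(person["id"])
    return list(groups.values())


# Hypothetical records: p1 and p2 differ only in case and whitespace,
# p3 has a middle initial and so does not match exactly.
people = [
    {"id": "p1", "name": "Jane Smith", "address": "1 High St, Leeds", "birth_date": "1970-03"},
    {"id": "p2", "name": "jane smith ", "address": "1 High St, Leeds", "birth_date": "1970-03"},
    {"id": "p3", "name": "Jane A. Smith", "address": "1 High St, Leeds", "birth_date": "1970-03"},
]
merged = merge_identical_people(people)
```

Here `p1` and `p2` are merged, while `p3` stays separate even though it probably refers to the same person, which is the kind of unreconciled near-duplicate the paragraph above describes.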
BODS collection and processing tools
OpenOwnership have produced guidance on collecting BODS-compliant data using paper forms. They have also commissioned Open Data Services Co-operative (ODSC) to convert the Excel data collection spreadsheets used by the Extractive Industries Transparency Initiative (EITI) so that the data they collect will be compatible with BODS 0.2.
The BODS data review tool is available as an online service – as with the OCDS data review tool, it is based on the CoVE platform (Convert, Validate and Explore). Both tools check that your data complies with the relevant schema, allow you to inspect key contents of your data to check data quality, and give you access to the data in different formats (spreadsheet and JSON) to support further review. The tool is built by Open Data Services, and hosted by OpenOwnership.
CoVE itself uses a generic flatten tool to transform standards-compliant data in JSON into spreadsheets and vice versa. This is a key piece of utility software, as it means that people working with ownership disclosure data can work in a familiar spreadsheet program. Once flattened, sheets of a spreadsheet are used to represent each of the main elements of the standard (people, entities, and control statements), as well as associated data like addresses, annotations and identifiers. This data can then be transformed into the JSON data interchange format, which has a large tooling ecosystem around it.
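To make the idea of flattening concrete, here is a minimal sketch that splits a list of statements into one "sheet" per statement type and writes a sheet out as CSV. It does not use or reproduce the real flatten tool's interface: the `flatten` and `sheet_to_csv` helpers are hypothetical, and the sample statements are heavily simplified (real BODS statements are nested and carry many more fields).

```python
import csv
import io
from collections import defaultdict

# Simplified, illustrative statements; not a complete or validated BODS file.
statements = [
    {"statementID": "s1", "statementType": "personStatement", "name": "Jane Smith"},
    {"statementID": "s2", "statementType": "entityStatement", "name": "Example Ltd"},
    {"statementID": "s3", "statementType": "ownershipOrControlStatement",
     "subject": "s2", "interestedParty": "s1"},
]


def flatten(statements):
    """Split statements into one list of rows ("sheet") per statement type."""
    sheets = defaultdict(list)
    for statement in statements:
        row = {k: v for k, v in statement.items() if k != "statementType"}
        sheets[statement["statementType"]].append(row)
    return dict(sheets)


def sheet_to_csv(rows):
    """Serialise one sheet as CSV text, with columns in alphabetical order."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sheets = flatten(statements)
person_csv = sheet_to_csv(sheets["personStatement"])
```

Each sheet can then be opened in a spreadsheet program, with the `statementID` column acting as the link between people, entities and the control statements that connect them.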
The BODS mapping template enables field-level mapping between source data systems and version 0.1 of the Beneficial Ownership Data Standard. It supports the processes of:
- identifying source systems that hold beneficial ownership information
- itemising the fields that those systems define
- itemising the codes and codelists associated with those fields
- mapping the source system fields, codes and codelists to the beneficial ownership data standard
This kind of mapping support – from simple, widely used formats and interfaces into machine-readable forms, and from existing systems into data standards for interchange or publication – is a key enabler of the adoption of data standards and of a rich tool ecosystem.
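Once a field-level mapping like this has been agreed, applying it can be a small amount of code. The sketch below is illustrative only: the source field names, the target names in `FIELD_MAP`, the codes in `CODE_MAP` and the `map_record` helper are all hypothetical, and the target names have not been checked against the BODS schema.

```python
# Hypothetical source-field -> target-field mapping.
FIELD_MAP = {
    "full_name": "name",
    "dob": "birthDate",
    "country": "nationality",
}

# Hypothetical per-field codelist mapping from source values to target codes.
CODE_MAP = {
    "country": {"United Kingdom": "GB", "Slovakia": "SK"},
}


def map_record(source_record):
    """Apply the field and code mappings to a single source record.

    Returns the mapped record plus a list of source fields with no mapping,
    so that gaps in the mapping are visible rather than silently dropped.
    """
    mapped, unmapped = {}, []
    for field, value in source_record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            unmapped.append(field)
            continue
        codes = CODE_MAP.get(field, {})
        mapped[target] = codes.get(value, value)
    return mapped, unmapped


mapped, unmapped = map_record(
    {"full_name": "Jane Smith", "dob": "1970-03-01",
     "country": "United Kingdom", "shoe_size": 6}
)
```

Reporting unmapped fields separately mirrors the itemising steps in the template: every source field is either mapped or explicitly accounted for.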
Beneficial ownership analysis tools
In addition to the tools developed specifically around BODS, there is a set of open source tools developed by civil society that analyse information on the ownership of companies, sometimes in conjunction with information about public contracting. Malaysian civic tech organisation Sinar Project have developed the Telus prototype, combining information from Malaysia about procurement, beneficial ownership, and politically exposed people. They are also working on Politikus in Kenya, which will combine those types of data with information about infrastructure projects.
Two different civil society tools originate in Mexico: Sinapsis, produced by journalism organisation Animal Político, and TowerBuilder, created by transparency and accountability NGO PODER. The goal of Sinapsis is the examination of ‘coincidences’ in a set of companies or organisations, where addresses, people, ID numbers, notaries or phone numbers may connect seemingly disconnected companies. TowerBuilder is a reusable toolkit for generating websites with data visualisations that mix open contracting and beneficial ownership data.
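The kind of ‘coincidence’ detection described for Sinapsis can be illustrated with a short sketch that links companies sharing an attribute value. This is not Sinapsis's code or data model: the `find_coincidences` helper, the attribute names and the sample companies are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_coincidences(companies, attributes=("address", "phone", "notary")):
    """Return pairs of company names mapped to the attributes they share."""
    # Index company names by each (attribute, value) pair they carry.
    index = defaultdict(set)
    for company in companies:
        for attribute in attributes:
            value = company.get(attribute)
            if value:
                index[(attribute, value)].add(company["name"])

    # Every pair of companies under the same index entry is a coincidence.
    links = defaultdict(list)
    for (attribute, _value), names in index.items():
        for first, second in combinations(sorted(names), 2):
            links[(first, second)].append(attribute)
    return dict(links)


# Made-up sample data.
companies = [
    {"name": "Alpha SA", "address": "Calle 1", "phone": "555-0100"},
    {"name": "Beta SA", "address": "Calle 1", "phone": "555-0199"},
    {"name": "Gamma SA", "address": "Calle 9", "phone": "555-0100"},
]
links = find_coincidences(companies)
```

In this sample, Alpha SA is linked to Beta SA by a shared address and to Gamma SA by a shared phone number, so all three fall into one connected group even though Beta SA and Gamma SA share nothing directly.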
These tools generalise approaches originally used in one-off investigations, turning them into reusable services that can be fed new datasets. Sinapsis originated in Animal Político’s ‘estafa maestra’ investigation, and TowerBuilder in PODER’s Torre de Control project.
In the UK, the two analyses of the Persons of Significant Control register performed by Global Witness (The Companies We Keep in 2018, and Getting the UK’s House in Order in 2019) have been made available as Jupyter Notebooks, an open source web application that allows you to create and share documents that contain live code, equations, visualisations and narrative text. This represents a space between truly one-off analyses and frameworks or services designed for reuse. The analyses are fully documented via the notebooks and are sharable and repeatable with the same data, but are not generalised to other data sources.
The OpenTender portal run in Indonesia by Indonesia Corruption Watch and the international Aleph dashboard produced by the Organised Crime and Corruption Reporting Project (OCCRP) also touch on beneficial ownership information.
Whilst beneficial ownership data is not explicitly used in OpenTender.net, some of its red flag risk analyses try to reveal the same connections that such data can reveal. For example, several companies being registered at the same address suggests that they may share beneficial owners, and that a cartel may be in operation.
Aleph is a document storage and search platform designed to facilitate cross-border investigation of white-collar crime. It includes some beneficial ownership datasets, and parts of its toolchain address problems, such as name matching, that also arise in tools more focused on beneficial ownership, so it may be a source of useful open source components.
A significant amount of the effort in producing these tools and analyses has been in pre-processing data to turn it into standard forms that can be easily combined and analysed. Reliably matching companies and individuals across different data sources is a recurring and significant technical problem.
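A small example shows why matching is hard even before any fuzzy comparison is attempted. The sketch below normalises company names so that trivially different spellings compare equal. It is an assumption-laden illustration, not the approach of any tool mentioned here: the `normalise_company_name` helper and the `LEGAL_SUFFIXES` list are hypothetical and far from complete.

```python
import re
import unicodedata

# A deliberately short, illustrative list of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "inc"}


def normalise_company_name(name):
    """Reduce a company name to a comparable form.

    Strips accents, lowercases, drops punctuation and removes common
    legal-form suffixes.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    tokens = re.findall(r"[a-z0-9]+", without_accents.lower())
    return " ".join(token for token in tokens if token not in LEGAL_SUFFIXES)


same = (
    normalise_company_name("Café Holdings Ltd.")
    == normalise_company_name("CAFE HOLDINGS LIMITED")
)
different = (
    normalise_company_name("Café Holdings Ltd.")
    == normalise_company_name("Cafe Holding Ltd")
)
```

The first pair matches after normalisation, but the second does not, because "Holding" and "Holdings" are different tokens. Closing that gap requires fuzzy matching or shared identifiers, which is where much of the pre-processing effort goes.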
The use of BODS is not yet widespread: as civic tech early adopters, the Sinar Project uses it across their tools, but it is not used in Sinapsis, Aleph or TowerBuilder, although the latter does use OCDS. Where BODS is not in use, CSV files with various different schemas store beneficial ownership information.