Screening for conflicts of interest in ownership data

Header image: Photo by Rob Curran on Unsplash

mySociety and SpendNetwork have been working on a project for the UK Government Digital Service (GDS) Global Digital Marketplace Programme and the Prosperity Fund Global Anti-Corruption programme, led by the Foreign & Commonwealth Office (FCO), around beneficial ownership in public procurement. This is one of a series of posts about that work.

A key corruption risk in public procurement is that officials or politicians successfully direct contracts to companies that they control or benefit from.

Understanding who the beneficial owners of these companies are is one half of preventing this; the other is knowing more about the people who shouldn’t benefit, such as politically exposed persons (PEPs) or those involved in the procurement process.

The United Nations Convention against Corruption (UNCAC) defines politically exposed people as “individuals who are, or have been, entrusted with prominent public functions and their family members and close associates”. This is a flexible definition, varying by country as to which roles are included and how widely family members and associates are counted. That said, the term is typically understood as limited to senior roles, while procurement processes might actually suffer from conflicts of interest involving less senior procurement officials (POs) who are more directly involved.

Solving this problem is hard. Data sources exist to help with the issue but are not complete in themselves. A good general principle is to design a process that makes it more likely that conflicts of interest will be detected, using tools and datasets to increase scrutiny without relying on them as an all-encompassing solution.

The problem

In an ideal world, analysts would simply match a list of beneficial owners against a list of politically exposed persons. Any overlap would show that a PEP is benefiting from a government contract. Unfortunately, each step in this process is far from simple.

Previous blog posts have detailed the problems in creating a list of beneficial owners, and politically exposed persons represent a similar challenge. The UNCAC definition works well for investigators, but to create disclosure requirements it needs to be translated into a concrete local understanding of which roles are covered.

Even where there is a clear definition, or a list of the roles that it covers (as may already exist for tracking asset disclosures), a system is still needed for tracking and updating changes in who holds those roles. Up-to-date lists should have a mechanism for adding new PEPs with reasonable speed after they take office, but also need to act as an archive of information about former office holders for retrospective investigations.

The more comprehensive the dataset (for instance, covering multiple countries, or significant sub-national figures), the higher the costs of maintenance and the greater the risk the list will fall out of date. Procurement officials (POs) are unlikely to be tracked by existing approaches to identifying PEPs in a country and will need new approaches. In South Africa, civil servants are prohibited by law from being a beneficiary of the procurement process, creating a very large list of people to exclude.

The other side of the problem is that even where an up-to-date and comprehensive list of excluded persons exists, you have to be able to match it against your list of owners. This runs into the problem of data matching. Name matching is error-prone, and while information on office holders is often public (and so a list can be maintained without special privileges), these public lists are less likely to include the unique IDs essential to easy matching of individuals.

As the Financial Action Task Force (FATF) put it:

Inconsistent transliterations and spellings of names affect the ability of financial institutions and DNFBPs to match names in general. Scrubbing customer databases for matches against commercial databases may result in many false positives if such databases contain insufficient or inadequate identifier information. This increases the risk of missing true matches and requires additional resources to separate false positives from true matches.
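To make the matching problem concrete, here is a minimal sketch of name screening using only the Python standard library. It is an illustration, not how any of the tools mentioned in this post work: the names are invented, `normalise`, `similarity` and `candidate_matches` are hypothetical helpers, `difflib.SequenceMatcher` stands in for a proper name-matching algorithm, and the 0.85 threshold is an arbitrary assumption that would need tuning against real data.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Score between 0 and 1 for how alike two names are once normalised."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_matches(owners, peps, threshold=0.85):
    """Return (owner, pep, score) triples at or above the threshold."""
    matches = []
    for owner in owners:
        for pep in peps:
            score = similarity(owner, pep)
            if score >= threshold:
                matches.append((owner, pep, round(score, 2)))
    return matches


# Invented names: the same person recorded in two different ways.
owners = ["Müller, Jose", "A. N. Other"]
peps = ["José Muller", "Another Person"]

# A naive exact comparison of the two lists finds nothing.
exact_overlap = set(owners) & set(peps)

# Normalising first surfaces the likely match for a person to review.
candidates = candidate_matches(owners, peps)
```

Every candidate still needs checking by a person: a high score means the names are similar, not that the two records describe the same individual.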

However, while the problem is hard, partial and incomplete solutions have value.

PEP databases and matching tools

While FATF says that the use of databases is not sufficient to comply with its requirements, databases are still a useful tool that can speed up work. Commercial databases exist, often aimed at assisting regulatory compliance in banks, such as SmartSearch, Accuity and the BAE Systems Watch List Management system. There is also a variety of open data sources available, with OCCRP gathering a set of datasets on individual sanctions together in Aleph. Some of these have approaches to name matching built in. For instance, ComplyAdvantage has a PEP database with a fuzzy matching search that can be accessed through an API.

There is a wide selection of open source tools available to help with name reconciliation, such as Elasticsearch, OpenRefine, and (a service built around a free Python library). When people have entries in multiple national databases, different transliterations of their names can be recorded. OCCRP has developed a list of ‘synonames’ (soundalike names) that helps address this, but reconciling individuals based on name remains a difficult problem.

These databases will not cover procurement officers, so additional data creation and maintenance work is required in each country. However, as these people are state employees, there is the prospect of tying into existing HR or payroll systems to automate generating the list, and of having access to more sensitive personal identifiers such as identity or tax numbers.

Where the intention is to release the list publicly (such as Mexico’s planned SESNA datasets of public servants involved in procurement, and those who are sanctioned), the identity fragment approach could be used to aid reconciliation with other datasets without releasing this personal information.

Where unique IDs can be established for both sides of the process, lookups become far more efficient. Where they can’t, the process should be designed to be more likely to produce false positives, which can then be investigated further, than false negatives. This also raises the importance of how the overall system is designed. While automated screenings can be built into tools for procurers deciding between contracts, enhanced scrutiny of contract winners is less time consuming than screening all those who sign up to a supplier portal.
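As a rough sketch of that design, the example below tries an exact ID lookup first and only falls back to name comparison when an ID is missing on either side. The record layout, the `build_id_index` and `screen` helpers, the invented names and the deliberately permissive 0.75 threshold are all assumptions made for illustration; `difflib.SequenceMatcher` again stands in for a real name-matching method.

```python
from difflib import SequenceMatcher


def build_id_index(peps):
    """Index the PEP records that carry a unique ID, for direct lookup."""
    return {pep["id"]: pep for pep in peps if pep.get("id")}


def screen(owner, peps, id_index, threshold=0.75):
    """Return (pep_name, reason) pairs for a human reviewer to check."""
    owner_id = owner.get("id")
    if owner_id and owner_id in id_index:
        return [(id_index[owner_id]["name"], "id match")]

    flagged = []
    for pep in peps:
        if owner_id and pep.get("id"):
            # Both sides have IDs and they differ, so skip the name check.
            continue
        score = SequenceMatcher(
            None, owner["name"].lower(), pep["name"].lower()
        ).ratio()
        # A low threshold errs towards false positives over false negatives.
        if score >= threshold:
            flagged.append((pep["name"], "possible name match"))
    return flagged


# Invented records: one PEP has a tax ID, the other is known by name only.
peps = [
    {"id": "TAX-001", "name": "Amara Okafor"},
    {"name": "Mohammed Al-Rashid"},
]
id_index = build_id_index(peps)

by_id = screen({"id": "TAX-001", "name": "A. Okafor"}, peps, id_index)
by_name = screen({"name": "Mohamed Al Rashid"}, peps, id_index)
no_hit = screen({"name": "Chen Wei"}, peps, id_index)
```

Running this only against contract winners, rather than everyone registered on a supplier portal, keeps the number of possible matches that need manual review manageable.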

Representing the data

While the data standard that has most use in beneficial ownership is the Beneficial Ownership Data Standard (BODS), this is not the most appropriate format for PEP data.

Currently, where this data exists, it is in a variety of CSV- or JSON-based formats. The ideal scenario is that PEP information is published in a common standard, so that multiple data sources can be easily combined in an analysis tool.

A good candidate for this is the Popolo data standard. This is a standard designed to hold information about elected politicians and legislatures, which makes it useful for holding lists of PEPs. It can store information on when particular people hold particular offices, allowing it to act as a repository of older information for comparisons several years after the fact, as well as having the ability to store multiple names and identifiers that might aid reconciliation.
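The sketch below shows roughly what this could look like. The person, post and dates are invented, and the layout (top-level `persons`, `posts` and `memberships` collections, with `other_names`, `identifiers`, `start_date` and `end_date` properties) follows our reading of Popolo's JSON serialisation, so it should be checked against the specification before being relied on. `posts_held` and `all_names` are hypothetical helpers, and the date handling assumes full ISO dates, although Popolo also allows partial ones.

```python
from datetime import date

# Invented example data in a Popolo-like shape.
popolo = {
    "persons": [
        {
            "id": "person/1",
            "name": "Jane Example",
            "other_names": [{"name": "J. Example"}],
            "identifiers": [{"scheme": "example-register", "identifier": "0001"}],
        }
    ],
    "posts": [
        {"id": "post/finance-minister", "label": "Minister of Finance"}
    ],
    "memberships": [
        {
            "person_id": "person/1",
            "post_id": "post/finance-minister",
            "start_date": "2015-05-01",
            "end_date": "2018-04-30",
        }
    ],
}


def posts_held(data, person_id, on):
    """Return the labels of the posts a person held on a given date."""
    labels = {post["id"]: post["label"] for post in data["posts"]}
    held = []
    for m in data["memberships"]:
        if m["person_id"] != person_id:
            continue
        # Missing dates are treated as open-ended; full ISO dates assumed.
        start = date.fromisoformat(m["start_date"]) if m.get("start_date") else date.min
        end = date.fromisoformat(m["end_date"]) if m.get("end_date") else date.max
        if start <= on <= end:
            held.append(labels[m["post_id"]])
    return held


def all_names(person):
    """Collect the main name and any alternative names for matching."""
    return [person["name"]] + [n["name"] for n in person.get("other_names", [])]


in_office = posts_held(popolo, "person/1", date(2016, 1, 1))
after_office = posts_held(popolo, "person/1", date(2019, 1, 1))
names = all_names(popolo["persons"][0])
```

Because memberships carry dates, the same file can answer whether someone held a relevant post at the time a contract was awarded, even if they have since left office.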

mySociety’s (currently paused) EveryPolitician project uses this standard, which makes it useful as a source of global PEP information (it is used in, for instance, Global Witness’s investigation of the UK Persons of Significant Control dataset). The standard was also used by the Sinar Project’s Telus tool in Malaysia as a repository of PEP information. FATF recommends that countries compile a list of domestic positions/functions that are considered prominent public functions, to aid determinations of whether a particular person holds a PEP-qualifying role. This list could similarly be released in Popolo format, using just the Post structure.

Alternatively, where the process is less of a lookup between two lists and more an investigation of individuals who are beneficial owners, BODS has an optional field saying whether, and if so why, someone qualifies as a politically exposed person. This could be collected as part of a verification process, with the information reviewed for relevance by decision makers.
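A small sketch of how such a declaration might be pulled out for review is below. The field names (`hasPepStatus`, `pepStatusDetails`, `names`, `fullName`) follow our understanding of person statements in BODS version 0.2 and may differ in other versions, so treat them as assumptions to check against the published schema. The statements are invented and `pep_declarations` is a hypothetical helper.

```python
def pep_declarations(statements):
    """Return a review summary for person statements declaring PEP status."""
    flagged = []
    for s in statements:
        if s.get("statementType") != "personStatement":
            continue
        if not s.get("hasPepStatus"):
            continue
        names = s.get("names", [])
        flagged.append(
            {
                "statementID": s.get("statementID"),
                "name": names[0].get("fullName", "") if names else "",
                "reasons": [
                    d.get("reason", "") for d in s.get("pepStatusDetails", [])
                ],
            }
        )
    return flagged


# Invented statements, trimmed to the fields used above.
statements = [
    {
        "statementID": "stmt-001",
        "statementType": "personStatement",
        "names": [{"fullName": "Jane Example"}],
        "hasPepStatus": True,
        "pepStatusDetails": [{"reason": "Former Minister of Finance"}],
    },
    {
        "statementID": "stmt-002",
        "statementType": "personStatement",
        "names": [{"fullName": "John Sample"}],
        "hasPepStatus": False,
    },
]

for_review = pep_declarations(statements)
```

The output is a short list for decision makers to assess, not a verdict: a declared PEP status says nothing by itself about whether a conflict of interest exists.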

See all posts in this series.