1. TICTeC Show & Tells: Call for Proposals now open

    We would dearly love to be issuing a Call for Proposals for our normal two-day Impacts of Civic Technology Conference (TICTeC), in person, in some beautiful city somewhere, in anticipation of having some great discourse and drinks. Alas, the pandemic means we are not yet in a place to do that.

    We also recognise that attempting to hold a two-day event online is just too much – we are all suffering a bit of screen fatigue at this point, and we understand how difficult it is to concentrate and engage with lengthy online sessions.

    Therefore, we will instead be hosting a series of online TICTeC ‘Show and Tells’ from March until May 2021, which will be short, energetic and to the point.

    These will be hour-long virtual events bringing together the global community who use, build, research or fund digital technology that empowers citizens. Speakers will share their in-depth research and lessons learnt about the impacts of these digital technologies, and whether their intended outcomes were actually realised. As it always has been, TICTeC remains a safe place to honestly examine what works, what doesn’t and what can be improved, so that, ultimately, better digital tools are developed.

    TICTeC’s ethos is that every organisation developing and running technology that serves citizens should do so with evidence-based research at the forefront of their decisions, and should examine their impacts. This is to ensure validity and legitimacy, but also to curb and mitigate possible detrimental and unintended consequences.

    Apply to present

    Each TICTeC Show and Tell will feature five 7-minute presentations, followed by a Q&A session once all speakers have presented. If you have relevant research, experiences or lessons learnt to share, please submit a talk by 14 February 2021.

    We’re especially looking for proposals relevant to the topic areas below. However, if your proposal doesn’t quite fit these themes but is still relevant to civic technology, we’d love to hear from you. This time we’re also particularly keen to hear from users of civic technology about their experiences, as well as from researchers, funders and practitioners.

    For more advice on submitting a proposal, please see our guide.

    Sponsorship opportunities

    TICTeC brings together a truly global group of people, all passionate about examining digital technology’s impact on society: our events usually draw participants from at least 30 countries. But, as a charity, we need support to make TICTeC convenings happen, and we’re currently looking for sponsors to help us continue. If you’re interested in helping us continue TICTeC’s valuable work, please see our guide to sponsorship and sponsorship packages, or feel free to contact Gemma Moulder to discuss more bespoke alternatives.

    We look forward to reading your proposals, and to seeing you at our Show and Tells!

  2. Do photos help resolution of FixMyStreet reports?

    Summary

    FixMyStreet allows people to upload images along with a report. This can quickly provide the authority with more details of the issue than might be passed along in the written description, and lead to quicker evaluation and prioritisation of the repair. For problems that are hard to locate geographically by description (or where the pin has been dropped inaccurately), images might also help council staff locate and deal with the problem correctly.

    In 2019, 35% of reports included photos. Accounting for several other possible factors, reports with photos were around 15% more likely to be recorded as fixed than reports without a photo. In absolute terms, reports with photos were fixed at a rate two percentage points higher. The effect varies by category: photos had a much stronger effect in some categories (highways enquiries, and reports made in parks and open spaces), and a small negative effect on resolution in others (pavement issues and rights of way).
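    A small absolute gap can be a sizeable relative one when the baseline rate is low. The sketch below uses hypothetical fix rates of 13% without a photo and 15% with one, chosen only for illustration (they are not FixMyStreet figures, and the published 15% comes from an analysis that accounts for other factors), to show how the two ways of expressing a difference relate.

```python
def absolute_difference(rate_with: float, rate_without: float) -> float:
    """Difference between two rates, in percentage points."""
    return (rate_with - rate_without) * 100


def relative_difference(rate_with: float, rate_without: float) -> float:
    """Proportional change relative to the baseline (rate_without)."""
    return (rate_with - rate_without) / rate_without


# Hypothetical fix rates for illustration only, not FixMyStreet data.
with_photo, without_photo = 0.15, 0.13

print(f"absolute: {absolute_difference(with_photo, without_photo):.1f} percentage points")
print(f"relative: {relative_difference(with_photo, without_photo):.0%} more likely")
# absolute: 2.0 percentage points
# relative: 15% more likely
```

    With these made-up rates, a two-point gap is roughly a 15% relative difference; with a higher baseline the same two points would be a much smaller relative change.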

    In general, these results suggest that attaching photos is not only useful for authorities, but can make it more likely that reporters have their problem resolved. An important caveat is that photos are much more useful for some kinds of report than others. In terms of impact on the service, users should be encouraged to attach photos where they can convey useful information that helps lead to a resolution. Where photos are less helpful (such as for problems encountered mostly at night), other prompts or asset selection tools may help lead to more repairs.


  3. Publishing less: our current thinking about comparative statistics

    Over the last few years we have stopped publishing several statistics on some of our services, but haven’t really talked publicly about why. This blog post is about the problems we’ve been trying to address and why, for the moment, we think less is better.

    TheyWorkForYou numerology 

    TheyWorkForYou launched with explicit rankings of MPs, but these were quickly replaced with more “fuzzy” rankings, acknowledging the limits of the available data sources in providing a concrete evaluation of an MP. Explicit rankings in the ‘numerology’ section of an MP’s profile were removed in 2006. In July 2020, we removed the section altogether.

    This section covered the number of speeches in Parliament this year, answers to written questions, attendance at votes, alongside more abstract metrics like the reading age of the MP’s speeches, and ‘three-word alliterative phrases’ which counted the number of times an MP said phrases like ‘she sells seashells’.

    This last metric was intended to make a point about the limits of the data, alongside a disclaimer that countable numbers reflect only part of an MP’s job.

    Our new approach is based on the idea that, while disclaimers may make us feel we have adequately reflected nuance, we don’t think users really read them. Instead, if we do not believe data can support meaningful evaluations, or if it needs heavy qualification, we should not highlight it.

    Covid-19 and limits on remote participation also mean that the significance of some participation metrics is less clear for some periods. We’re open to the idea that some data may return in the future, if a clear need arises that we think we can fill with good information. In the meantime the raw information on voting attendance is still available on Public Whip. 

    WriteToThem responsiveness statistics

    When someone sends a message to a representative through WriteToThem, we send a survey two weeks later to ask if this was their first time writing, and whether they got a response.

    These answers were used annually to generate a table ranking MPs by responsiveness. In 2017 we stopped publishing the WriteToThem stats page. The concerns that led to this were:

    • There are systemic factors that can make MPs more or less likely to respond to correspondence (eg holding ministerial office).
    • As the statistics only cover the last year, this can lead to MPs moving around the rankings significantly, calling into question the value of a placement in any particular year. Does it represent improvement/decline, or is the change random?
    • MPs receive different types of communication and may prioritise some over others (for example, requests for intervention rather than policy lobbying). Different MPs may receive different types of messages, making comparisons difficult.
    • The bottom rankings may reflect factors outside MPs’ control (eg a technical problem with the email address, or health problems), which can invalidate the wider value of the rankings.
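    To see why year-to-year movement is hard to interpret, the hypothetical simulation below gives every MP exactly the same true response rate, so any movement in the rankings is pure chance. The number of MPs, the 60% rate and the 40 messages per MP per year are assumptions for illustration, not WriteToThem figures.

```python
import random


def simulate_year(true_rates, messages_per_mp, rng):
    """Observed share of messages answered by each MP in one simulated year."""
    return {
        mp: sum(rng.random() < p for _ in range(messages_per_mp)) / messages_per_mp
        for mp, p in true_rates.items()
    }


def rank(observed, rng):
    """Rank MPs from most (1) to least responsive, breaking ties at random."""
    mps = list(observed)
    rng.shuffle(mps)  # sorted() is stable, so shuffling first randomises ties
    ordered = sorted(mps, key=lambda mp: observed[mp], reverse=True)
    return {mp: position for position, mp in enumerate(ordered, start=1)}


# Hypothetical: 100 MPs who all answer 60% of messages, 40 messages each per year.
rng = random.Random(2017)
true_rates = {f"MP{i:03d}": 0.6 for i in range(100)}

year1 = rank(simulate_year(true_rates, 40, rng), rng)
year2 = rank(simulate_year(true_rates, 40, rng), rng)

moves = sorted(abs(year1[mp] - year2[mp]) for mp in true_rates)
bottom1 = {mp for mp, r in year1.items() if r > 90}
bottom2 = {mp for mp, r in year2.items() if r > 90}
print("median change in rank:", moves[len(moves) // 2])
print("MPs in the bottom ten both years:", len(bottom1 & bottom2))
```

    Even though no MP's behaviour changes, typical MPs move dozens of places between the two simulated years, and the bottom ten is largely a different set of people.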

    The original plan was to turn this off temporarily while we explored how the approach could be improved, but the more we dug into the complexity, the longer the issue dragged on. At this point it is best to say that the rankings are unlikely to return in a similar form any time soon.

    The reasons for this come from our research on WriteToThem and the different ways we have tried to explore what these responsiveness scores mean.

    Structural factors

    There are structural factors that make direct comparisons between MPs more complicated. For instance, we found that when people write to Members of the Scottish Parliament there are different response rates for list and constituency members. What we don’t know is whether this reflects different behaviour in response to the same messages, or whether list and constituency MSPs were getting different kinds of messages, some of which are easier to respond to. Either way, this suggests that we would need to judge these groups separately or apply a correction for this effect (and we would need different processes for different legislatures).

    There are also collective factors that individual representatives do contribute to. For instance, if MPs from one party are more responsive to communication, controlling for party to make them easier to compare individually with other MPs is unfair, as it minimises the collective effort. Individuals are part of parties, but parliamentary parties are also collections of individuals, so it is difficult to draw a clear line when allocating agency.

    Gender

    Another finding of our paper on the Scottish Parliament was that, in both the Holyrood and Westminster Parliaments, female representatives had a systematically lower responsiveness score than male representatives (roughly 7% lower in both cases, and this remains when looking at parties in isolation). Is this a genuine difference in behaviour, or does it reflect a deeper problem with the data? While responsiveness scores are not quite evaluations, it seems reasonable to be cautious about user-generated data that systematically leads to lower rankings for women, especially when the relevant literature suggests that women MPs spent more time on constituency service when the question was studied in the 1990s.

    One concern was whether abusive messages sent through the platform were producing more emails that were not worth responding to. This was of special concern given the online abuse directed at women MPs through other platforms. While WriteToThem accounts for only 1-2% of emails to MPs, it is a problem if we cannot rule out a gendered difference in abusive messages contributing to a difference in a metric we would then use to make judgements about MPs.

    Our research in this area has found some interaction between the gender of the writer and the recipient of a message. We found a (small) preference for users to write to representatives who shared their gender, but without more knowledge of the content of messages we cannot really tell whether the responsiveness difference results from factors that are fair or unfair to judge individual representatives on. Our policy of maintaining the privacy of communications between people and their MP as far as possible means direct examination is not possible for research projects, and returning to publishing rankings without more work to rule this out would be problematic. We are exploring other approaches to understand more about the content of messages.

    Content and needs

    We could in principle adjust for differences that can be identified, but we also suspect there are other differences that we cannot detect and remove. For instance, constituents in different places have different types of problems, and so have different needs from their MP. If these different kinds of problems have different levels of responsiveness, what we are actually judging an MP on is their constituents, rather than their own behaviour.

    Our analysis of how the Index of Multiple Deprivation (which ranks areas of the country on a variety of measures of deprivation) relates to WriteToThem data found that messages to MPs from more deprived areas are less likely to get a response than those from less deprived areas. The least deprived decile has a response rate about 7% higher than the average and the most deprived decile is 6% lower. However, when looking at rates per decile for each individual MP there is no pattern. This suggests the effect is a feature of different MPs covering different areas (with different distributions of deprivation), rather than of individual MPs responding differently to their own constituents.
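    The pattern described above, where differences between MPs come from the areas they cover rather than how they treat their own constituents, is a composition effect. The sketch below gives two hypothetical MPs identical behaviour within every decile but different mixes of incoming messages; the per-decile rates and message counts are invented for illustration.

```python
# Hypothetical response rates by IMD decile (1 = most deprived, 10 = least
# deprived). Every MP is assumed to behave identically within each decile.
RATE_BY_DECILE = {decile: 0.50 + 0.02 * (decile - 1) for decile in range(1, 11)}


def raw_response_rate(messages_by_decile):
    """Overall share of messages answered, given how many came from each decile."""
    total = sum(messages_by_decile.values())
    answered = sum(n * RATE_BY_DECILE[d] for d, n in messages_by_decile.items())
    return answered / total


# Two hypothetical MPs whose constituencies differ only in deprivation profile.
mp_a = {1: 300, 2: 200, 3: 100, 10: 50}   # post mostly from more deprived areas
mp_b = {1: 20, 8: 100, 9: 200, 10: 300}   # post mostly from less deprived areas

print(f"MP A: {raw_response_rate(mp_a):.3f}")
print(f"MP B: {raw_response_rate(mp_b):.3f}")
```

    MP A comes out at about 0.53 and MP B at about 0.66, a gap produced entirely by who writes to them.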

    At the end of last year, we experimented with an approach that standardised the scores via a hypothetical average constituency. This was used by change.org as one metric among many in a People-Power index. While this approach addresses a few issues with the raw rankings, we’re not happy with it. In particular, one MP was downgraded because more of their responses were to messages from more deprived deciles, and their score was averaged down by lower response rates in the higher (less deprived) deciles.
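    The exact calculation is not described here, but one common way to standardise via a hypothetical average constituency is direct standardisation: apply each MP's per-decile response rates to a fixed reference mix of messages. The sketch below assumes equal reference weights per decile, and the MP's rates and message counts are hypothetical. It shows how this can produce the problem described: an MP who answers most of their post, which comes largely from more deprived deciles, is pulled down by lower rates in deciles that carry little of their actual correspondence.

```python
def raw_score(rates, messages):
    """Response rate weighted by the MP's own mix of messages."""
    total = sum(messages.values())
    return sum(messages[d] * rates[d] for d in messages) / total


def standardised_score(rates, reference_weights):
    """Direct standardisation: weight the MP's per-decile rates by a reference
    ('average constituency') distribution instead of their own message mix."""
    total_weight = sum(reference_weights[d] for d in rates)
    return sum(reference_weights[d] * r for d, r in rates.items()) / total_weight


# Hypothetical reference: an equal share of messages from each decile.
REFERENCE = {d: 0.1 for d in range(1, 11)}

# Hypothetical MP: most post comes from deprived deciles (1 = most deprived),
# where their response rate is highest.
rates = {1: 0.80, 2: 0.80, 3: 0.78, 4: 0.75, 5: 0.70,
         6: 0.60, 7: 0.55, 8: 0.50, 9: 0.45, 10: 0.40}
messages = {1: 400, 2: 300, 3: 150, 4: 80, 5: 40,
            6: 20, 7: 20, 8: 20, 9: 20, 10: 20}

print(f"raw: {raw_score(rates, messages):.3f}")
print(f"standardised: {standardised_score(rates, REFERENCE):.3f}")
```

    Here the MP answers about 76% of the messages they actually receive, but standardisation scores them at about 63%, because the less deprived deciles count for far more in the reference mix than in their real postbag.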

    If we were to continue with that approach, a system that punishes better responsiveness to more deprived areas would be a choice that needed strong justification. The measure also becomes more abstract, and it is harder to explain what the ranking represents. Are we aiming to provide useful comparisons by which to judge an MP, or a guide for WriteToThem users as to whether they should expect a reply? These are two different problems.

    We are continuing to collect the data, because it is an interesting dataset and we’re still thinking about how it can best be used, but we do not expect to publish rankings in their previous form again.

    FixMyStreet and WhatDoTheyKnow

    Other services are concerned with public authorities rather than individual representatives. In these cases, there is a clearer (and sometimes statutory) sense of what good performance looks like.

    Early versions of FixMyStreet displayed a “league table”, showing the number of reports sent to each UK council, along with the number that had been fixed recently. A few years ago we changed this page so that it only lists the top five most responsive councils.

    There were several reasons for this: FixMyStreet covers many different kinds of issues that take different amounts of time to address, and different councils have more of some of these issues than others. Additionally, even once a council resolves an issue, not all users come back to mark their reports as fixed.

    As a result, the information we have on how quickly problems are fixed may vary for reasons outside a council’s control. So while we show the top five “most responsive” councils on our dashboard page, as a small way of recognising the most active councils on the site, we don’t share responsiveness stats for all councils in the UK. More detail on the difference in the reported fix rate between different kinds of reports can be seen on our data explorer minisite.

    WhatDoTheyKnow similarly has some statistics summary pages for the FOI performance of public authorities. We are reviewing how we want to generate and use these stats to better reflect our goals of understanding and improving compliance with FOI legislation in the UK, and as a model for our partner FOI platforms throughout the world.

    In general, we want to be confident that any metric is measuring what we want to measure, and we are providing information to citizens that is meaningful. For the moment that means publishing slightly less. In the long run, we hope this will lead us to new and valuable ways of exploring this data.

  4. Talking TICTeC 2021

    We need your input on the future of TICTeC – read on to find out more about our plans and have your say.

    We’ve been running our Impacts of Civic Technology Conference (TICTeC) since 2015, and in that time it’s become a key annual milestone for the sector to stop, gather and take stock of how civic technology is shaping societies around the world.

    We believe more than ever in TICTeC’s core ethos: that every organisation developing and running technology that serves citizens — including ourselves — should do so with evidence-based research at the forefront of their decisions, and should examine their impacts. This is to ensure validity and legitimacy, but also to curb and mitigate possible detrimental and unintended consequences.

    Such an approach is especially important for organisations involved in democratic and civic technology, as active, informed and engaged citizens are needed now more than ever to tackle vital issues such as climate change, systemic racism, and health crises. If the tools we build to empower citizens to get things done don’t serve them or function as planned, then it’s time to do things differently.

    TICTeC allows attendees to learn from each other to do this, by sharing best practices, research, methodologies and lessons learnt – so that, ultimately, better civic and democratic tools are developed.

    We will meet again

    TICTeC truly is a global gathering, bringing together around 200 attendees from around 30 countries.

    Usually, by this time of year, we are well into the organisation of next year’s TICTeC, which we traditionally hold in March or April, in a different global city each year. And by September, we’ve usually decided where we’ll be holding the event and announced all the details including our open Call for Proposals and registration.

    However, this year, as we all know, has been like no other.

    Due to the coronavirus pandemic and the complications it brings for organising global gatherings, we have chosen not to pursue our usual plans. Therefore, for the first time since 2015, we are not planning to run an in-person TICTeC in March/April next year.

    We are instead considering our options for hosting the in-person TICTeC later in 2021, and in addition to our online TICTeC Seminar series this autumn (please do come!), we’d like to organise some further TICTeC initiatives in spring 2021.

    Help us shape TICTeC

    We’d like to make our next TICTeC initiatives as useful as possible to all those working on, using, funding or researching civic technology. What would you find helpful? What would best meet your needs and goals? More seminars? Perhaps workshops, training or networking events? Virtual or in-person? Or perhaps other initiatives that don’t involve actually convening in either of these ways, like podcasts, forums or information sharing?

    We are really keen to hear your feedback on this, as well as on the development and improvement of TICTeC in general. You can let us know your thoughts by filling out this survey or emailing us directly at tictec@mysociety.org. We’d be grateful for any feedback before 31st October 2020.

    Time to reflect

    We’re obviously disappointed to not be organising TICTeC as usual this year, as it is truly a massive highlight for us, and is one of the few gatherings of the global civic tech community left. However, we’re determined that we will meet again and we’re glad to have some time to reflect on how we do things.

    The last few months have been a good time to reflect, speak to other event hosts, attend as many virtual events as possible, review virtual platforms, update our environmental policies, and think about how we can use TICTeC to raise more underrepresented voices.

    So as well as changing the time of year we host TICTeC in 2021, we’ll also be organising things differently. We have a new Environmental Policy that will govern our decisions about future TICTeCs – e.g. hosting in cities that more attendees can reach by train/sea; carbon offsetting; opting for catering with the lowest carbon footprints; and encouraging attendees to play their own part in keeping their carbon footprints down or offsetting etc. And we’re working on plans to make TICTeC as diverse, inclusive and equitable as possible.

    We will continue to reflect and adjust, and your feedback will really help us with this, so we’re grateful for your thoughts.

    If you’d like to hear about future TICTeC initiatives first, then do consider signing up to our mailing list or joining the TICTeC community on the Google Group.

  5. Beneficial ownership blog series

    Over the last few months, mySociety and SpendNetwork have been working on a project for the UK Government Digital Service (GDS) Global Digital Marketplace Programme and the Prosperity Fund Global Anti-Corruption programme, led by the Foreign & Commonwealth Office (FCO), around beneficial ownership in public procurement.

    We’ve gathered some of the things we learned in a series of blog posts:

    The entire series can be viewed here.

    Header image: Photo by Olga O on Unsplash

  6. Beneficial ownership data and preferential procurement

    Header image: Photo by Ricardo Rocha on Unsplash

    mySociety and SpendNetwork have been working on a project for the UK Government Digital Service (GDS) Global Digital Marketplace Programme and the Prosperity Fund Global Anti-Corruption programme, led by the Foreign & Commonwealth Office (FCO), around beneficial ownership in public procurement. This is one of a series of posts about that work.

    While the main purpose of collecting beneficial ownership information is as part of an anti-corruption agenda, ownership information can also be used in public procurement as part of preferential procurement programmes. These are meant to increase the distribution of government contracts among different groups in a country. 

    South Africa is an example of a country with a system of preferential procurement through the Broad-Based Black Economic Empowerment (B-BBEE) programme. This programme gives preference to companies that (amongst other criteria) have more Black people and/or women in ownership and management.

    This works through a certification process in which auditors convert evidence of ownership and management into a certification for the company, which is then used in the procurement process. While conceptually similar to beneficial ownership in many ways, this methodology differs from the public disclosure of ownership that beneficial ownership regimes tend to require.

    Public disclosure of ownership could be made a component of preferential procurement or similar schemes, but this would also require understanding of ownership at lower thresholds than is currently common. Understanding the demographics of ownership requires a full picture of shareholders, and that may include adding up many with small shares. The Beneficial Ownership Data Standard (BODS) does allow for anonymous persons where a reason is given, so information could be captured and released for demographic analysis while not disclosing the identities of owners below a threshold.
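    As a sketch of that last idea, the code below names owners above a disclosure threshold, records everyone else as anonymous with a reason, and releases ownership share aggregated by demographic group across all shareholders. The record shape, the group labels and the 25% threshold are illustrative assumptions, not the BODS schema or any country's rules.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Shareholding:
    name: str
    share: float   # fraction of the company owned, 0 to 1
    group: str     # hypothetical demographic category


def disclose(holdings, threshold=0.25):
    """Name owners above `threshold`, record the rest as anonymous with a
    reason, and aggregate ownership share by demographic group for everyone."""
    named, anonymous = [], []
    share_by_group = defaultdict(float)
    for h in holdings:
        share_by_group[h.group] += h.share
        if h.share > threshold:
            named.append({"name": h.name, "share": h.share})
        else:
            anonymous.append({"reason": "shareholding below disclosure threshold"})
    return {"named": named, "anonymous": anonymous, "share_by_group": dict(share_by_group)}


holdings = [
    Shareholding("Owner A", 0.40, "group X"),
    Shareholding("Owner B", 0.30, "group Y"),
    Shareholding("Owner C", 0.10, "group X"),
    Shareholding("Owner D", 0.10, "group Y"),
    Shareholding("Owner E", 0.10, "group X"),
]
print(disclose(holdings))
```

    A real scheme would also need to guard against aggregates that re-identify people, for example a group containing a single small shareholder.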

    BODS does not currently cover demographic information for individuals or certification for companies. Doing so could increase its applicability to broader procurement objectives such as B-BBEE. There is discussion on OpenOwnership’s BODS repository of what the inclusion of additional personal data fields would involve. In general BODS approaches field inclusion using the principle of data minimisation, where the data collected should be the smallest amount of personal information required to fulfil a valid purpose. There is an intentional decision to exclude gender information from the global standard/data store, with the argument that personal information included in the overall standard should be demonstrably useful for the purposes of disambiguation. This is seen as the main purpose of ownership information on a global scale, rather than demographic analysis. 

    Rather than inclusion in the global standard, localised extensions are seen as more appropriate for demographic information, as what is of interest will vary from place to place. While a gender field could be relatively universal, understandings of ethnicity are often culturally specific and a universal standard would be inappropriate. For instance, Australia’s Indigenous Procurement Policy (IPP) recommends the use of an Indigenous business register that in turn uses a ‘Proof of Aboriginality’ process that is more involved than self-certification. 

    The data standard would benefit from some abstract thinking about how country-specific demographic needs should best be reflected within BODS-formatted data. The specific questions are:

    • What should the general pattern be for extending BODS data with demographics, bearing in mind that demographics may apply to individuals or organisations?
    • Should self-certified data be logged differently from certified data? How should certification be acknowledged (a ‘certifying agency’ is often available, but sometimes the certificate itself has an ID number)?
    • Should there be a flag on demographic information that is stored in BODS, but shouldn’t be released publicly? Or does this logic belong outside the standard? If so, is there a generalised need for a ‘privacy schema’ and tool that can be applied to BODS to remove/anonymise particular fields?
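    One possible answer to the last question keeps the standard itself unchanged and applies a separate, deny-by-default "privacy schema" when data is published. The sketch below is hypothetical throughout: the field names are illustrative rather than taken from BODS, and the transforms are examples of the kinds of rule such a tool might hold.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical privacy schema: field name -> transform applied before release.
# None means the field is withheld. Fields missing from the schema are also
# withheld, so any new field stays private until someone decides otherwise.
PrivacySchema = Dict[str, Optional[Callable[[Any], Any]]]


def keep(value: Any) -> Any:
    return value


def year_only(iso_date: str) -> str:
    """Reduce a YYYY-MM-DD date to its year."""
    return iso_date[:4]


def apply_privacy_schema(record: Dict[str, Any], schema: PrivacySchema) -> Dict[str, Any]:
    """Return the publishable version of one record."""
    public = {}
    for field, value in record.items():
        transform = schema.get(field)
        if transform is not None:
            public[field] = transform(value)
    return public


schema: PrivacySchema = {
    "name": keep,
    "birthDate": year_only,
    "certifyingAgency": keep,
    "ethnicity": None,  # stored for certification, not released
}

record = {
    "name": "Example Person",
    "birthDate": "1980-05-17",
    "ethnicity": "example value",
    "certifyingAgency": "Example Agency",
    "homeAddress": "1 Example Street",
}
print(apply_privacy_schema(record, schema))
```

    Making unknown fields private by default means a data publisher has to opt in to releasing each field, rather than remembering to opt out.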

    Demographic certification is a system of ownership collection and verification, and a general understanding of the ways in which BODS should and shouldn’t be a part of that would be useful for the future of the standard.

    See all posts in this series.

  7. Unequal impacts of open registers of ownership

    Header image: Photo by Erol Ahmed on Unsplash

    mySociety and SpendNetwork have been working on a project for the UK Government Digital Service (GDS) Global Digital Marketplace Programme and the Prosperity Fund Global Anti-Corruption programme, led by the Foreign & Commonwealth Office (FCO), around beneficial ownership in public procurement. This is one of a series of posts about that work.

    A key privacy concern with beneficial ownership, and especially open registers of beneficial ownership, is that they make private information publicly accessible. As an Engine Room/OpenOwnership report on the subject says:

    Justifying open registers therefore depends on answering two important questions: first, why is a central register necessary, as opposed to company reporting obligations, or trusts and corporate service providers (‘TCSP’) regulation? Second, why must the central register be publicly accessible, rather than closed or limited-access?

    Common across the countries we looked at as part of this research was concern from government stakeholders and the private sector about open registers, even while there is enthusiasm for them from civil society.

    The case for open registers is, broadly, that they allow many eyes to look at the data. This creates greater oversight and scope for investigations by civil society (NGOs, journalists and members of the public), as well as feedback mechanisms to improve the quality of the data. Merging multiple open registers has a multiplier effect, allowing the same beneficiaries to be followed across borders. Making these datasets easier to access also makes it easier for official bodies to pursue investigations by increasing discoverability and removing obstacles to use.

    A key benefit of forming a company is that it provides limited liability, which protects shareholders’ assets from the company’s legal liabilities or debts beyond the value of their shareholding. The argument for releasing owners’ personal information is that this is a privacy trade-off individuals make in exchange for the substantial benefits of limited liability.

    The resulting information is a safeguard against the use of legal entities in a way that is against the public interest because it allows investigation and discovery of abuses.

    Where this becomes more complicated is that the costs of that loss of privacy are not the same for everyone. Where privacy loss leads to greater risk, this may either result in harm to individuals or the fear of that harm may mean people avoid forming companies or tendering for government contracts.  As such, the collection and distribution of data needs to acknowledge different costs of disclosing information, and allow exceptions. From the Engine Room/OpenOwnership report:

    Governments and companies should not collect and disclose data beyond the minimum that is necessary to achieve their aim, or data that poses a significant risk of harm. The risk associated with different types of information will depend on the context of both the individual and the country where they reside. This highlights the need for carefully designed exceptions regimes tailored to risks in that context.

    A key potential risk of address information being public is stalking, and this is a risk that falls more on women than men. The UK has an open register of directors and persons of significant control (PSC), and the discussion around it reflects possible risks of open registers more broadly. The comments under a Companies House blog post about GDPR features people saying they were surprised that personal information such as signatures, month and year of birth and addresses are publicly available. One commenter explicitly said the experience of being stalked made her terrified about her address information being made available. While disclosure requirements often distinguish between company registration and home addresses, micro-businesses may be more likely to be registered from home, and so have an increased privacy cost to the owner.

    In the UK, there has been an exception regime that allows information to be concealed from the public register, if personal characteristics of a person when associated with a company put a person “or any person living with them, at serious risk of violence or intimidation”.  This was amended in 2018 to remove the need for evidence for certain kinds of changes and to allow people to remove home addresses (for a cost) from register documents without the need for exceptions or evidence. Current directors have to substitute another correspondence address; former directors can have the information reduced to the first half of the postcode. This was explicitly fast-tracked without consultation as a “number of cases have been raised […] where the people involved are at risk of violence or intimidation yet cannot have their address information protected.”
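    Reducing an address to "the first half of the postcode" is a simple transformation. The sketch below assumes this means the UK outward code (the part before the space); it is our illustration, not Companies House's implementation. It relies on the inward code always being three characters, a digit followed by two letters.

```python
def outward_code(postcode: str) -> str:
    """Return the outward code (first half) of a UK postcode.

    The inward code (second half) is always three characters, so the outward
    code is everything before the last three characters once spaces are
    removed. This only does a light sanity check, not full validation.
    """
    compact = "".join(postcode.split()).upper()
    if not 5 <= len(compact) <= 7 or not compact[-3].isdigit():
        raise ValueError(f"not a recognisable UK postcode: {postcode!r}")
    return compact[:-3]


print(outward_code("SW1A 2AA"))  # SW1A
print(outward_code("m1 1ae"))    # M1
```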

    A related problem involves changes of name. Requiring directors to list former names is a common-sense rule that prevents people with bad reputations from avoiding scrutiny. But for transgender directors this creates a public record of their transition that may either expose them to harm or discourage them from forming a company in the first place. This issue is one of the reasons for the exclusion of gender from the BODS standard, as a structure where old information is superseded but not removed raises exactly this issue. We also heard of a similar problem when gender is encoded into ID numbers, and these ID numbers are used in public.

    While there are situations where the risk is foreseeable and evidenced (a domestic violence victim starting a company at a new home, but needing to conceal their address), in other cases the damage may already be done when the risk becomes apparent. Even if information is successfully removed from the original source, where data has been released and incorporated into other products, retrospective redaction is more difficult.

    This problem is analogous to one faced by political candidates in the UK, where a report about intimidation and harassment of candidates and politicians led to the removal of the requirement to have home addresses printed on the ballot paper. As a more diverse set of people enter registrable roles, greater acknowledgement of the risks they face can require previous standards to be re-examined. This is especially important if it is happening alongside the opening up of information that was previously legally (but not easily) accessible.

    While the privacy risks of open registers have to be accounted for in their design, closed registers can still pose a privacy or security risk. One interviewee raised the concern that even closed registers can leak, or that access could be obtained through bribery. If a cache of data is too sensitive to release publicly, and there isn’t the capacity to properly secure it, the information may be too sensitive to gather at all. The capacity to secure and manage access to personal information is an essential component of any register.

    These problems demonstrate the importance of finding methods of delivering the public benefits of having collected private identifying information, while minimising the amount of personal information that is released. We have explored possible design patterns to help accomplish this where unique identifiers are available.


    See all posts in this series.

  8. Visualising conflicts of interests

    Header image: Photo by David Cook on flickr under a CC BY-NC 2.0 licence

    mySociety and SpendNetwork have been working on a project for the UK Government Digital Service (GDS) Global Digital Marketplace Programme and the Prosperity Fund Global Anti-Corruption programme, led by the Foreign & Commonwealth Office (FCO), around beneficial ownership in public procurement. This is one of a series of posts about that work.

    As part of our research into beneficial ownership in procurement, we found several potential uses of better ownership data in the procurement process:

    • The identification of bidding cartels through revealing common beneficial ownership of tenderers to procurement processes.
    • The identification of high risk or fraudulent suppliers through non-existent or suspicious beneficial owners, such as professional intermediaries, or the presence of sanctioned individuals and companies in the ownership chains.
    • The identification of conflicts of interest, by using beneficial ownership data in conjunction with information on procurement officers and politically exposed people. There is an appetite for this in both government and civil society.
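    As a minimal sketch of the first of these uses, the Python below flags pairs of bidders on one tender that share a beneficial owner. It assumes each company's owners have already been resolved to stable owner IDs (the hard part in practice); the function name and all identifiers are made up for illustration.

```python
from itertools import combinations


def shared_owner_pairs(bidders, owners_by_company):
    """Return each pair of bidders that shares at least one beneficial owner,
    mapped to the set of owners the two have in common."""
    flagged = {}
    for a, b in combinations(sorted(bidders), 2):
        common = owners_by_company.get(a, set()) & owners_by_company.get(b, set())
        if common:
            flagged[(a, b)] = common
    return flagged


# Made-up bidders on a single tender, keyed by company identifier, with the
# beneficial owner IDs resolved for each from ownership data.
owners = {
    "GB-COH-00000111": {"person-1", "person-2"},
    "GB-COH-00000222": {"person-2"},
    "GB-COH-00000333": {"person-3"},
}
print(shared_owner_pairs(owners.keys(), owners))
# {('GB-COH-00000111', 'GB-COH-00000222'): {'person-2'}}
```

    A shared owner is a signal for further checking, not proof of a cartel.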

    To investigate this area we built a prototype, ‘Bluetail’, exploring options for a visual interface for use by procurement officers. It demonstrates how beneficial ownership data could be used to address some of the key procurement use cases identified in our research.

    Diagram: contract data, ownership data and PEP (politically exposed person) data combined into a single datastore and interface

    Our demo sites and source materials are publicly available:

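    This prototype demonstrates processing data in three relevant standards: BODS (the Beneficial Ownership Data Standard), OCDS (the Open Contracting Data Standard), and Popolo (for data about politicians and legislatures).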

    Bluetail integrates this data by identifier matching. We reviewed options for the alternative approach of attribute-based matching, and identified relevant open source tools with which to achieve this. However, the goal would be to avoid this kind of matching wherever possible as it is a time and resource intensive process, with many possible inaccuracies and difficulties in scaling. That being the case, we also explored different methods for releasing ID information that can improve the effectiveness of this process.

    More information on the process and running locally can be found in the repository readme file.
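    The identifier matching described above can be sketched in a few lines of Python. This is not Bluetail's code: it assumes OCDS parties carry an `identifier` object with `scheme` and `id`, and BODS entity statements an `identifiers` list, which is how we understand those standards to be shaped. The zero-padding rule for `GB-COH` (UK Companies House) numbers is an illustrative normalisation step, and the function names are hypothetical.

```python
def id_key(identifier):
    """Build a comparable (scheme, id) key from an identifier object."""
    scheme = identifier.get("scheme", "").strip().upper()
    value = str(identifier.get("id", "")).replace(" ", "").upper()
    # Assumption: UK company numbers are eight characters, so purely
    # numeric values are zero-padded before comparison.
    if scheme == "GB-COH" and value.isdigit():
        value = value.zfill(8)
    return (scheme, value)


def match_parties_to_entities(ocds_parties, bods_statements):
    """Map OCDS party IDs to the statementID of the BODS entity statement
    that shares one of its identifiers."""
    index = {}
    for statement in bods_statements:
        if statement.get("statementType") != "entityStatement":
            continue
        for identifier in statement.get("identifiers", []):
            index[id_key(identifier)] = statement["statementID"]

    matches = {}
    for party in ocds_parties:
        identifier = party.get("identifier")
        if identifier and id_key(identifier) in index:
            matches[party["id"]] = index[id_key(identifier)]
    return matches


parties = [
    {"id": "supplier-1", "name": "Acme Ltd", "identifier": {"scheme": "GB-COH", "id": "1234567"}},
    {"id": "supplier-2", "name": "No ID Ltd"},
]
statements = [
    {"statementID": "entity-1", "statementType": "entityStatement",
     "name": "ACME LIMITED", "identifiers": [{"scheme": "GB-COH", "id": "01234567"}]},
]
print(match_parties_to_entities(parties, statements))
# {'supplier-1': 'entity-1'}
```

    The two names differ ("Acme Ltd" and "ACME LIMITED") but the identifiers agree, so no name matching is needed; the party without an identifier is left for the slower attribute-based route.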


  9. Getting public benefit from private IDs

    Header image: Photo by Meagan Carsience on Unsplash


    Once ownership data has been collected, a key issue in analysing it is correctly identifying when the same individual is connected with multiple companies. While name matching is viable in small datasets, in larger datasets it increases the work required to remove false positives.

    For instance, while the UK’s Persons with Significant Control (PSC) register has a unique ID for each instance of a person having ownership, identifying where the same individual appears across multiple ownerships requires additional data processing and risks inaccuracy. An approach developed for this dataset might not travel well to others, where address data may be less consistent (or lack an equivalent of, for example, a postcode). This problem extends beyond ownership data: it is a general issue in reconciling different datasets about people.
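    To illustrate why such an approach is dataset-specific, here is a sketch of one way to propose candidate matches in PSC-style data. It assumes each record has a name, a month and year of birth (as the PSC data publishes) and a postcode; the field names and functions are hypothetical, and the output is only a set of candidates that would still need checking.

```python
import unicodedata
from collections import defaultdict


def normalise_name(name):
    """Casefold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.casefold())
    return " ".join(cleaned.split())


def match_key(record):
    """Key used to propose that two records are the same person.
    Returns None when there is no postcode to block on."""
    postcode = (record.get("postcode") or "").replace(" ", "").upper()
    if not postcode:
        return None
    return (normalise_name(record["name"]), record["birth_year"],
            record["birth_month"], postcode)


def candidate_people(records):
    """Group record IDs sharing a match key; only groups of two or more
    are returned, as candidates for further checking."""
    groups = defaultdict(list)
    for record in records:
        key = match_key(record)
        if key is not None:
            groups[key].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    {"id": "psc-a", "name": "Zoë Smith", "birth_year": 1970, "birth_month": 5, "postcode": "AB1 2CD"},
    {"id": "psc-b", "name": "ZOE  SMITH", "birth_year": 1970, "birth_month": 5, "postcode": "ab12cd"},
    {"id": "psc-c", "name": "Zoe Smith", "birth_year": 1970, "birth_month": 5, "postcode": None},
]
print(candidate_people(records))
# {('zoe smith', 1970, 5, 'AB12CD'): ['psc-a', 'psc-b']}
```

    The third record shows the portability problem: without a postcode the key cannot be built, so a dataset lacking an equivalent field would need a different approach.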

    The exact challenges of name reconciliation vary with a country's naming conventions. Just as there can be no universal standard for storing name information, shortcuts to reduce ‘noise’ in a name (removing common typos, or matching sound-alikes) differ by language. For instance, the process to generate a CURP (ID) number in Mexico (which, by default, incorporates an individual’s first name) has explicit exceptions for very common first names, specifying that the individual’s second name be used instead. Conventions can also vary within a single country: Indonesia has a wide range of ethnic and language groups, and so several different sets of common naming conventions.
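    A rule like the CURP exception can be sketched as a small language-specific step. This is not the official CURP algorithm: it implements only the single rule described above, and the exception list is a hypothetical, illustrative one.

```python
import unicodedata

# Illustrative only: the official CURP rules define their own list of
# common given names (and abbreviations) to skip.
COMMON_FIRST_NAMES = {"MARIA", "MA", "JOSE", "J"}


def strip_accents(text):
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def name_for_code(given_names):
    """Choose the given name used to derive a code: the first one, unless it
    is a very common name and a second given name exists."""
    parts = strip_accents(given_names).upper().replace(".", " ").split()
    if not parts:
        return ""
    if len(parts) > 1 and parts[0] in COMMON_FIRST_NAMES:
        return parts[1]
    return parts[0]


for names in ["José Luis", "Ma. Guadalupe", "María", "Ana María"]:
    print(names, "->", name_for_code(names))
# José Luis -> LUIS
# Ma. Guadalupe -> GUADALUPE
# María -> MARIA
# Ana María -> ANA
```

    A reconciliation pipeline built for another country would need a different rule set, which is the point made above.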

    Given this problem, it is useful to be able to make use of other unique identifiers for an individual (such as a national ID or tax number). However, these are often seen as personal data that cannot be released as open data. We have produced a short paper outlining the possible ways these private identifiers can be released.

    Different approaches are practical in different contexts, but at a minimum it should always be viable (and should be encouraged) to collect private identification information and release an ID fragment to aid reconciliation. An ID fragment is a short code derived from an ID that is not, in itself, unique. It can be used to group similar names into unique people more accurately. In this way, private information adds knowledge about uniqueness to the process without itself being revealed publicly.

    Read the paper
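    One possible way to derive such a fragment, offered as an assumption rather than the method in the paper, is to take a keyed hash (HMAC-SHA256) of the normalised private ID and publish only its first two hex characters. The secret key, field names and functions below are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held by the register, so that fragments cannot be
# recomputed by someone who knows or guesses a person's ID number.
SECRET_KEY = b"replace-with-a-register-held-secret"


def id_fragment(private_id, length=2):
    """Derive a short, deliberately non-unique code from a private ID.
    Two hex characters give 256 possible values."""
    normalised = private_id.replace(" ", "").upper()
    digest = hmac.new(SECRET_KEY, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def group_people(records):
    """Split same-name records by fragment: differing fragments mean
    different people; matching fragments mean probably the same person."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["name"], record["fragment"])].append(record["id"])
    return dict(groups)


# The register computes fragments privately and publishes only the fragment.
published = [
    {"id": "r1", "name": "jane doe", "fragment": id_fragment("AB 12 34 56 C")},
    {"id": "r2", "name": "jane doe", "fragment": id_fragment("ab123456c")},
    {"id": "r3", "name": "jane doe", "fragment": id_fragment("ZX987654A")},
]
print(group_people(published))
```

    The fragment length is the key design choice: shorter fragments reveal less but let more distinct people collide, while longer ones separate people better but move closer to being an identifier in their own right.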


  10. Beneficial ownership tools and analysis

    Header image: Photo by Susan Holt Simpson on Unsplash


    As part of this project we reviewed the open source tools available for working with beneficial ownership data. There is a tooling ecosystem around the Beneficial Ownership Data Standard (BODS), but it is not yet as well developed as the one around the Open Contracting Data Standard (OCDS), its counterpart for contracting information.

    There are some open source tools and analyses developed by civil society that aim to support users in understanding the relationships between companies and individuals, and related tools in the commercial sector for supporting anti-money laundering processes.

    Across the tools we reviewed, Python is a fairly well-established language choice (with some civil society tools developed in Ruby), and graph databases such as Neo4j and network visualisation components are common. We will discuss this further in the section on beneficial ownership analysis.
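    Graph tooling recurs because ownership chains form a graph. As a dependency-free stand-in for a graph database query (this is not Neo4j's API), the sketch below walks up a hypothetical `owned_by` mapping from an entity to the owners at the top of each chain, guarding against circular ownership.

```python
def ultimate_owners(company, owned_by):
    """Walk up ownership links from `company` and return the owners at the
    top of each chain, i.e. those with no owners recorded in the data."""
    found, seen, stack = set(), {company}, [company]
    while stack:
        current = stack.pop()
        for owner in owned_by.get(current, []):
            if owner in owned_by:           # intermediate entity: keep walking
                if owner not in seen:       # guard against circular ownership
                    seen.add(owner)
                    stack.append(owner)
            else:
                # Top of a chain: a natural person, or an entity whose own
                # owners are not disclosed in this dataset.
                found.add(owner)
    return found


# Made-up data: entity -> list of its direct owners, with a circular link.
owned_by = {
    "company-A": ["company-B", "person-1"],
    "company-B": ["company-C"],
    "company-C": ["person-2", "company-A"],
}
print(sorted(ultimate_owners("company-A", owned_by)))
# ['person-1', 'person-2']
```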

    OpenOwnership Register

    OpenOwnership is an organisation with the goal of making beneficial ownership data more widely available through technical development, partnerships and research. They are the key developers of the BODS data standard and host a global open registry of beneficial ownership data.

    The goal of the OpenOwnership Register is to create an “open global beneficial ownership register” that is useful across different jurisdictions and industries. This is an open source digital service which can:

    • Incorporate data from existing open registers published by countries
    • Allow cross-jurisdiction searches through a single interface/dataset
    • Become more useful as more open registers are published

    This works in tandem with the promotion of the BODS format. Releases made in BODS are easier to incorporate into the register, and being able to use and contribute to a central register is an incentive to publish in a compatible format.

    The register currently contains data from every open, countrywide beneficial ownership register (the UK’s Persons with Significant Control register, Slovakia’s Public Sector Partners Register, Ukraine’s Consolidated State Registry, and the Danish Central Business Register), as well as data from the EITI’s 2013–15 pilots.

    Although the register applies additional deduplication to the source data (merging people with identical names, addresses and dates of birth, and companies with matching identifiers), the limitations of the source data still apply, and the size of the register means that many similar entities remain unreconciled.
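    A minimal sketch of exact-match deduplication shows why similar entities stay apart. This is our illustration, not the register's code, and the field names are hypothetical.

```python
def merge_exact(people):
    """Merge person records whose name, address and date of birth are
    identical; any variation in spelling leaves records unreconciled."""
    merged = {}
    for person in people:
        key = (person["name"], person["address"], person["dob"])
        merged.setdefault(key, []).append(person["id"])
    return list(merged.values())


people = [
    {"id": "p1", "name": "Jane Doe", "address": "Flat 1, 2 High St", "dob": "1970-05"},
    {"id": "p2", "name": "Jane Doe", "address": "Flat 1, 2 High St", "dob": "1970-05"},
    {"id": "p3", "name": "Jane Doe", "address": "Flat 1 2 High Street", "dob": "1970-05"},
]
print(merge_exact(people))
# [['p1', 'p2'], ['p3']]
```

    The third record is very likely the same person, but a one-character difference in the address keeps it separate; exact matching trades recall for the safety of few false merges.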

    BODS collection and processing tools

    OpenOwnership have produced guidance on collecting BODS-compliant data using paper forms. They have also commissioned Open Data Services Co-operative (ODSC) to convert the Excel data collection spreadsheets used by the Extractive Industries Transparency Initiative (EITI), so that the data collected will be compatible with BODS 0.2.

    The BODS data review tool is available as an online service – as with the OCDS data review tool, it is based on the CoVE platform (Convert, Validate and Explore). Both tools check that your data complies with the relevant schema, allow you to inspect key contents of your data to check data quality, and give you access to the data in different formats (spreadsheet and JSON) to support further review. The tool is built by Open Data Services, and hosted by OpenOwnership.
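    The review tool validates against the full BODS JSON Schema. To give a flavour of one kind of data-quality check involved, here is a sketch of a cross-reference check between statements. It is not how CoVE is implemented; the field names follow BODS 0.2 as we understand it and should be checked against the published schema.

```python
STATEMENT_TYPES = {"entityStatement", "personStatement", "ownershipOrControlStatement"}


def check_statements(statements):
    """Return a list of problems: missing IDs, unknown statement types, and
    ownership statements that point at statements not in the file."""
    problems = []
    known_ids = {s.get("statementID") for s in statements}
    for i, s in enumerate(statements):
        if not s.get("statementID"):
            problems.append(f"statement {i}: missing statementID")
        if s.get("statementType") not in STATEMENT_TYPES:
            problems.append(f"statement {i}: unknown statementType {s.get('statementType')!r}")
        if s.get("statementType") == "ownershipOrControlStatement":
            party = s.get("interestedParty", {})
            refs = [
                s.get("subject", {}).get("describedByEntityStatement"),
                party.get("describedByPersonStatement"),
                party.get("describedByEntityStatement"),
            ]
            for ref in refs:
                if ref is not None and ref not in known_ids:
                    problems.append(f"statement {i}: unknown reference {ref!r}")
    return problems


statements = [
    {"statementID": "e1", "statementType": "entityStatement"},
    {"statementID": "p1", "statementType": "personStatement"},
    {"statementID": "o1", "statementType": "ownershipOrControlStatement",
     "subject": {"describedByEntityStatement": "e1"},
     "interestedParty": {"describedByPersonStatement": "p2"}},
]
print(check_statements(statements))
# ["statement 2: unknown reference 'p2'"]
```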

    CoVE itself uses a generic flatten tool to transform standards-compliant JSON data into spreadsheets and vice versa. This is a key piece of utility software, as it means that people working with ownership disclosure data can work in a familiar spreadsheet program. Once flattened, separate sheets represent each of the main elements of the standard (people, entities, and control statements), as well as associated data such as addresses, annotations and identifiers. This data can then be transformed back into JSON, which has a large tooling ecosystem around it.
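    The core idea of flattening can be sketched with the standard library: nested keys become slash-separated column names, and each statement type gets its own sheet. This toy is not flatten-tool's API or output; in particular, flatten-tool moves arrays such as identifiers into their own sheets, whereas this sketch keeps them inline as numbered columns. The function names are hypothetical.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Flatten nested objects and arrays into {'a/0/b': value} columns."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, f"{prefix}{key}/"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            row.update(flatten(value, f"{prefix}{index}/"))
    else:
        row[prefix[:-1]] = obj  # drop the trailing "/"
    return row


def to_sheets(statements):
    """One 'sheet' (list of flat rows) per statement type."""
    sheets = {}
    for statement in statements:
        sheets.setdefault(statement["statementType"], []).append(flatten(statement))
    return sheets


def sheet_to_csv(rows):
    columns = sorted({column for row in rows for column in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # missing columns are written as empty cells
    return out.getvalue()


statements = [{
    "statementID": "e1", "statementType": "entityStatement", "name": "Acme Ltd",
    "identifiers": [{"scheme": "GB-COH", "id": "01234567"}],
}]
print(sheet_to_csv(to_sheets(statements)["entityStatement"]))
```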

    The BODS mapping template enables field-level mapping between source data systems and version 0.1 of the Beneficial Ownership Data Standard. It supports the processes of:

    • identifying source systems that hold beneficial ownership information
    • itemising the fields that those systems define
    • itemising the codes and codelists associated with those fields
    • mapping the source system fields, codes and codelists to the beneficial ownership data standard
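    The mapping template itself is a spreadsheet, but the same two layers (field mappings and codelist mappings) can be sketched in code. Everything here is hypothetical: the source field names and codes are invented, and the target paths and `personType` values reflect our understanding of the BODS schema, so they should be checked against the version in use (the template targets 0.1).

```python
# Hypothetical source fields and codes mapped to (flattened) BODS paths.
FIELD_MAP = {
    "PERSON_ID": "identifiers/0/id",
    "FULL_NAME": "names/0/fullName",
    "PERSON_KIND": "personType",
}
CODELIST_MAP = {
    "PERSON_KIND": {"N": "knownPerson", "A": "anonymousPerson", "U": "unknownPerson"},
}


def map_row(source_row):
    """Apply the field and codelist mappings to one source record.
    Returns the mapped row and the source fields with no mapping yet."""
    mapped, unmapped = {}, []
    for field, value in source_row.items():
        if field not in FIELD_MAP:
            unmapped.append(field)
            continue
        codes = CODELIST_MAP.get(field)
        if codes is not None:
            if value not in codes:
                raise ValueError(f"{field}: no mapping for code {value!r}")
            value = codes[value]
        mapped[FIELD_MAP[field]] = value
    return mapped, unmapped


row = {"PERSON_ID": "123", "FULL_NAME": "Jane Doe", "PERSON_KIND": "N", "BRANCH": "X"}
print(map_row(row))
# ({'identifiers/0/id': '123', 'names/0/fullName': 'Jane Doe', 'personType': 'knownPerson'}, ['BRANCH'])
```

    Reporting unmapped fields, and failing loudly on unknown codes, mirrors the itemising steps listed above: gaps in the mapping surface immediately rather than silently dropping data.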

    This kind of mapping support (from simple, widely used formats and interfaces into machine-readable forms, and from existing systems into data standards for interchange or publication) is a key enabler of data standard adoption and of a rich tool ecosystem.

    Beneficial ownership analysis tools

    In addition to the tools developed specifically around BODS, there is a set of open source tools developed by civil society that analyse information on the ownership of companies, sometimes in conjunction with information about public contracting. Malaysian civic tech organisation Sinar Project have developed the Telus prototype, combining information from Malaysia about procurement, beneficial ownership, and politically exposed people. They are also working on Politikus in Kenya, which will combine those types of data with information about infrastructure projects.

    Two civil society tools originate in Mexico: Sinapsis, produced by the journalism organisation Animal Político, and TowerBuilder, created by the transparency and accountability NGO PODER. Sinapsis aims to examine ‘coincidences’ across a set of companies or organisations, where shared addresses, people, ID numbers, notaries or phone numbers may connect seemingly unrelated companies. TowerBuilder is a reusable toolkit for generating websites with data visualisations that combine open contracting and beneficial ownership data.
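    The ‘coincidences’ idea can be sketched as finding pairs of companies that share a value in any of several attributes. This is not Sinapsis's implementation; the field names and data are made up, and values are compared as exact strings.

```python
from collections import defaultdict
from itertools import combinations


def coincidences(companies, fields=("address", "phone", "notary")):
    """Return pairs of companies that share a value in any of `fields`,
    mapped to the set of fields they coincide on."""
    links = defaultdict(set)
    for field in fields:
        by_value = defaultdict(set)
        for company in companies:
            value = company.get(field)
            if value:
                by_value[value].add(company["id"])
        for ids in by_value.values():
            for a, b in combinations(sorted(ids), 2):
                links[(a, b)].add(field)
    return dict(links)


companies = [
    {"id": "c1", "address": "Calle 1, CDMX", "phone": "555-0101"},
    {"id": "c2", "address": "Calle 1, CDMX", "phone": "555-0202"},
    {"id": "c3", "phone": "555-0101", "notary": "Notaría 7"},
]
print(coincidences(companies))
# {('c1', 'c2'): {'address'}, ('c1', 'c3'): {'phone'}}
```

    In practice the values would need normalising first, since exact equality misses trivially different spellings of the same address.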

    These tools generalise approaches first used in one-off investigations into reusable services that can be fed new datasets. Sinapsis originated in Animal Político’s ‘Estafa Maestra’ investigation, and TowerBuilder in PODER’s Torre de Control project. In the UK, Global Witness’s two analyses of the Persons with Significant Control register (The Companies We Keep in 2018, and Getting the UK’s House in Order in 2019) have been made available as Jupyter Notebooks. Jupyter is an open source web application for creating and sharing documents that contain live code, equations, visualisations and narrative text. These notebooks sit between truly one-off analyses and frameworks or services designed for reuse: the analyses are fully documented, shareable and repeatable with the same data, but not generalised to other data sources.

    The OpenTender portal, run in Indonesia by Indonesia Corruption Watch, and the international Aleph dashboard, produced by the Organized Crime and Corruption Reporting Project (OCCRP), also touch on beneficial ownership information.

    While beneficial ownership data is not explicitly used in OpenTender.net, some of its red-flag risk analyses try to reveal the same connections that beneficial ownership data can reveal. For example, if several companies are registered at the same address, their beneficial owners may be the same, and a cartel may be in operation.

    Aleph is a document storage and search platform designed to facilitate cross-border investigation of white-collar crime. It includes some beneficial ownership datasets, and parts of the toolchain can also be used to address issues in tools more focused on beneficial ownership, such as name matching, so may be a source of useful open source components.

    A significant amount of the effort in producing these tools and analyses has been in pre-processing data to turn it into standard forms that can be easily combined and analysed. Reliably matching companies and individuals across different data sources is a recurring and significant technical problem.

    BODS is not yet widely used: Sinar Project, an early adopter among civic tech organisations, uses it across their tools, but it is not used in Sinapsis, Aleph or TowerBuilder (although TowerBuilder does use OCDS). Where BODS is not in use, beneficial ownership information is stored in CSV files with varying schemas.
