1. Responding to AI-driven demand on public systems

    LLMs can increase demand on public systems by removing the friction that previously limited access. One potential result of this is new forms of unconsidered rationing that recreate that friction. Instead, we should move away from zero sum systems and aim for technical and policy approaches that turn unscalable private benefits into efficient collective ones. 

    Many kinds of citizen-driven interactions with the public sector are rationed through friction: fewer people engage in them than might do otherwise, because they feel the process is time-consuming or requires expertise. We can see examples of this in planning objections, correspondence with elected representatives, FOI requests and consultation responses. LLM technologies can lower the time or expertise required and also prompt people to engage in the processes in the first place: “Would you like me to draft a complaint about this?”

    Systematic impacts

    This reduced friction may be good for individuals, but the resulting increase in engagement can overwhelm the system itself. In response, it may slow down, collapse, or adopt new means of rationing or prioritising access. We can see indications of this across different kinds of interactions: journalist Martin Rosenbaum has identified an upward trend across public sector complaints organisations, and concerns are being raised across sectors about AI’s contribution to growth in volumes. 

    So how should organisations that handle public submissions respond in an informed way? Here’s an approach to thinking about the problem. We can divide these interactions into three types:

    • Private benefit – when an interaction has a benefit almost exclusively to the requester, either competitively (eg a grant or job application), or non-competitively (eg an application for a state benefit).
    • Collective benefit – when an interaction has a benefit to the requester, and also to wider society (eg a public FOI request, reporting a pothole). 
    • Zero sum interaction – when an interaction success for one person is a failure for another (eg planning). 

    Private benefits

    Some public services fall clearly in the first category: they are unavoidably a collection of private interactions. For these, there might be improved efficiencies to be found in delivery at scale but, particularly for non-competitive benefits, these are also likely to eventually run into decisions either about increasing provision (assuming a higher level of claims from those entitled going forward), or new forms of rationing.

    As stands, AI inputs can both improve the efficiency of systems through sharper, more complete initial submissions, but can also make more verbose and complex submissions that cite non-existent law. To prioritise the former over the latter, systems can explore triage approaches that enforce or encourage the qualities that make input valuable: clarity, accuracy and concision. 

    When running into real limits, it is important to be clear about the criteria you want to ration on, and that they are in line with the overall purpose of the system, rather than implicitly prioritising those with greater resources.  In their FOI complaints system, the ICO is using public benefit as a criteria for prioritisation. The British Academy uses partial randomisation above a scoring cutoff to ration randomly rather than requiring additional work (on both sides) to further differentiate.

    Collective benefits

    A bigger win is, where possible, to transform private benefits into collective benefits.  In these cases, reduced friction is self-regulating because spillover benefits from an individual’s case help reduce demand from others : the private benefit person B is looking for has already been provided by person A’s interaction. 

    One of the key ways mySociety’s services help people is to harness the self-interest of individual users for collective benefits. Every public request made on WhatDoTheyKnow also adds to the pool of public knowledge accessible on the internet, reducing the need for duplicate requests (with a similar logic to reducing duplicate reports on FixMyStreet). This means we can effectively lower the bar to access while improving overall efficiency of the system. 

    We come to this from a technology lens, but the same principles apply from an institutional-design approach. For instance, if MPs’ casework or complaints are  increasing, you want to shift towards more systematic rather than individual benefits from casework. This looks like support for better collective learning, and an improved ombudsman to support collective rather than individual fixes. This kind of approach works best where good statistics are collected at a system level to help identify what collective changes are needed: tracking the overall level of demand, level of demand to different parts of the system and nature of the demand, ie what are people asking for.

    Zero sum systems

    The biggest shift needed is in reforming zero-sum systems, where there is currently an incentive for both sides to escalate the volume. Reduced friction here just raises costs for all concerned rather than giving increased benefits to anyone. Individual use of AI to create submissions is individually enabling in these cases, but not collectively. So, in the words of the 1980s classic film War Games, “the only winning move is not to play”. The real innovation is in solutions that open up new, and more effective, ways of working out what everyone can live with, rather than recreating rationing through new means. For instance, rather than adversarial AI planning objection generators, we could aim for a collaborative planning system that through improved communication and coordination lowers costs and removes incentives to volumes of engagement. 

    Red flags for zero sum interactions are when volume is implicitly being used as a proxy for strength of feeling, or popularity of a particular viewpoint, because its value as a signal is going to become increasingly degraded as AI use increases.   

    Systems work better when the benefits are collective rather than atomised

    Mass adoption of AI removes one set of bottlenecks, but this can create capacity challenges for public systems. Previous waves of civic technology have built on reduced costs of storing and sharing information to build systems that help share the benefits of people’s work and lower the barriers to entry.

    The current wave of AI chatbots cut against this, encouraging atomised approaches, rather than collective ones. We need to explore technical and policy approaches that help systems better achieve their purpose, without giving up on the idea of lowering barriers to entry. We can do this both by exploring how the technological features of AI tools can be bent towards collective gains, and moving away from systems that incentivise these approaches. 

    Image: Engin Akyurt

  2. How can we tell when AI is actually the right tool for the job?

    Generative AI is good at solving some kinds of problems, and bad at solving others. With the rush to apply AI approaches across the public and private sector, we want to encourage people to use the right tool for the right problem. This blog post proposes a test that makes it easy to understand whether or not the applications are genuinely beneficial for the job in hand.

    Generative AI has no concept of truth. It is designed to create outputs that are internally consistent, and this might or might not coincide with true things when the training data and context are well aligned. By now, we’ve all heard examples of false-positive hallucinations, where AI has asserted that something exists or was said because doing so is internally consistent with the question — but which turns out not to be true. Depending on the application, if unchecked, this can have catastrophic effects, meaning that validation of outputs is essential.

    How to assess your project for AI suitability

    In our recent Shifting Landscapes report, we shared a simple matrix that helps to assess how useful it is to apply an AI approach to any given problem.

    It asks how hard/expensive is it currently to produce a solution without AI, and how hard/expensive is to verify that the solution is correct, with four potential outcomes:

    Producing a solution is cheap/easy Producing a solution is hard/expensive
    Verifying the solution is cheap/easy Weak AI benefits (which may increase at scale) Significant AI benefits
    Verifying the solution is hard/expensive Get a human to do it Break down the verification problem (and repeat)

    Let’s look at each possible outcome in turn:

    1. Weak AI benefits (which may increase at scale)
    producing a solution is cheap / verifying the solution is cheap

    This applies to tasks where AI tools might help people complete tasks more efficiently, but where the resulting impact or time savings are not significant. Over time/mass use, the benefits might increase.

    Examples here include tasks like letter-writing and making summaries of documents or transcripts. If AI can do the initial grunt work, a human can take over and make tweaks to the output, nominally saving some time.

    In our own field of civic tech, we can see this kind of tool being used to help people navigate bureaucracy: it might help format letters to representatives, or make effective appeals when FOI requests are refused.

    Cheap processes at scale can also unlock new collective benefits. For instance, Muckrock uses LLMs to extract information and success/fail status from individual FOI responses. Doing this manually per request is easy for people, but requires lots of people to do the work to create a useful dataset across the entire corpus. An AI approach drops the costs further, which produces a small benefit on an individual scale, but collectively creates useful data.

    As we note in our AI Framework, we have to recognise that a large number of small uses can build up into a negative effect. For instance, AI-created objections to planning applications might overwhelm a system that was built for a world in which there are higher hurdles to lodging an objection.

    2. Significant AI benefits
    producing a solution is expensive / verifying the solution is cheap

    In this scenario, we’re thinking of situations where it is harder for a human to create a credible solution than it is to check if the outputs are valid. Conceiving a solution might be hard because it requires specialised knowledge, such as coding, or significant time and resources, like the analysis of a huge dataset; but it would be easy for a human to see whether or not the solution is working as intended.

    One of the biggest practical uses of AI so far has been seen in coding, because coding problems fit so well into this category, and so provide potential benefits. The structure of computer code is often formally checkable (for at least syntax errors), and often there is a relatively short turnaround between “having code” and “checking the code is effective”. This isn’t to say that all coding fits in this box, but enough that a clearly productive set of tools exists.

    There are strong potential benefits here because an expensive process can be made cheaper, while the quality of the output can be checked through relatively cheap verification methods.

    This segment of applications can be impactful even where access to models is relatively expensive, as a relatively small number of LLM users can have a big impact through the products that emerge.

    3. Get a human to do it
    producing a solution is cheap / verifying the solution is expensive

    Some LLM processes produce outputs that cannot be quickly verified by automatic or human means.

    Here, using an LLM for the initial solution might be less effective than having a human do it from the start.  While tweaking an email that contains slightly poor wording is a cheap correction, adjusting a multi-page report written by an LLM (involving fact checking, correction, restructure, etc) might be more complicated than just having someone write the original work.

    When humans approach a piece of work like this, the production and verification processes pretty much happen at the same time, because the skills required to produce the work are the same ones that suggest the work is valid.

    “Use a human” is often most clearly the sensible approach for projects that need a high level of accuracy and confidence in the material produced. For example, we talked to OpenFun about their LawTrace site, which brings together legislative information in Taiwan. They made a point of choosing not to use AI at all in this project. Having accurate information was far more important to users than any convenience AI could introduce.

    4. Break down the verification problem
    producing a solution is expensive / verifying the solution is expensive

    Sometimes solutions are expensive for a combination of reasons, and this can justify investment in trying to split the verification problem into smaller problems.

    Through a sequence of different checks on LLM output, we can move problems towards being strong uses of AI, because it dramatically reduces the time needed to produce the solution, while the verification costs are manageable.

    As an example, our APPG scraper sits in this category. We wanted to get accurate lists of parliamentary group memberships from dozens of different websites. Our original idea was that we would need to use a crowdsourcing approach, because we thought an LLM would be vulnerable to inventing lists of MPs.

    But after some consideration, we invested time in a step where we could verify with code whether or the names extracted were actually listed on the relevant sites. We can see a similar example in the public consensus platform Pol.is – where category descriptions are linked back to concrete sources to facilitate easier double checking.

    Similarly, you might find that aspects of your problem (if not the whole problem) are appropriate for mechanical checking. Could LLM code make a custom verification process easier? Can a series of automatic/human checks be made more efficient with a clear verification workflow? Each individual improvement moves your project closer to being a potentially strong use of AI.

    Investment in the verification process might move the problem closer to having weak/strong AI benefits, where outputs can be derisked through cheap quality checks — but you’ll only know through systematically breaking it down in this way.

    We hope that, by sharing this matrix, we will encourage more thoughtful deployments of AI technology in governments  and beyond. Please feel free to share it with those who will find it useful.

    This blog post has been adapted from our report Shifting Landscapes – A practical guide to pro-democratic tech.

    Image: Leo Lau & Digit (CC-BY 4.0)

  3. Reducing FOI cost limits will reduce government transparency

    Key points

    • Lowering the cost limit time would reduce the scope of the Freedom of Information Act, giving government departments greater leeway to deny requests.
    • This  will have a disproportionate effect on high-impact Freedom of Information requests made by journalists and researchers. 
    • It represents a new restriction on public scrutiny of government, counter to promises around improved government transparency, such as the promised roll-out of FOI to contractors providing government services
    • It is unlikely to significantly reduce the volume of work required to process requests – local governments also receive a comparable volume of requests at a lower cost limit, and there are administrative costs even if a request is rejected under a new, lower cost limit. 
    • Transparency is not a nice extra to have that can be cut when the budgets are tight. Governments that think they cannot afford transparency will be surprised at the corruption and inefficiency they will need to afford in its absence.
    • The actual solution to volume is improved government processes. Reducing the cost limit might increase admin burden on authorities (due to increased back and forth with requesters) whereas better proactive publication genuinely could reduce volume of requests by removing the need to request in the first place.

    What’s being proposed?

    A policy is being floated, around decreasing the FOI cost limit in order to address an increase in the volume of requests. 

    Financial Times: UK considers FOI clampdown as requests soar:

    British officials are considering a clampdown on the freedom of information system in a move that would spark backlash from transparency campaigners.

    Government figures are discussing a reduction in the cost ceiling for processing a request as the number of annual submissions has spiralled, according to people familiar with the situation.

    The soaring number of requests comes against a backdrop of heavily constrained Whitehall budgets, they added.

    There are no further details beyond this briefing. Our assumption is that the proposal is for a reduction to the central government cost limit (see below), but with no details on the scale implied. 

    As reflected in the FT story, because of central government statistics, we can see that this increase mostly relates to defence records being moved to the National Archives. It is also worth putting in the context of a separate attempt to justify restrictions based on national security

     

    What is the cost limit?

    The “appropriate limit” is the time allowed to deal with an FOI request. 

    At the start of the FOI process, a cost is estimated for the likely time it will take to locate, retrieve and provide the requested information (but not time taken in doing public benefits tests or applying redactions). 

    It has a value in cash, but this is pegged against a set cost per hour (£25 an hour in UK FOI, £15 in Scottish FOI).  So effectively this is a time allowed in hours:

    • £600 (40 hours) – Scottish FOI
    • £850 (34 hours) – Parliamentary questions
    • £600 (24 hours) – Central government FOI
    • £450 (18 hours) – Other public bodies FOI

    A related part of the rules is that authorities can aggregate similar requests (for similar information by connected people and made within 60 working days) and apply the cost limit to them collectively. Authorities may interpret this quite broadly if the requests share an overarching theme or are handled by the same team. 

    Another relevant system is parliamentary questions, where the search time is pegged to 140% the cost limit for central government. The resulting ceiling is £850 (34 hours). 

    How are the cost limits changed?

    The cost limits for UK FOI are set by The Freedom of Information and Data Protection (Appropriate Limit and Fees) Regulations 2004

    A new set of regulations could be made without a vote in Parliament.  The cost limits are changed via a statutory instrument passed by the negative procedure. This means the government lays the change before Parliament, and it automatically becomes law without a vote. 

    MPs can sign a petition to call for a vote to annul it, but there is no automatic threshold where a certain number of signatures requires a vote. Generally it requires support of the official opposition to get a debate. 

    What would be the effect of reducing the cost limit?

    The likely effect of reducing the cost limit would be to prevent a class of currently useful and productive FOI requests, without significantly reducing volume or administrative costs. 

    Who would this affect the most?

    As the existing cost limit already rules out very broad requests, the change in any reduction would fall mostly on the most complex requests allowed by the current rules – and as such is likely to disproportionately affect journalistic and researcher use of FOI. Exploratory requests would need to be framed more narrowly, and a lower limit combined with the aggregation rule would make it easier for authorities to chain related requests together and deny them.

    Any reduction in the central government cost limit would also have a knock-on effect on parliamentary questions, as the search time is linked. 

    Would it reduce administrative costs?

    This change would have a mixed effect on administrative costs: marking a bigger set of FOI requests as invalid has costs of its own. 

    Reducing the cost limit would give more leeway to authorities to refuse requests when the documents requested are difficult to provide, but would be targeting a narrow band between what was previously acceptable and the new limit.  A lower threshold invites more dispute about the threshold, and requires justification for it falling in a narrow range, potentially causing more back and forth with requesters. What should happen in these cases is that authorities give advice and assistance on reducing the scope of the request to help fit inside the cost limit. Failing to do this has been noted in ICO decision notices about whether the exemption was applied correctly. As such, administrative savings are likely to be disappointing, as a lower cost limit creates work of its own. 

    The natural experiment of the two different cost limits also does not suggest reducing would have a large effect on volume. The lack of comprehensive FOI stats means we do not have an up-to-date figure, but in 2017, local and central governments had comparable volumes of average FOI requests – despite the difference in the cost limit. 

    What is a better approach to FOI volume?

    Increased FOI volume raises the importance of efficient discovery and publication of information. Rather than reducing public transparency, public authorities should invest in their own processes and data to better meet internal and external needs. 

    Public authorities need to be good at managing information — not just to answer FOI requests, but in order to work effectively. The effect of improved technology should be to make it easier for authorities to understand the information they hold, both for their own purposes and for public transparency.

    More value can be realised by each FOI request released through improved disclosure logs. WhatDoTheyKnow.com removes the need for future FOI requests by making previous requests easier to find, with far more users of the site viewing information that has been published in previous FOI responses rather than making new requests. Public authorities can help reduce duplicate requests by publishing disclosure logs that make information released available to search engines (including AI agents), delivering more impact to releases and reducing repeated costs. This also helps address the social cost of atomised AI approaches: information is released for public benefit. 

    Building on this, authorities can also learn from the subjects about  which  FOI requests are frequently made, and use that to inform their proactive publication of information. Increased volume of requests represents people making use of their information rights: this should be encouraged, while trying to make the process of finding and publishing information as efficient as possible. 

    Transparency isn’t a cost: it’s a necessary investment for the rewards of reduced risk of corruption, and improved quality of work through the deterrent effect of future transparency. Efforts to cut costs could instead focus on the cost of secrecy— the high legal fees government departments have paid to try and keep secret information in the public interest. Government and parliamentarians should be invested in making this system work well, for the public benefit, rather than restricting access. 

    Read other responses

    Header image: Photo by Jr Korpa on Unsplash

  4. New report: Shifting landscapes

    Today we’re launching a new report: Shifting Landscapes: A practical guide to pro democratic tech.

    This report builds on the conferences, seminars and conversations we’ve been having in our TICTeC programme over the last few years, to present a comprehensive picture of where pro democratic tech is now. We explore how technology can strengthen and defend democratic life, and how civic tech practitioners, pro-democracy organisations, and funders can make effective choices in a rapidly shifting landscape for both democracy and technology.

    The result is a report of eight chapters in four thematic areas:

    Pro-democracy tech: this extends our definition of pro-democracy tech, to explore how  technology can be joined to wider democratic movements working to both defend and extend democracy using technology.

    Communities of practice: what we’ve learned about how we can best work together with our communities of practice around Access to Information and democratic transparency, balancing efficiencies of scale with unique circumstances and needs.

    Shaping the landscape: Civic tech sometimes needs to adapt to changing times, but should be trying to shape the times. These chapters look at changing distribution methods (video and AI chatbots), but also how we can create infrastructure that makes democratic projects easier and more effective.

    Using technology effectively: These chapters are aimed at practitioners thinking about how to use technology, with examples and frameworks for practical approaches to AI technologies, but also other examples of tools that can be effective in ways that AI approaches can’t

    The report can be read online, or as a PDF

    Header image: photo by Kalen Emsley on Unsplash

  5. Leaky Pipes: What’s wrong with donations data

    As part of our WhoFundsThem work we want to make better information available about money in politics. 

    Last year we released a report Beyond Transparency – looking at the UK Parliament’s register of financial interests, and wider arguments about how we fund politics. 

    Today we’re releasing a follow-up report: Leaky Pipes (read online or download as a PDF). This covers what we’ve learned (and what we think could be better) about the systems for reporting election donations. You can also re-watch the launch event on YouTube

    This report started because we were a bit confused about the different ways data could be declared and reported.  And to be honest, we’re still a bit confused – but we have more diagrams to explain why. 

    What we explore in this report are the multiple routes for declarations, different thresholds for disclosure, and uneven public access. This makes cross-checking difficult and leaves gaps where information can vanish depending on how a donation flows (direct to candidate vs via party), how large it is, and whether the candidate wins.

    The result is that candidates and agents face complex reporting requirements, electoral administrators hold paper-heavy returns that are hard to inspect, and the public (and sometimes regulators) struggle to build a consistent picture of who is funding whom.

    From this, we’ve made recommendations on making reporting easier to do correctly, faster to publish, and simpler to scrutinise:

    • Move to a “report once” process that informs multiple systems
    • Harmonise public disclosure at £1,000
    • Create a comprehensive public database above that threshold
    • Create a safe private database below the threshold for research and evaluation purposes

    Building on this, we suggest three practical avenues for follow-up work that would strengthen the case for reform and help design better systems:

    • User research and prototyping to map how a “report once” service would work for candidates, agents, administrators, Parliament, and the Electoral Commission. 
    • Sampling local authority returns to demonstrate the scale and type of inconsistencies between routes.
    • Exploring a data-sharing agreement for controlled research access to the Electoral Commission’s small-donor/return data.

    The report can be read online or downloaded as a PDF.

    Header image: Photo by Meg on Unsplash

  6. New research report: Supporting good communication

    With WriteToThem.com we want to run a service that helps people write the right message to the right place. That means helping users express themselves effectively and keeping the service a constructive channel between constituents and representatives by deterring abusive messages.

    Abuse and intimidation aimed at elected representatives does not just harm the person receiving it. It corrodes the openness and trust that democratic culture needs, and it can deter people (especially those from under-represented groups) from taking part in public life at all. 

    We think we’re in a good position to play a constructive role in this area. One problem that has been raised is frustration at bouncing around layers of government, where a key benefit of WriteToThem is getting people to the right layer first. But we need to go further than that to understand how we can discourage abusive messages – both to directly implement approaches, and to trial patterns that could be implemented by a wider range of parliaments and local authorities.

    We’ve been exploring what a “toxicity” risk score would look like in our infrastructure and have released a report of our findings so far. We trialled a range of options — from baseline keyword matching, to Google’s Perspective API, to running lightweight models locally (IBM Granite Guardian), and then to LLM-based grading as a second pass for tricky cases like implicit threats or messages quoting abuse from third parties.

    But having a risk score is less important than how it is used. We’ve mapped out a few different approaches beyond a manual moderation approach – such as soft “nudge” prompts (encouraging people to reconsider wording before sending), cool-down delays for higher-risk messages (without removing someone’s ability to contact their representative), and informative flags for recipients (for example, passing along a risk score or relevant metadata on a message).

    Our next step has mapped out some technical possibilities to talk to more people about which approaches make sense  – which we’ll be doing as part of our wider Welsh Government funded democratic engagement work to improve WriteToThem.

    For more details on the approaches tested, potential issues with different methods of implementation, and unanswered questions, you can read the report online.

    Image: Pawel Czerwinski

  7. New report: WriteToThem Insights

    Understanding more about constituent communication

    We’ve released a new report exploring insights from WriteToThem about the content of constituent communication – you can read the whole report online or a summary below. 

    WriteToThem.com is a long-running mySociety service that enables people across the UK to contact their elected representatives by entering their postcode and sending a message through the site.

    This service provides a unique opportunity to understand the flow of communication between many constituents and many representatives. Our WriteToThem Insights report uses surveys to understand more about what people are writing about. 

     While previous work identified patterns in response rates and deprivation gradients, this experiment focuses on understanding what people are writing about, distinguishing between casework (individual problem-solving) and campaigning (policy-oriented advocacy).

    A new survey and data-processing pipeline were developed to categorise and anonymise message summaries, applying machine learning and large language model techniques to cluster and label topics. Analysis of 5,400 messages from Q3 2025 found:

    • Casework and campaigning form two distinct types of communication, with casework more common for councillors and campaigning dominant for MPs.
    • The deprivation gradients of these two types differ sharply: campaigning is concentrated in less deprived areas, while casework is more evenly distributed, though likely still underrepresents the most deprived groups.
    • First-time users are more likely to send casework messages and to receive responses.
    • Top themes in casework include housing, local services, health, and anti-social behaviour; in campaigning, issues such as Gaza, climate policy, and digital ID predominate.

    This data has limits. This covers only a portion of total correspondence, and with little information about whether the sample is representative enough to generalise to messages sent in general. That said, we think there are strong uses both for improving WriteToThem itself and for informing broader understanding of constituent communication.

    We want to build on this work: refining the analysis process and exploring opportunities to collaborate. We see particular value in digging more into casework data as something that could inform more systematic approaches in this area, helping representatives across the country join up information and improve collective scrutiny of government services.

    The full report can be read here.

    Image: Christopher Burns

  8. Mayoral scrutiny: building an ecosystem of accountability

    Mayors and combined authorities are the future of devolution in England,  but the ways in which citizens can understand, scrutinise, or influence them remain unclear.

    Our latest report, Mayoral scrutiny: supporting an ecosystem of accountability organisations, argues that devolution will not deliver on its promises unless we also invest in new forms of civic and democratic oversight. It is not enough to create powerful new Mayors; we need to create the ecosystem that holds them (and the wider web of regional institutions) to account.

    Why scrutiny matters

    Combined authorities are designed to bring councils together to plan and deliver across a region. But unlike the London model, they do not have an elected assembly meant to hold the mayoral executive to account.

    Existing models, such as council scrutiny committees or parliamentary hearings, can only go so far. Combined authorities need scrutiny that reflects the full complexity of their networks and partnerships.

    A scrutiny and civic development fund

    We highlight two complementary approaches already being explored:

    • Local Public Accounts Committees (LPACs): technocratic bodies that examine how public services work together across a region, looking not only at the Mayor’s decisions but at value for money and collaboration across agencies.
    • Democratic journalism funds: public-interest media funds guided by citizens’ assemblies, ensuring independent, locally relevant journalism that supports democratic life.

    We propose bringing these ideas together in a new Scrutiny and civic development fund: a local grantmaking body with priorities set by a citizens’ assembly. The fund would support a mix of civic institutions — from expert-led scrutiny committees to independent journalism — that together strengthen public accountability and regional identity. Approaches along these lines would help ensure that devolution does not just move power geographically, but makes it genuinely more responsive to the people it serves.

    Supporting existing scrutiny

    This report also explores ways we could apply our existing tools and approaches to sustain and connect the accountability ecosystem that already exists. Through tools like MapIt, TheyWorkForYou, and WhatDoTheyKnow, we can build a civic democratic stack to support journalists and civic technologists to understand and monitor combined authorities.

    We’ll also continue to explore how civic tech can make these new layers of governance more transparent, and how data and digital infrastructure can support the work of local scrutiny.

    Read the full report

    The report explores the history of scrutiny in English devolution, how these proposals could work in practice, and sets out the steps to strengthen the civic fabric around mayors and combined authorities. You can read it here. 

    Header image: Photo by Omar Flores on Unsplash

  9. Running open LLM models

    Most discussion and usage of LLMs is focused on high profile closed models such as OpenAI’s ChatGPT family, and Google’s Gemini – which are widely available and integrated into a range of existing products and services. 

    Because these are closed models, access and hosting of the models is controlled by the companies that create them. This presents a dilemma for civic tech organisations who believe in open source – where important parts of their processes can disappear into black boxes beyond your control. These may work well/be affordable today, but creates new risks. Specific models might become unavailable, there might be changes in pricing, and this represents lock-in to specific providers. 

    Open LLM models provide an alternative approach. In a familiar issue from open source licensing,  there are different ways in which a model can be ‘open’. Open weights models have the final structure of the model released and can be run on your own hardware (Meta’s Llama model is an example of this). Fully open models have the underlying (open licenced) training data released, as well as the recipes and evaluation systems used in their training. AI2’s OLMo family of models and the recent Swiss AI institute’s Apertus model are examples of these. Somewhere in between these are approaches like IBM’s Granite models, where the model is released as open weights and the data was licensed to be able to train on (addressing copyright issues) but is not publicly accessible. 

    What are weights? Basically a model can be understood as a big network of connections – where the ‘weights’ are how strong (and influential) a connection is. What’s happening in the training process is a refinement of these weights as a result of being exposed to the training data. The weights at the end of the process are the trained model, and can be shared and used by others. But if you also have the training data and process, you can recreate the model step-by-step, with a clear audit trail of what’s in it.

    Any kind of open weight model is practically appealing because they unlock new ways to work with private data without sharing with third parties, and create more flexibility around infrastructure. For instance, we currently use a fine-tuned version of Llama to help flag immigration correspondence in WhatDoTheyKnow.

    Fully open models are ethically appealing because they avoid the issues of models that have been trained on copyrighted data. Their existence is a challenge to an AI policy debate where countries must trade-off the rights of creators against the benefits of AI as sold by a handful of companies.  They fit well with our open source ethos – and understanding more about how to use them practically helps give us options to improve our own services, and contribute to wider arguments about responsible use of AI.

    This blog post is a write-up of several practical experiments in using the 7b parameters variation of OLMo-2 both locally on a laptop GPU and remotely using HuggingFace’s inference endpoints. 

    Using OLMo-2 locally

    Our purpose in running something locally is to be able to process sensitive information that should not leave our infrastructure. In this case, using OLMo-2 to create human-readable representations of clusters from WriteToThem survey responses. While users are asked not to include personal information in this survey, enough do that we need to treat the basic dataset as having personal information that should not be shared.

    We used llama-cpp (and the associated python bindings) to run the local model. An alternative local approach is to use ollama to run a local server. The reason for using llama-cpp in this case is that ollama doesn’t always seem to pick up that less well known models can use ‘tools’ correctly (which is required for structured data output). Another benefit is having it run in process rather than as a separate server is the script can turn on and off the resource intensive bit (although there’s a corresponding start up time) rather than needing a separate server process to run.

    Setting up the libraries

    Installing llama-cpp in a way that can use the GPU is not straightforward. This set of instructions for Windows 11/Nvidia GPU mostly worked for me. I additionally needed to add an extra DLL directory before importing from llama_cpp because there’s a DLL folder that the library wasn’t yet referencing. 

    Big picture, WheelNext is a project to try and make installing correct versions of the library easier across different OS/GPU combinations. In the meantime, setting up a local machine is a bit fiddly.

    Downloading model information

    Llama-cpp uses GGFU files – which have all the weights in a single file. There are libraries to convert from the transformers format – but this is often made available by model publishers on HuggingFace.

    Downloading the model can be done using the huggingface_hub command line too (here using uv). 

    uvx –from huggingface-hub hf download allenai/OLMo-2-1124-7B-Instruct-GGFU olmo-2-1124-7B-instruct-Q4_0.gguf –local-dir models

    This is pulling down a quantised version – which has the same number of parameters – but the values of the weights have been significantly rounded down. This tends to have much less decrease in quality than the corresponding decrease in file/memory size (why? Broadly high fidelity here is useful for adjusting in training which will happen in small shifts, but when you have something working the general structure is good enough)  – and this fits it just inside the ability of my laptop’s GPU. 

    This download can also just be done in code:

    from llama_cpp import Llama

    from functools import lru_cache

    @lru_cache

    def get_llm():

    return Llama.from_pretrained(

        repo_id=“allenai/OLMo-2-1124-7B-Instruct-GGUF”,

        filename=“olmo-2-1124-7B-instruct-Q4_0.gguf”,

    )

     

    Structured data output

    To get structured data out of the model, Pydantic AI can be used with Outline to query the llama cpp model.

    This:

    • makes it easier to define Pydantic data structures that should be returned.
    • makes it easier to swap between local/remote models by swapping the model passed to the agent, but otherwise using a common API.

    Hosted OLMo-2 model

    An advantage of any open weights model is being able to run it on a range of infrastructure (and being able to change the infrastructure later). 

    In this case, I had a use case where we wanted to do transformations on already public data (the appropriateness of linking to a specific Wikipedia page from a specific sentence in a parliamentary debate)  – and so there was no privacy/security issue for the purposes of the experiment. We are doing further exploration about how we can make this kind of use compliant with our wider legal and privacy commitments. 

    Because OLMo-2 is not a commonly used model, there isn’t an inference service that offers it directly as an option (which would be most efficient – as you’re being charged for tokens while the underlying infrastructure is shared between many users). Instead, you need to create a private server that can manage the model. 

    Creating an endpoint

    Hugging Face Inference Endpoints is the approach I used here – that lets you provision an endpoint connected to a specific model. I’m using the same model as I used locally.

    Depending on the properties of the model – the minimum GPU required will be suggested. This model was coming up about $0.8 an hour. Running the 13b parameter version of the model was about $2 an hour. There are options to run on AWS, Azure and Google Cloud in different regions (although processing data in the EU/UK is a requirement – this limits some of the GPU options). 

    The scale-to-zero time is adjustable down to about 15 minutes. It takes a few minutes to load up from this. In principle, if the access token is scoped correctly – the huggingface_hub library can handle pausing and unpausing the endpoint (or even programmatically creating one), if some more control here is wanted. 

    Structured data output

    This endpoint works well using some of the example HuggingFace connections for PydanticAI. Something I had to adjust was adding an adapter to reduce complex json schemas (e.g. anything with multiple model types, enums, etc) from using ‘$defs’ to just being a normal structure because the Hugging Face text-generation-inference interface can’t handle them. 

    I have an example of creating a model that Pydantic AI will accept here – the missing config bits are a token associated with the account and the url of the endpoint created. 

    So in principle this means we can have an endpoint that gives us access to a GPU based model for an hour a day at a reasonable price – while we could at a later point swap out to use a local model without adjusting the general logic of the application. This is well suited to our current anticipated uses in batched backend processes, but would be less efficient if it needed to be responsive around the clock.

    Reflecting on the results

    Compared to previous projects using the OpenAI API, a key thing to note is it is slower and more fiddly on the infrastructure at hand. I was only using the 7b parameter model, while the 32b parameter model is the one that evaluates closer to GPT-4o mini. As such, prompts needed to be a bit more detailed on what was required. Similarly, a combination of the hardware and not being able to run queries in parallel over a wider infrastructure mean the process takes longer. 

    But this is also like comparing cake to a well balanced meal – the benefits of an open model are not just philosophical but practical. With a bit more work on the prompt you can get useful results on a laptop with no dependency on third-party services. That brings into scope a range of use cases that OpenAI is not suitable for. 

    Even where, such as in the Wikipedia example, there are no privacy issues in using OpenAI, making it easy to swap in an open model makes it much easier to evaluate the effect of using an open model. It will now be relatively straightforward to quickly substitute OLMo-2 into PydanticAI flows using other models and get a baseline feeling for effectiveness. Even where you might choose to use a closed model in a specific instance, it is very useful to work in such a way that you are not locked in to that model and could switch away in future.

    Similarly, having a working process for a non-mainstream model like OLMo-2 makes it easier to explore other models like Apertus. As this has been trained on a wider range of non-English languages it could provide a more dependable component in LLM integration with the core Alavateli software – which powers Freedom of Information platforms across a range of languages. 

    Understanding open models as a practical approach helps contribute more widely to policy conversations around AI – and where trade-offs and impacts are inherent to the nature of the technology, or are a consequence of how they are currently controlled and produced. 

    Open models are always likely to lag slightly behind the frontier models, but they are already incredibly useful technologies compared to what was possible a few years ago. We want to understand more about how we can practically make use of these models – and help make sure the future of LLMs are shaped by ethical considerations about their training and use – rather than accepting them on the terms of the dominant tech giants. 

    Header image: Photo by Zhang Zi Han on Unsplash

  10. Using LLM tools to build APPG scrapers

    Recently we wrote about why we’re now listing APPGs in TheyWorkForYou. This blog post goes into more detail about the technical process we use to gather who is a member of an APPG.

    We have two methods of getting the memberships of APPGs. The first is finding if it’s already published on their website. The second is using Parliament’s rules to ask the APPG contact for the list. So we need to a) find all the APPG websites, and b) see if they publish members lists c) if not, ask for the list and d) get those lists into a consistent format.

    Data that is fragmented and not in the format we want is a fairly common civic tech problem. The solution is to write a ‘scraper’ that reads the content of a website and has a process for converting it to a more structured format. 

    This works well when dealing with only a few sources (e.g. the memberships of the UK’s parliaments only needs a few different scrapers), or where a common format is being used (e.g. many local government websites use similar providers). In the case of APPGs, there is no common template being used. We just have a set of a few hundred websites that may (or may not) contain a list of names. 

    Rather than a traditional scraper, we have built an agentic AI/LLM approach that is more flexibly able to extract memberships from websites.  The end result is a tool with a careful sequencing of manual and automated steps, injecting human review in structured ways. Rather than an “AI makes mistakes” disclaimer, we built a structured process to check elements efficiently one group at a time, that can lock off errors before proceeding to the next stage. This was also an experiment in using LLMs to write scraper tools, as well as some of the tools needed for the manual review steps. 

    Practically, this was an effective way of getting the information we needed that turned a very hard problem into one that we can dependably run regularly. It also suggests more generally useful ways of approaching fragmented data problems (more on this at the end of the post). 

    Building agentic approaches

    An ‘agent’ is often poorly defined, but broadly it’s a language model interface is given tools (specific functions), a task, and an output data structure, and it loops between these until it gives a result. 

    To build agentic functions, we used the PydanticAI framework, which acts as a connector between the prompt, input data, the data structure of the output data, functions the agent has access to, and any bespoke validation of the results. The end result is a function that accepts structured input, and returns structured output, relatively painlessly. 

    Although this example is using OpenAI’s GPT models, in future experiments we use the PydanticAI approach to connect to open source models (the framework is designed to be model-agnostic). In principle this means that this project could in future switch the underlying provider used. 

    Process

    Step 1: Writing a scraper 

    The first thing we needed to do was to get the official data from Parliament’s APPG register into a more structured form. 

    You can see an example of this page for the Africa APPG. This is a good task for a traditional scraper, but would also have been a fiddly problem. Using ChatGPT, we gave it an extract of the HTML, and asked for a Pydantic data structure and script to convert the data. This worked pretty well, with some tweaking to the format over time. When errors emerged in different APPGs – passing the error and an understanding of what should have happened back to the Copilot agent (using a Claude model) led to working fixes. In using the coding agent the key decision was deciding which bit of the project to be opinionated about – and this has mostly meant being very explicit about data structures (and validation to ensure they’re correct), and more relaxed about the pipes that connect things up. 

    Step 2: Adding categories to APPGs

    From the official data, we only know if an APPG is a county or subject area group. We want to make it a bit more explorable by breaking this down into categories. 

    In the spirit of experimenting with LLMs, we copied all subject areas APPGs names and purpose statements into one of OpenAI’s reasoning models and asked for 10-20 sub-categories. It came back with 20 and they looked reasonable.

    We then created a small functionless agent interface, giving it the title and purpose of a specific APPG, and returning a list of potential categories (preferring one, but allowing all that seem relevant). 

    Spot-checking these, they seem reasonable and for the purpose of breaking down the big list a bit – this is a good step up. This means, we can quickly see the APPGs that are likely to be relevant to environmental matters

    Step 3: Finding missing websites

    Some APPGs list their external website – some do not. Here we use AI tools as part of the workflow, to find those missing sites (which may not exist). 

    We created an agent function with access to a web search tool (tavity), a function to check if the URL is valid, and a prompt to help identify the correct site. This creates a loop to search and identify a good candidate for the website.

    At this point, there is a manual check that prompts the user to review each site one-by-one before confirming it as a valid site. 45/74 sites identified in the first wave were valid. Invalid websites were news articles, APPGs in other parliaments, or sites for previous iterations of that APPG.

    This is not comprehensive and we and our volunteers found some more manually after the fact – but it is an interesting trial in finding data starting only with a search engine. 

    Step 4: Find published members

    The final step is to get a list of members (if published) off these websites. We need a really flexible approach for this. Names might be in a structured list, but they can also be in one paragraph. They might be on a members page, the home page, or spread over three pages. There is no consistency to fall back on. 

    Here, we created an agent with a function that can fetch a web page and convert it to markdown. Using this recursively, the prompt instructs the agent to find the most relevant page (in some cases pages) that could contain membership information, and return a data structure of the members (MPs, Lords, Other). This returned over 5,000 names in the data format provided. 

    The big risk at this point is that having been asked for a list of MPs, it makes some up. The validation we use for this is to check if each name in the list is present within the HTML content of the page it was extracted from. If there’s an error, it runs again and will give up rather than use an incorrect list.  There is some possibility for misinterpretation – but this prevents outright fabrication. Errors flagged here tended to be when the LLM has fixed formatting meaning the text no longer matches exactly against the page.

    The key problem here is one that a human would have too – some APPG lists are out of date. Here I added an extra flag detecting a list containing people who had left Parliament that then needed a manual review. In other cases, this was sometimes picking up lists that were not membership lists. We made some adjustments to the prompt after picking up attendees at the AGM – which is not wrong, but incomplete. 

    Step 5. Manual data

    As our main blog post talks about, we then needed to contact APPGs directly for lists that were not published. This presented a new problem: what we got back was a combination of spreadsheets and emails with different levels of detail – some including party details in other columns, some not. 

    Our solution was to have a Google Doc that just has each list formatted under a heading with the APPG title – we could just copy and paste information into this. 

    This file is then downloaded as markdown and converted into a list of names. There are a few tweaks to clean up leading numbers, and identify the name component of the line. Again, this step was substantially written via prompt – giving the LLM examples of the problem data, and that would create regular expressions to clean the data into the basic list of names we needed. 

    Step 6: Tidy members information

    What we want to do next is get from a list of names to a list of TheyWorkForYou unique IDs. 

    We have a library that helps reconcile names to IDs, but a challenge here is that there are a huge range of spelling mistakes (sometimes to an extent where you could not actually work out the correct MP).

    What we needed was a quick tool to compare the input name against our list of known names and suggest near matches. Here we again turned to the coding agent, posing the problem, providing some snippets to interact with our existing library, and letting it craft a command line interface. 

    This fairly quickly gave a good interface for reviewing spelling problems (which was later refined to auto-match below a certain threshold). This helper tool is not especially complicated, but as something with a clear input and output, isolated from the rest of the flow, was a good candidate for testing using Copilot to create the function. In choosing what to spend time on, this would not otherwise have been a priority – but brought a useful feature into scope. 

    Result

    The end result of this process is fairly effective – with a series of steps we can repeat every six weeks when a new APPG register is released to check for new webpages for new APPGs, or to recheck previously scanned pages. 

    The efficient sequencing of steps means that manual review happens on similar tasks in sequence, rather than checking each APPG through all steps. 

    In general, I’m pretty happy with the results of this, it made a project that would otherwise only have been possible with a big (and fairly boring for participants) crowdsourcing effort possible. 

    One of the problems we have to deal with a lot is fragmented public data, when relevant data is scattered all over the place and is a lot of work to bring back together. Here we found AI tools that were both useful in discovery of a component of the data, and in reconciling to a common standard. 

    The “AI scrapes then verifies content is present” approach worked well here but would struggle with more complex problems. For instance, if we really needed to be sure we were extracting a correct party label alongside a name, knowing that ‘Labour’ was present on the page wouldn’t be as helpful. 

    Building on this, the AI-written scraper code worked pretty well. If properly sandboxed (pydantic-ai has support for running python in a sandbox using pyodide), transformation code could be written to convert data between different sets of headers without running the data itself through an LLM to convert it. This potentially helps with some of the fragmented data problems of reconciling compatible but different schemas. LLM-involved approaches have a real potential to create new datasets through easier discovery and joining of data.

    This is a way we can use new technology to make a dataset possible, but also it would be much easier if Parliament gathered and published this in the first place. The equivalent Cross Party Groups in the Scottish Parliament just make a downloadable file of all memberships in their open data portal. We need to think about how new technological approaches are not just propping up bad transparency – but part of encouraging better transparency all the way upstream. 

    Header image: Photo by Susan Holt Simpson on Unsplash