
First Report: Democratic Inputs to AI
23 Oct 2023
High-level overview
In this report we present Common Ground: An application for hosting iterative, open-ended and conversational democratic engagements with large numbers of participants (deliberations at scale). We also present our vision for wider end-to-end processes to elicit democratic inputs to any topic.
Common Ground can be conceptualised as a multi-player variant of Pol.is. Instead of voting on Statements in isolation, we match participants into small groups of three people, where they are encouraged to deliberate over the Statements they vote on, and where an AI moderator powered by GPT-4 synthesises new Statements from the content of their discussion.
Common Ground is one part of a more comprehensive end-to-end process. Therefore, this report also describes our current vision of how we can embed Common Ground into a given socio-political context. It includes suggestions and explorations of processes for generating Seed Statements, UX testing for a given population, selecting a demographically representative sample, iterating, and reporting on the results.

Team background
When OpenAI first announced the Democratic Inputs to AI grant, Jorim, founder of Dembrane, raced to build a team capable of delivering something truly unique. This is that team:
The consortium is spearheaded by Dembrane (Jorim & Evelien), a startup committed to building accessible tools that enable democratic decision making at scale.
Pepijn and Lei, from Bureau Moeilijke Dingen, bring decades of combined expertise in building complex applications.
Brett and Rich from the Sortition Foundation advised the team and provided critical feedback. Any truly democratic implementation of Common Ground would make full use of the Sortition Foundation's participant selection services.
Ran advises the municipality of Eindhoven on law & ethics, and brings decades of experience working in the public sector.
Aldo is the owner of CommunitySense (https://www.communitysense.nl/) and a freelance researcher who specialises in collaborative communities.
CeesJan is a communication and networking expert (https://www.linkedin.com/in/cjmol/).
Naomi, from an event agency, brings the know-how and experience to deliver large online events.
Rolf is a freelance researcher and consultant who specializes in online collaboration within civil society (https://drostan.org/).
Bram brings a unique perspective on ethics and LLMs. He wrote his master’s thesis on machine ethics at JADS.
Motivation
Each of our consortium members has their own motivations for working in this space; some go back decades. Overarching and foundational to these motives is that we are all designers at heart: we have heard the groans expressed by many at current democratic processes, and seen the growing pains of the online modalities that have emerged in recent years. With the emergence of conversational AI, we saw an opportunity to explore an experimental alternative modality that is open-ended, iterative, conversational and empowering.
eDemocracy often trades off intimacy for scale. Many eDemocracy applications, and many public online forums in general, operate on a principle that an individual interacts with the larger community in a one-to-many relationship. As algorithms for presenting content become more complex, this relationship has evolved into a one-to-AI-to-many, where algorithms mediate the relationship between “users”.
We want to elevate and scale small group socialisation and deliberation: While current systems have their advantages, we believe that a key component of a well functioning democracy lies in the more intimate, social interactions between humans. Our vision is for a platform built around these small group interactions, that produces democratic insights that have been “pre-processed” by the crucible of social interaction, and co-validated by other small groups. By using AI to prompt, process, moderate and link these interactions we can produce these democratic insights at scale. As AI improves, so will the platform.
Make the process as accessible as possible: While tech offers opportunities for scaling democratic processes, it also adds an extra hurdle for the many people who already struggle with the technological demands of modern society. As Gilman states in her essay on democratizing AI, many platform designers fall into the trap of designing solutions with users like themselves in mind (Gilman, 2023). By designing a process where ordinary conversation is participation, we hope to minimise this technological barrier.
We are against simulated deliberation: As we worked, our motivations were sharpened. Alternative modalities emphasise the content of the discussion, and this may be necessary for some tasks. We firmly believe, however, that the people deliberating and their complexity, nuance and inconsistency are just as necessary to the end goal of a healthy democratic process.
Conclusions
Our prototype can deliver immediate value: Our primary objective was to question and demonstrate the potential of Large Language Models (LLMs) as a pathway for scaling promising democratic processes like in-person citizens' assemblies. Current institutions, although designed to address complex societal problems, are struggling with a decline in public trust and with keeping up with the speed and magnitude of contemporary issues and technology development. We believe people have a desire to come together and discuss these issues, and that institutions may benefit from more widespread and higher-bandwidth forms of engagement. We believe our prototype can add value here.
We might need different tools for different situations: Based on our interactions with OpenAI and the other projects, it is our impression that democratic inputs to AI, and AI contributions to democracy, can and should take many different forms, and have to be developed in and with their local contexts. As a commentator on our initial proposal remarked, “Socio-cultural specificity, not generalisability, might be a strength”.
We are committed to iteration and integration with other processes: We repeatedly realised how key iteration and multi-scale processes are for any democratic process. For this reason, we are very pleased that Common Ground is capable of working together with other processes (see Intended Uses and Limitations), both on- and offline. Further refining the socio-political processes that provide inputs to Common Ground, as well as re-validating its outputs with other democratic processes, such as those validated by the Collective Dialogues team, will be key as we move toward mature implementations.
Further development of Common Ground demands vigilance: This vigilance has two parts. Firstly, we must continually question the underlying assumptions, potential pitfalls, risks and possible unintended adverse effects of introducing AI into democratic processes, not least by always checking and refining LLM outputs with real people; otherwise we risk falling into the fallacies and risks of democracy in silico.
Secondly, our process is inherently, and somewhat intimately, social. While this is by design, we observe that a significant portion of the population self-selects out of such explicitly social interactions with strangers. Similar issues are faced by in-person citizens' assemblies, where a small portion of participants may need repeated encouragement before they share their opinions and gain confidence. While human facilitators were on call to help during the experiment, looking into active facilitation, coaching and aftercare for more sensitive participants may be crucial when deploying Common Ground.
We built and tested an innovative tool, and we look forward to deploying it in the real world. Our project sometimes struggled to balance ambitious goals against realistic timelines: we aimed to manifest transformative impacts while simultaneously focusing on prototype implementation and user engagement, and we learned that this was an overly ambitious objective within the given timeframe. Striking the right balance between aspiration and practicality is critical, and this learning will guide our future endeavours. Now that we have refined the prototype to the point where it can host hundreds of simultaneous participants, we look forward to implementing it in real-world democratic processes.
Intended uses, values and limitations
Intended uses
We think that Common Ground can be an excellent choice for running a deliberative exercise where stronger ties between participants, open-endedness and iteration are high priorities. This is how we position Common Ground in the existing landscape: among current matured tools there is Pol.is on the one hand, which is open-ended and easy to use but does not bring people together for stimulating conversation, and on the other the Stanford deliberation software based on deliberative polling, which, while valuable, is bookended by polls of closed questions. Common Ground combines the benefits of human-to-human deliberation with the simplicity and open-ended nature of Pol.is.
The hard problem we are trying to solve and will continue to iterate on is how to make these conversations as rich, stimulating and diverse as possible, while also producing actionable and defensible outcomes.
Common Ground is part of an end-to-end process that must be customised to a specific context. It could in theory be used standalone, but in practice must be preceded by a participant selection process and an initial Seed Statement generation process. Data analysis and summarisation methods must also be integrated into the tool, and we are currently exploring integration with other methods (some from this OpenAI grant) that focus on this step.
Crucially, we do not think that Common Ground must be limited to deliberation between democratically selected participants. It could just as well be used to host conversations between experts from varying fields, or between members of membership-driven organisations like unions, NGOs, or for-profit companies with large contributor communities.
Values & Limitations
We have created the basic technological interaction infrastructure for supporting scalable, AI-moderated discussions. However, our prototype platform is only the first step on the way to reaching AI-mediated democratic impacts at scale. Reaching that level of impact still requires a lot of work and inputs from many different stakeholders. In this section, we explore some of the promises and pitfalls of both where we are now and where we could and should take development efforts next. We do this by first focusing on the values that have been and could be created. We combine this with reflections on some of the socio-technical limitations of the current implementation, which also suggest next steps for our development roadmap.
Value creation framework
Learning interactions
Immediate value: Activities and interactions
Potential value: Knowledge capital
Applied value: Changes in practice
Realised value: Performance improvement
Transformative value: Redefining success
Limitations
Process details
Key Themes
AI-facilitated deliberation process for community involvement.
Four sub-processes: Community engagement, Institutional engagement, Representative participant selection, Deliberation at scale.
Importance of understanding community and institutional needs and concerns.
Participant selection based on the population’s size and demographic traits.
Role of AI in managing group deliberations, proposal generation, and result outputs.
End-to-end
The end-to-end process is AI-facilitated deliberation at scale. The inputs to this process are a socio-political context (such as a city, a country or an organisation) and a topic to deliberate. The outputs are insights into what people in this context think about the topic, as well as actionable common ground. Additional impacts are that participants are empowered, build trust, and have an enjoyable experience.
This process can be broken down into four sub-processes.
Community engagement
We believe it is important to recruit local participants early in the process, which can be done through snowball sampling, in order to understand what people care about, what their needs and worries are, and how they relate to the institutions serving them. The additional impacts are increased community engagement and publicity.
In our process, we interviewed people from the community: through street interviews, but also through phone and video calls that people could book with us. In our case this was quite low-key, but one could of course be more rigorous in documenting the insights from these conversations, collecting possible seed statements and publishing meaningful stories.
Institutional engagement
Understanding the institutions serving a given context is crucial to implementing an effective democratic process. This can be done by recruiting non-partisan panels of experts through purposive sampling. Networking and access are key factors here to achieve a good scope and an in-depth understanding of how and why an institution would listen to the outcomes generated by the process. Additional impacts of this process are, for example, building trust with decision makers.
In our process, we did this by engaging with employees of the local municipality and assembling ethics experts, people working in the social sector, and politicians to provide input on what they thought was important about our process, what we should consider, and to offer much-needed reflection.
This was an important part of our process, as it showed us that the needs and worries of the local government were different from the needs and worries of, for example, OpenAI.
Representative participant selection
The third sub-process is representative participant selection, which can be broken down into two steps: defining a population, and then selection, which can be opportunistic or more rigorous, using two-step stratified random sampling.
In the case of running a process for a city, defining the population means deciding whether it includes only people who live in the city, or also those who commute to it. For a company or an organisation, it might be members, or also employees, of that organisation.
The selection depends on the size of the population and a legitimacy trade-off. If the population is large and legitimacy must be high, stratified and randomised participant selection is the gold standard. This process is defined, outlined and delivered by the Sortition Foundation: they take a set of potential participants and perform a two-step stratified selection, the output of which is a set of participants that is maximally representative of a given community.
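For illustration, here is a minimal sketch of what such a two-step stratified draw could look like in code. The pool, field names and quota structure are invented for the example; the Sortition Foundation's actual algorithm is considerably more sophisticated (it satisfies many demographic quotas simultaneously).

```python
# Naive two-step draw for illustration only: recruit a volunteer pool,
# then stratify on a single demographic column. The real method
# satisfies many demographic quotas at once.
import random
from collections import defaultdict

def stratified_draw(pool, stratum_key, quotas, seed=42):
    """Randomly draw participants so that each stratum meets its quota."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in pool:
        by_stratum[person[stratum_key]].append(person)
    selected = []
    for stratum, count in quotas.items():
        candidates = by_stratum.get(stratum, [])
        if len(candidates) < count:
            raise ValueError(f"Not enough candidates in stratum {stratum!r}")
        selected.extend(rng.sample(candidates, count))
    return selected

# Step 1: a pool of volunteers (in practice gathered via, e.g., a mass mail-out).
bands = random.choices(["18-29", "30-49", "50+"], k=100)
pool = [{"id": i, "age_band": band} for i, band in enumerate(bands)]

# Step 2: draw down to quotas mirroring the population's demographics.
panel = stratified_draw(pool, "age_band", {"18-29": 3, "30-49": 4, "50+": 3})
print([p["id"] for p in panel])
```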
In our process, we used Prolific to select a balanced sample of participants. This is not ideal for democratic inputs, but can be easily swapped out for a more rigorous participant selection process.
Find Common Ground
The fourth sub-process, deliberation at scale, is driven by the Common Ground application itself and is broken down into five sub-processes.
What a run of the process looks like
Caveat: We tried to run a process of crowd-sourcing inputs to AI while simultaneously crowd-sourcing inputs to our tool as we were building it. We could have done this more rigorously, but it did lead to very valuable outcomes that strongly influenced the design of the tool. If you are interested in these details, see (Results).
Step 1: Set the scene
In our case, our socio-political context was the municipality of Eindhoven, and the topic was “Democratic inputs to AI” - which we combined into the topic statement “Can AI help local government?”.
Step 2: Engage rule affected people
We put up a quick website with links for people to reach out to us, which we used to explain the process, show mockups and prototypes, and invite critique. In total, 17 people participated in these calls. We also interviewed people on the streets to find out how a wide variety of people were thinking (or not) about AI; in total we interviewed 22 people.
Outcomes: These conversations were recorded, transcribed using Whisper v1, and analysed both by and with GPT-4 to distill the findings.
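As a rough sketch of this transcribe-then-distill pipeline, assuming the OpenAI Python SDK as it existed at the time of writing (pre-1.0); the file path, prompt wording and API key handling are placeholders, not our production code:

```python
# Sketch of the interview-analysis pipeline (paths and prompt are
# placeholders; this is not the exact production code).
import openai  # pre-1.0 OpenAI SDK, current at the time of writing

openai.api_key = "sk-..."  # set via environment variable in practice

# 1. Transcribe a recorded interview with Whisper v1.
with open("interviews/street_interview_01.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

# 2. Ask GPT-4 to distill findings and candidate seed statements.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You distill community interviews into key findings "
                    "and candidate seed statements for deliberation."},
        {"role": "user", "content": transcript["text"]},
    ],
)
print(response["choices"][0]["message"]["content"])
```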
Step 3: Engage decision makers
In parallel, we organised sessions with the local government of Eindhoven, where we talked to ethics and legal experts within the municipality in order to understand their context and what they would like to discuss.
Outcomes: Civil servants appreciated the value of democratic inputs to AI, but criticised the list of initial questions provided by OpenAI. For them, there was a clear conflict of interest in an AI company spearheading a democratic process about AI. The tension within the group revolved around the ongoing EU debate about technology “gatekeepers” and the balance between innovating with closed-source LLMs such as GPT-4 and waiting for open-source and/or EU-based solutions. On the basis of these discussions, we agreed to host a round-table discussion with various legal, ethics and social-work experts to dig deeper. At the request of the participants, and unlike the interviews with the community, these discussions were not recorded.
Step 4: Make a list of seed statements
From these discussions, we formulated a list of seed statements, which you can find below. These seed statements were used as the inputs to the Common Ground deliberation. Creating these seed statements is an important step that has a large impact on the perceived legitimacy of the process; we are developing a rigorous method for it.
Step 5: Select participants
In our process, for simplicity, we did this with the participant recruitment platform Prolific. We made a custom Prolific login flow for the Common Ground application, whereby we simply send a link to Prolific and participants log in with their Prolific ID.
We set the Prolific settings to include people who had given consent to use their webcam during a study, and set the total number of participants to 450. For those interested, we are happy to share the study description and the demographic data.
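A minimal sketch of such a login flow, written here as a Flask-style endpoint (our actual stack differs). Prolific appends the participant's ID as a PROLIFIC_PID query parameter to the study link; create_session below is a hypothetical helper standing in for real account and session handling.

```python
# Sketch of a Prolific login endpoint. Prolific appends PROLIFIC_PID
# (plus STUDY_ID and SESSION_ID) as query parameters to the study link.
from flask import Flask, abort, redirect, request

app = Flask(__name__)

def create_session(prolific_pid: str) -> str:
    # Hypothetical helper: create or fetch a participant account keyed
    # on the Prolific ID and return a session token for the lobby.
    return f"token-for-{prolific_pid}"

@app.route("/prolific-login")
def prolific_login():
    pid = request.args.get("PROLIFIC_PID")
    if not pid:
        abort(400, "Missing Prolific ID")
    return redirect(f"/lobby?token={create_session(pid)}")
```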
Step 6: Run the app
A specific aspect of Prolific is that it does not yet support time-sensitive studies, so you won't get hundreds of people on the application at once. We had to open up the deliberation, wait for people to start rolling in (and apologise to the first participants, who had to wait a little before getting matched); once there was a critical mass, the average wait time in the queue went down.
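To illustrate the queueing behaviour, a toy sketch of the matchmaking loop, using the group size of three described earlier; the production version also handles disconnects, timeouts and AI-moderator assignment, all omitted here.

```python
# Toy matchmaking loop: pop waiting participants off a queue in groups
# of three and open a deliberation room for each group.
import asyncio

GROUP_SIZE = 3

async def matchmaker(queue: asyncio.Queue) -> None:
    while True:
        group = [await queue.get() for _ in range(GROUP_SIZE)]
        print(f"Opening deliberation room for participants {group}")

async def demo() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(matchmaker(queue))
    for pid in range(7):          # 7 arrivals -> two rooms, one person waiting
        await queue.put(pid)
    await asyncio.sleep(0.1)      # let the matchmaker drain the queue
    task.cancel()

asyncio.run(demo())
```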
Built into Common Ground is a help button that sends a notification to the moderator on call, who can easily join the conversation and resolve any issues. The only calls we received were about technical issues caused by the AI moderator not progressing when one of the participants left the call.
We ran the study for a total of 8 hours. After the Prolific test was over, we had about 450 participants, of whom 350 deliberated for more than 30 minutes. We compensated participants at £11 an hour. Once data collection had finished, we moved into the results phase.
Step 7: Run the results engine
First, we took the list of outcomes generated by the Common Ground application and ran some SQL queries to create an aggregate view of the data needed for analysis. The resulting data was a list of all the statements and their votes, which was exported as a CSV and brought over to Google Colab.
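For concreteness, a sketch of this aggregation step; the table and column names, and the agree/disagree/pass vote options, are assumptions for the example rather than our actual schema.

```python
# Illustrative aggregation: one row per statement with vote tallies,
# exported to CSV for analysis in Colab. Schema names are assumptions.
import sqlite3
import pandas as pd

conn = sqlite3.connect("common_ground.db")  # placeholder database
votes = pd.read_sql_query(
    """
    SELECT s.id AS statement_id,
           s.text AS statement,
           SUM(CASE WHEN v.vote = 'agree'    THEN 1 ELSE 0 END) AS agree,
           SUM(CASE WHEN v.vote = 'disagree' THEN 1 ELSE 0 END) AS disagree,
           SUM(CASE WHEN v.vote = 'pass'     THEN 1 ELSE 0 END) AS pass
    FROM statements s
    JOIN votes v ON v.statement_id = s.id
    GROUP BY s.id, s.text
    """,
    conn,
)
votes.to_csv("statement_votes.csv", index=False)
```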
From that list of statements, we calculated a Chi-squared statistic per statement. Because of the relatively low number of votes per statement, the p-value was 0.0 for many of them and wasn't much use. Instead, we used the Chi-squared statistic directly as a stand-in measure of surprisal: how different the votes on a given statement were from the average votes across all statements.
We then calculated the difference between agree and disagree votes as a percentage of total votes, which we called the agreement difference.
We ranked the statements by their Chi-squared statistic, then selected the top statements that had more agreement than disagreement until we filled the LLM's context limit; this came to a list of about 20 statements. We did the same for statements with more disagree votes. Together these formed our final list of statements.
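A sketch of this ranking, assuming the CSV produced in the previous step; the agree/disagree/pass columns and the cut-off of 20 are illustrative.

```python
# Surprisal ranking sketch: Chi-squared per statement against the
# average vote split, plus the agreement difference, then ranking.
import pandas as pd
from scipy.stats import chisquare

df = pd.read_csv("statement_votes.csv")
vote_cols = ["agree", "disagree", "pass"]

# Expected distribution: the average vote split across all statements.
overall = df[vote_cols].sum()
expected_frac = overall / overall.sum()

def surprisal(row):
    observed = row[vote_cols].to_numpy(dtype=float)
    expected = expected_frac.to_numpy() * observed.sum()
    return chisquare(observed, expected).statistic

df["chi2"] = df.apply(surprisal, axis=1)
total = df[vote_cols].sum(axis=1)
df["agreement_diff"] = (df["agree"] - df["disagree"]) / total

# Most surprising statements with net agreement (and, symmetrically,
# net disagreement), truncated to fit the LLM context window.
ranked = df.sort_values("chi2", ascending=False)
top_agree = ranked[ranked["agreement_diff"] > 0].head(20)
top_disagree = ranked[ranked["agreement_diff"] < 0].head(20)
```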
Finally, we fed these statements into GPT-4 with a prompt aimed at summarising and deduplicating them, and then generating a new set of statements.
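A sketch of that final call, again assuming the pre-1.0 OpenAI SDK; the prompt is a paraphrase of the intent rather than our exact production prompt, and the input statements shown are placeholders for the ranked list from the previous step.

```python
# Sketch of the summarise-and-deduplicate step. `statements` stands in
# for the ranked statements produced by the previous step.
import openai  # pre-1.0 OpenAI SDK, current at the time of writing

statements = [  # placeholder near-duplicates to show deduplication
    "AI should supplement, not replace, human moderators.",
    "AI moderation must support, never supplant, human moderators.",
]
prompt = (
    "Summarise and deduplicate the following statements, then generate "
    "a refined set of new statements that preserves the original "
    "positions:\n\n" + "\n".join(f"- {s}" for s in statements)
)
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])
```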
We started with the topic “Can AI help local government?”, but the statements generated by participants went beyond local government and also included statements about LLM designers, research labs, education, and so on. Ideally, we would take those statements and rerun them, or refine them with the help of experts and then rerun the refined statements.
Results
It was our intention to run a process that reflected our vision of a gold-standard democratic process, but due to the amount of time required to develop our prototype, our biggest result is the prototype itself.
The initial idea for the app was to match strangers together in video calls, transcribe their conversation live, and have a dynamic and interactive AI moderator that could intelligently guide the conversation to generate statements. Live transcription proved to be a bigger technical difficulty than we expected, partly due to initial optimistic estimates of using Whisper v1 as a live transcription engine in conversational dialogue.
After our trip to San Francisco, we decided to pivot to a non-transcription-based version. This was difficult but necessary given the time constraints, and led to the expected feedback that it was difficult to switch between spoken and typed modalities. Despite this, the statements seemed to really spark conversation among the participants and guide them towards having a productive and pleasant time with one another.
Throughout this process, we conducted five live tests (two internal and three with external participants) with a combined 491 people. We now have clear goals for the next iteration, specifically relating to live transcription, filtering low-effort responses, and prioritising statements with high surprisal.

Test runs with deliberating participants
Test 1 - Eindhoven: UX test
Test 2 - Prolific 1: Technical scale test
Test 3 - Prolific 2: Data collection
The final statements, generated by our results engine:
AI should be utilized by local governments to enhance efficiency and service delivery, while ensuring responsible use and preventing overdependence.
AI moderation tools should be integrated in local government processes, supplementing, not replacing, human moderators for balanced deliberation.
AI technology in local governance must uphold neutrality, accuracy, and unbiasedness to optimize decision-making and safeguard user privacy.
AI systems in local government should continuously update and present credible, diverse, and balanced information to build public trust and confidence.
AI developers must design systems that prioritize safety, especially for children's interaction, and do not promote improper interaction to maintain healthy societal dynamics.
AI should not replace human support in emotional assistance to preserve the unique nature of human empathy and must be supervised to ensure this.
Education authorities should effectively implement AI to personalize learning experiences, thereby optimizing resources and freeing up staff for critical tasks.
All stakeholders involved in AI research must ensure robust data security and avoid unnecessary data retention to safeguard personal data privacy.
AI stakeholders should strive for accessible solutions with appropriate safeguards to ensure more positive than negative social impacts.
AI should be designed to assist, not replace, human interactions, thereby preserving our inherent social nature and ensuring appropriate human oversight.
Governments should develop comprehensive AI governance frameworks without jeopardising innovation to promote further technological advancement.
AI stakeholders should program inclusivity into their strategies, without profit as the main aim, to ensure equal opportunities and societal cohesion.
Local governance should cautiously adopt AI to improve efficiency and provide learning opportunities, always considering expert advice to ensure human jobs are not replaced.
AI developers must rigorously test their systems before deployment, particularly in domains such as education, to ensure societal good and trust in technology.
Society should cautiously approach the implementation of AI, considering potential complexities, to respect varying individual comfort levels with technology.
People should have the choice to use AI as a tool to simplify tasks if they wish, mindful of some people's struggles with technology, and companies must ensure AI use is not restrictive or frustrating.
Evaluation
We evaluated the process in two ways. First, we conducted a workshop with ethical experts from the municipality to drill down into our assumptions and make recommendations. Second, we asked Prolific participants to rate the experience on a Likert scale.
Likert scale responses
Feedback from the ethical board
Next steps
Communicating findings & transparency
The current technical prototype focuses primarily on the in-group deliberation and on getting valuable results from those touchpoints. However, in a democratic setting, being communicative and transparent about these results is as important as the underlying process. For this, we have started on a first iteration of a dashboard presenting these findings in different ways.
Ways to collaborate
In this section, we explore the potential use cases for Common Ground and propose potential avenues for collaboration. We will shed light not only on how this tool can be applied but also on larger opportunities for deployment, further development, and academic investigations. To discuss your ideas and interests in collaboration, please don't hesitate to contact us.