TL;DR - We introduce a new method for reviewing development and humanitarian programs using AI. The approach pairs large language models (LLMs) with human expertise to produce results within minutes rather than months, aiming to enhance the scale, consistency, speed, and depth of program analysis while maintaining accountability and learning. The methodology combines structured rubrics, AI-powered document analysis, and human-in-the-loop verification, and the process covers framework setup, intermediate and final analyses, and quality control. We demonstrate the framework by assessing whether development programs were locally-led across six dimensions. A proof of concept is being built and will be shared soon.
The development and humanitarian sectors generate vast amounts of documentation about their programs, including project evaluations, logical frameworks, theories of change, and various monitoring reports. However, systematically analyzing this wealth of information to assess program effectiveness and alignment with key principles has traditionally been a labor-intensive process, which means that critical lessons about what works and what doesn't are often lost or hidden. This article explores an innovative approach that combines the analytical capabilities of large language models (LLMs) with human expertise to create a scalable, transparent, and systematic program analysis framework.
Development and humanitarian programs are complex interventions with multiple stakeholders, objectives, and outcomes. Traditional program analysis often faces several challenges:
Our approach combines human expertise in framework design with AI's capability to process and analyze large volumes of text. The framework consists of three main components:
Our methodology follows a systematic approach. While we use the assessment of locally-led development programs as our primary example, this framework can be adapted for various types of program analysis. Throughout, we indicate which elements are specific to our locally-led development case study and which belong to the general framework, along with guidance for other applications.
The foundation of the analysis is a carefully designed rubric system. While the specific rubrics or dimensions will vary based on what's being assessed, the process of developing these rubrics remains consistent:
1. Rubric Definition
2. Question Framework
3. Evidence Requirements
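To make these components concrete, here is a minimal sketch of how a rubric could be represented in code (Python; the field names are our own illustration, not a published schema):

```python
from dataclasses import dataclass, field

@dataclass
class Rubric:
    """One assessment dimension of the analysis framework."""
    name: str                    # e.g. "Local Agenda Setting"
    definition: str              # plain-language definition, given to the LLM as context
    questions: list[str] = field(default_factory=list)  # objective questions to answer
    evidence_requirements: str = ""  # what counts as acceptable evidence

# Example instance drawn from the locally-led development rubrics below
agenda_setting = Rubric(
    name="Local Agenda Setting",
    definition=(
        "The program's priorities and objectives are determined primarily "
        "by local actors during the design phase, prior to implementation."
    ),
    questions=[
        "Who determined the priorities and objectives for the program?",
        "When were the program priorities and objectives determined and by whom?",
    ],
    evidence_requirements="Exact quotes with page numbers from program documents.",
)
```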
Taking the example of assessing locally-led development programs, we developed six key rubrics.
| Rubric | Definition | Objective questions |
|---|---|---|
| Stakeholders | All of the actors engaged in the program. | 1. Create a stakeholder map. 2. Use this map to determine whether each actor is local or not. Local means their headquarters is in the community, city, country, or region where the work is done; local staff of foreign-led funders or international organizations are not local. |
| Local Agenda Setting | The program's priorities and objectives are determined primarily by local actors during the design phase, prior to implementation. Local individuals, communities, or organizations play a leading role in identifying needs and setting the agenda. External partners may provide input, but final decisions on program focus align closely with local perspectives and desires. | 1. Who determined the priorities and objectives for the program? For example, the program objectives align with national or regional WASH policy or needs assessments; were based on community surveys before the program was designed; or were determined by the funder's priorities/strategies (e.g., the U.S. Global Water Strategy). 2. When were the program priorities and objectives determined, and by whom? |
| Local Solution Development | Solutions and strategies are primarily developed by local stakeholders. The program demonstrates a strong reliance on local knowledge and expertise in designing interventions. External partners may offer technical support, but the core ideas and approaches originate from or are significantly shaped by local actors. | 1. What technical solutions and approaches were used in the program, and how were they selected and by whom? |
| Local Resource Mobilization | The program leverages local resources, both human and financial, to a significant degree. Local actors contribute meaningfully to the program's implementation, either through direct funding, in-kind contributions, or by providing essential human resources. The program builds on existing local capacities rather than relying primarily on external inputs such as foreign consultants. | 1. What resources went into the program (financial, staffing, materials, other)? 2. What percentage of those resources did local actors contribute to the program? |
| Local Decision-Making Power | Key decisions throughout the program cycle are made predominantly by local stakeholders. This includes decisions on resource allocation, implementation strategies, and adaptive management. The governance structure of the program gives substantial weight to local voices in steering the initiative. | 1. What decisions needed to be made during the program cycle? 2. How were those decisions made and by whom? 3. What were the success metrics based on? |
| Capacity Building for Sustainability | The program has a clear focus on strengthening local capacities for long-term self-reliance. It includes specific components or strategies aimed at enhancing local leadership, technical skills, and organizational capabilities. The program design anticipates and prepares for a gradual transition to full local ownership and management. | 1. Who are the actors responsible for maintaining the outcomes? 2. What types of capacity-building activities were part of the program, and how did they support the responsible actors? |
The text from the definitions is used as context for the LLM during extraction, and the questions are provided to the LLM, which is prompted to answer them using the content from the program documents (see below) to generate traceable intermediary outputs.
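As an illustration, the definition and questions can be assembled into a single extraction prompt per document. A minimal sketch, reusing the `Rubric` structure sketched above (the prompt wording here is illustrative, not our exact production prompt):

```python
def build_extraction_prompt(rubric: Rubric, document_text: str) -> str:
    """Combine a rubric's definition (context) and its objective questions
    with one program document to produce a traceable extraction prompt."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(rubric.questions, 1))
    return (
        f"You are assessing a program against the rubric '{rubric.name}'.\n"
        f"Rubric definition (context):\n{rubric.definition}\n\n"
        "Answer the following questions using ONLY the document below. "
        "For every answer, quote the exact supporting text and its page number. "
        "If the document does not contain an answer, reply 'not found'.\n\n"
        f"Questions:\n{numbered}\n\n"
        f"Document:\n{document_text}"
    )
```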
A core component of any AI-powered program analysis framework lies in how we organize and present documentation to the system. While sophisticated data integration methods exist, we recommend starting with the simplest approach: a well-curated folder of program documents.
For the purpose of our experiment, we downloaded various documents from the USAID Development Experience Clearinghouse (DEC) and the GlobalWaters.org websites, both of which have been shut down as of January 2025.
Large language models (LLMs) can effectively differentiate between document types, making it possible to start with a basic collection of:
As your analysis needs grow, the framework can scale to incorporate more sophisticated data access methods:
LLMs excel at understanding document context and content - they can readily distinguish between an evaluation report and a blog post, or between a logical framework and a financial statement. This capability means we can focus more on document quality and relevance rather than rigid classification systems or complex metadata frameworks.
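In practice, starting simple means a curated folder can be loaded with a few lines of code. A minimal sketch, assuming the documents are plain-text files (a PDF-to-text step, e.g. with a library such as pypdf, would replace the read step for PDF sources):

```python
from pathlib import Path

def load_program_documents(folder: str) -> dict[str, str]:
    """Read every file in a curated program folder into memory.
    Returns {filename: content}; the LLM infers each document's
    type from its content, so no metadata schema is required."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }

documents = load_program_documents("program_docs/")
```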
The framework leverages large language models (at a minimum, models of the caliber of Claude 3.7 Sonnet, Mistral Large, or Llama 3 70B) to:
First, we define what we want to capture about each stakeholder:
```
STAKEHOLDER INFORMATION TO CAPTURE:
- Name of Organization
- Location of Headquarters
- Is the organization local? (based on HQ location vs. project location)
- Role in Project (must be one of):
  * Donor
  * Implementing Partner
  * Technical Partner
  * Research Partner
  * Evaluation Partner
  * Government
- Description of their involvement
- When they were involved in the project
- Source of this information (page number and exact text from documents)
```
Then, we give the AI clear instructions on how to analyze the documents:
```
INSTRUCTION PROMPT TO AI:
"You are analyzing program documents to create a stakeholder map.
Key rules:
1. Only use explicitly stated information from the provided documents
2. Do not make assumptions about organizations or their roles
3. If information is missing, mark it as unknown
4. For each stakeholder found, provide:
   - The exact text snippet that mentions them
   - The page number where they are mentioned
   - Clear evidence for their role classification
5. Mark an organization as 'local' only if there is clear evidence their
   headquarters is in the project implementation country"
```
The extraction process follows this logic for *each and every document* that is available.
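A minimal sketch of that per-document loop is shown below. Here `call_llm` is a hypothetical placeholder for whichever model client you use, and the JSON output schema mirrors the capture list above:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to your chosen LLM
    (via its API client) and return the text of its reply."""
    raise NotImplementedError

def extract_stakeholders(documents: dict[str, str], instruction_prompt: str) -> list[dict]:
    """Run the stakeholder-mapping instruction over every available document,
    keeping the source document attached to each extracted record."""
    stakeholders = []
    for filename, text in documents.items():
        reply = call_llm(
            f"{instruction_prompt}\n\n"
            "Respond with a JSON list of stakeholder objects matching the "
            "capture schema.\n\n"
            f"Document ({filename}):\n{text}"
        )
        for record in json.loads(reply):  # assumes the model returned valid JSON
            record["source_document"] = filename  # preserve the audit trail
            stakeholders.append(record)
    return stakeholders
```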
This process ensures:
The human element remains crucial for:
The intermediate analysis phase varies based on the type of assessment being conducted. This builds an audit and reference trail for each rubric and provides semantic data for further analysis.
Here's how it worked in our locally-led development example. For each dimension in the rubric, we used AI to extract and record a list of excerpts from the available documentation, including answers to the rubric's objective questions. For example:
| Rubric | Extracted excerpts |
|---|---|
| Local Agenda Setting | “The program team organized multiple design meetings prior to the start led by Country Org X” (p.12, Business Case.pdf) … |
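Each excerpt is stored with its source so the audit trail survives downstream analysis. A minimal sketch of such an evidence record (the field names are our own illustration):

```python
from dataclasses import dataclass

@dataclass
class Excerpt:
    """One piece of extracted evidence, traceable back to its source."""
    rubric: str       # dimension it supports, e.g. "Local Agenda Setting"
    quote: str        # exact text lifted from the document
    page: int         # page number in the source document
    source_file: str  # e.g. "Business Case.pdf"
```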
The human reviewer checked and refined the rubric definitions to ensure that relevant information was extracted.
We generated multiple outputs to form the analysis of the program.
We fed the full extracted segments into the scoring AI module for each rubric dimension. This produced a low, medium, or high score based on:
The AI narrative module then generated detailed justifications for the scores from the intermediate outputs, analyzing program strengths and weaknesses and providing evidence-based explanations of the findings.
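A minimal sketch of the scoring step, reusing the `Rubric`, `Excerpt`, and hypothetical `call_llm` helpers from the earlier sketches (the JSON response format is our assumption):

```python
def score_rubric(rubric: Rubric, excerpts: list[Excerpt]) -> dict:
    """Ask the LLM for a low/medium/high score with an evidence-based
    justification, grounded only in the extracted excerpts."""
    evidence = "\n".join(
        f'- "{e.quote}" (p.{e.page}, {e.source_file})' for e in excerpts
    )
    reply = call_llm(
        f"Rubric: {rubric.name}\n"
        f"Definition: {rubric.definition}\n\n"
        f"Evidence extracted from the program documents:\n{evidence}\n\n"
        "Based only on this evidence, assign a score of low, medium, or high "
        "and justify it, citing the excerpts you relied on. Respond as JSON: "
        '{"score": "...", "justification": "..."}'
    )
    return json.loads(reply)
```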
We generated a comprehensive executive program summary from the locally-led perspective, incorporating key findings across dimensions, critical success factors, and areas for improvement.
Our experimentation has unsurprisingly revealed the significant impact of prompt design on analysis quality:
While our example focuses on locally-led development assessment, the framework is adaptable to various analysis needs:
This approach opens new possibilities for program analysis:
The integration of AI capabilities with human expertise offers a promising path forward for program analysis in the development and humanitarian sectors. This approach maintains the crucial role of human judgment while leveraging technology to enhance the scale, consistency, and depth of analysis possible.
By maintaining clear documentation of the analysis process and ensuring traceability of findings, the framework supports both accountability and learning. As AI capabilities continue to evolve, this human-in-the-loop approach provides a foundation for increasingly sophisticated program assessment while ensuring that analysis remains grounded in sector expertise and contextual understanding.
The framework's success in analyzing locally-led development programs demonstrates its potential for broader application across various types of development and humanitarian interventions. Future refinements will likely focus on expanding the range of assessment criteria, improving prompt engineering techniques, and developing more sophisticated methods for synthesizing findings across multiple programs and contexts.
This work is being developed in collaboration with Susan Davis, an accomplished international development expert. Susan brings decades of expertise in program evaluation and is particularly focused on driving investments to locally-led development approaches. While working on her book about international development experiences, she is exploring innovative ways to leverage AI for synthesizing learnings from development program evaluations. Her current work spans philanthropic advising, activism for effective development, and strategic consulting for locally-led social impact organizations. Her deep understanding of program evaluation and commitment to advancing locally-led development have been instrumental in shaping this analytical framework.
If you wish to apply this tool in your organization or work, reach out to us:
Olivier Mills
Founder & CEO, Baobab Tech
Susan Davis
Philanthropic advisor championing equitable social innovation