Evidence, Opinion  |  January 3, 2025

Evidence frameworks for Generative AI in International Development

The rapid advancement of generative AI has sparked widespread adoption across sectors, and international development and humanitarian organizations are no exception. As these organizations explore and implement AI solutions, particularly in knowledge management and information access, they face a critical challenge: determining the real value and impact of these technologies in contexts where traditional market metrics like user growth or revenue don't tell the complete story.

Unlike commercial AI applications where user engagement and willingness to pay provide clear signals of value, humanitarian and development initiatives require more nuanced evidence frameworks. Organizations must justify investments not just to donors, but also to their own leadership, partner organizations, and most importantly, to the communities they serve. This evidence needs to demonstrate not only operational efficiency gains, but also meaningful contribution to development or humanitarian outcomes and social impact.

The stakes are particularly high given limited resources and the ethical implications of deploying AI in vulnerable contexts. Organizations need robust evidence to guide decisions about continuing, scaling, or adjusting their AI initiatives, while ensuring these technologies genuinely serve their mission and don't inadvertently exacerbate existing inequities. This framework discussion aims to help organizations at various stages of AI implementation gather meaningful evidence that can inform strategic decisions, improve implementation, and ultimately ensure that AI investments truly advance development goals.

Understanding evidence in context

Evidence gathering for generative AI initiatives should align with implementation maturity. Unlike traditional development interventions, where baseline assessments typically precede implementation, fast-evolving technologies like generative AI often require a more iterative approach to evidence gathering. Organizations often feel pressure to demonstrate impact immediately, but meaningful evidence collection requires a staged approach that matches organizational readiness and implementation phase, while ensuring local ownership and leadership of the evidence-gathering process.

Staged evidence framework

Exploration stage

This phase is characterized by initial experimentation with AI tools, often through sandbox environments or controlled testing. Organizations are typically identifying potential use cases, conducting preliminary assessments, and building internal familiarity with the technology, while actively engaging local CSOs and community representatives to shape use cases and assessment criteria. Similar to inception phases in traditional programming, evidence should focus on:

  • Technical feasibility in local contexts
  • Initial user acceptance and basic usability
  • Preliminary cost implications
  • Potential risks and ethical considerations
  • Community perspectives and concerns
  • Local capacity and readiness assessment

Evidence gathering methods here might include structured documentation of test cases, user feedback sessions, and basic cost modeling. The emphasis should be on learning rather than impact measurement, much like rapid assessments in conventional development practice.
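
To make "basic cost modeling" concrete, here is a minimal sketch of a per-query cost estimate for a pilot. All figures (token counts, per-token prices, query volumes) are placeholder assumptions rather than real pricing, and fixed costs such as hosting, connectivity, and staff time would still need to be added separately.

```python
# Minimal sketch of basic cost modeling for a generative AI pilot.
# All numbers below are placeholder assumptions; replace them with your
# provider's actual pricing and your observed usage.

AVG_INPUT_TOKENS = 1_500      # assumed average prompt + retrieved context size
AVG_OUTPUT_TOKENS = 400       # assumed average response length
PRICE_PER_1K_INPUT = 0.0025   # USD per 1,000 input tokens (placeholder)
PRICE_PER_1K_OUTPUT = 0.010   # USD per 1,000 output tokens (placeholder)

def cost_per_query() -> float:
    """Estimated model cost of a single user query in USD."""
    return (AVG_INPUT_TOKENS / 1000) * PRICE_PER_1K_INPUT + \
           (AVG_OUTPUT_TOKENS / 1000) * PRICE_PER_1K_OUTPUT

def monthly_cost(queries_per_user_per_day: float, users: int, days: int = 30) -> float:
    """Scale the per-query estimate to a user group.

    Ignores fixed costs (hosting, connectivity, staff time), which should
    be modeled separately.
    """
    return cost_per_query() * queries_per_user_per_day * users * days

if __name__ == "__main__":
    print(f"Cost per query: ${cost_per_query():.4f}")
    print(f"Monthly cost, 200 users at 5 queries/day: ${monthly_cost(5, 200):,.2f}")
```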

💡

A key consideration in this phase is how users' (AI) digital literacy affects your evidence results: How familiar are they with using AI? How much guidance does the tool provide to fast-track or enable further learning for novice users?

Pilot implementation

In this phase, organizations have identified specific use cases and are implementing controlled pilots with defined user groups. These implementations are structured but limited in scope, allowing for close monitoring and adjustment. The pilot design should incorporate principles of locally-led development and ensure equitable participation of local stakeholders. As organizations move beyond initial testing to these structured pilots, evidence collection should expand to include:

  • Operational efficiency metrics
  • User adoption patterns
  • Process improvement indicators
  • Early indication of benefits and challenges
  • Resource requirements and constraints
  • Local stakeholder feedback and concerns
  • Power dynamics in tool implementation
  • Data sovereignty considerations

Scaled implementation

At this stage, successful pilot implementations are being expanded to broader user groups or additional use cases. The technology is becoming integrated into regular operations and standard workflows. Only at this stage of operational maturity should organizations begin collecting evidence of broader impact:

  • Programmatic outcomes
  • Cost-effectiveness analysis
  • Systemic changes in organizational processes
  • Capacity development impacts
  • Sustainability indicators
  • Community empowerment metrics
  • Equity and inclusion measures
  • Local ownership indicators

While this phase shares many characteristics with traditional impact evaluation, AI implementations often require more frequent assessment cycles due to the technology's rapid evolution.

Interactive evidence gathering

A significant departure from traditional M&E approaches lies in the opportunity to leverage the interactive nature of some AI tools for integrated evidence gathering. Unlike conventional post-intervention surveys or assessments, AI systems with conversational components (e.g., chatbots) can collect valuable data through user interactions in real time.

Benefits of integrated evidence collection

When AI tools engage users in conversation or interactive processes, they create natural opportunities to collect user feedback, track effectiveness, and measure outcomes in real-time. This integration offers several advantages:

  • Immediate user feedback on tool effectiveness
  • Contextual data about information needs and usage patterns
  • Reduced burden on separate monitoring and evaluation processes
  • Higher response rates compared to traditional feedback methods
  • More authentic user insights captured in the moment of interaction
  • Opportunities for participatory data governance
  • Direct community input in tool refinement

Implementation approaches

Organizations can integrate evidence gathering through:

  • Built-in feedback mechanisms within conversational flows
  • User satisfaction measurements at key interaction points
  • Usage pattern analysis
  • Outcome tracking through follow-up queries
  • Interactive refinement of responses based on user input

This approach represents a significant evolution from traditional M&E methods, offering more continuous and granular data collection. It can be as simple as adding a "feedback" tool or function when using function-calling techniques with LLMs, or a dedicated feedback route in query-routing approaches, as sketched below.
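
As an illustration, here is a minimal sketch of such a feedback tool in a function-calling setup. The tool name (record_feedback), its schema fields, and the JSON-lines storage are illustrative assumptions rather than any specific provider's API; the point is simply that the same conversational channel that answers a query can also capture satisfaction and usage evidence.

```python
# Minimal sketch of a "feedback" tool for a function-calling LLM setup.
# The schema follows the common JSON-schema style used by several providers;
# names, fields, and storage are illustrative assumptions.
import json
from datetime import datetime, timezone

# Tool definition the model can call when the user volunteers feedback,
# or when the assistant asks for it at a key interaction point.
FEEDBACK_TOOL = {
    "name": "record_feedback",
    "description": "Record user feedback on whether the answer met their information need.",
    "parameters": {
        "type": "object",
        "properties": {
            "helpful": {"type": "boolean", "description": "Did the answer address the user's need?"},
            "rating": {"type": "integer", "minimum": 1, "maximum": 5},
            "comment": {"type": "string", "description": "Free-text feedback in the user's own words."},
            "topic": {"type": "string", "description": "Short label for the information need."},
        },
        "required": ["helpful"],
    },
}

FEEDBACK_LOG = "feedback_log.jsonl"  # assumed local store; adapt to your data governance setup

def record_feedback(helpful: bool, rating: int = None,
                    comment: str = "", topic: str = "") -> dict:
    """Handler invoked when the model calls the feedback tool.

    Appends one JSON line per feedback event so satisfaction and usage
    patterns can be analyzed later without a separate survey round.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "helpful": helpful,
        "rating": rating,
        "comment": comment,
        "topic": topic,
    }
    with open(FEEDBACK_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return {"status": "recorded"}  # returned to the model so it can acknowledge the user

# Example of what a handler call might look like once the model decides to use the tool.
if __name__ == "__main__":
    record_feedback(helpful=True, rating=4, topic="drought response guidance",
                    comment="Useful, but I needed the answer in Swahili.")
```

The resulting log can then feed the usage-pattern and satisfaction analyses described above, alongside any qualitative review of the free-text comments.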

Methodological considerations

Evidence types

Different contexts require different types of evidence. While traditional evaluation frameworks often emphasize outcome and impact metrics, AI implementations require additional focus on:

  • Quantitative metrics for operational efficiency
  • Qualitative assessments for user experience and adoption
  • Mixed methods for impact evaluation
  • Process documentation for organizational learning
  • Community-defined success metrics
  • Indigenous and local knowledge integration
  • Power dynamics assessment

Contextual factors

Evidence frameworks must consider:

  • Infrastructure limitations in developing and humanitarian contexts
  • Varying levels of (AI) digital literacy
  • Cultural and linguistic appropriateness
  • Local regulatory environments
  • Data sovereignty requirements
  • Existing inequities and power dynamics
  • Traditional knowledge systems and local innovation ecosystems

Key principles

  1. Match evidence requirements to implementation stage
  2. Prioritize learning over immediate impact demonstration
  3. Consider local context in evidence gathering approaches
  4. Balance rigor with practicality
  5. Document both successes and failures(!)
  6. Ensure meaningful local participation and leadership
  7. Protect data sovereignty and indigenous knowledge
  8. Challenge and transform power asymmetries
  9. Support locally-led innovation

Conclusion

The integration of AI in international development requires a thoughtful evolution of evidence-gathering approaches. While sharing foundational principles with traditional development evaluation, AI implementations demand both technical adaptations and deep consideration of local leadership and equity. A staged approach to evidence gathering, aligned with implementation maturity and grounded in local contexts, allows organizations to learn and adapt while building toward meaningful impact assessment. As the field evolves, evidence frameworks must balance rigorous evaluation with flexibility, ensuring that the process remains responsive to local needs and promotes equitable outcomes. Success lies not in rushing to demonstrate impact, but in building sustainable, locally-owned systems for learning and improvement.


We would love to hear your thoughts on this and your experience in building evidence (frameworks) for your AI applications in development and humanitarian contexts. Reach out!