Evidence, Opinion  |  January 3, 2025

Evidence frameworks for Generative AI in International Development

The rapid advancement of generative AI has sparked widespread adoption across sectors, and international development and humanitarian organizations are no exception. As these organizations explore and implement AI solutions, particularly in knowledge management and information access, they face a critical challenge: determining the real value and impact of these technologies in contexts where traditional market metrics like user growth or revenue don't tell the complete story.

Unlike commercial AI applications where user engagement and willingness to pay provide clear signals of value, humanitarian and development initiatives require more nuanced evidence frameworks. Organizations must justify investments not just to donors, but also to their own leadership, partner organizations, and most importantly, to the communities they serve. This evidence needs to demonstrate not only operational efficiency gains, but also meaningful contribution to development or humanitarian outcomes and social impact.

The stakes are particularly high given limited resources and the ethical implications of deploying AI in vulnerable contexts. Organizations need robust evidence to guide decisions about continuing, scaling, or adjusting their AI initiatives, while ensuring these technologies genuinely serve their mission and don't inadvertently exacerbate existing inequities. This framework discussion aims to help organizations at various stages of AI implementation gather meaningful evidence that can inform strategic decisions, improve implementation, and ultimately ensure that AI investments truly advance development goals.

Understanding evidence in context

Evidence gathering for generative AI initiatives should align with implementation maturity. Unlike traditional development interventions, where baseline assessments typically precede implementation, fast-evolving technologies like generative AI often require a more iterative approach to evidence gathering. Organizations often feel pressure to demonstrate impact immediately, but meaningful evidence collection requires a staged approach that matches organizational readiness and implementation phase, while ensuring local ownership and leadership of the evidence-gathering process.

Staged evidence framework

Exploration stage

This phase is characterized by initial experimentation with AI tools, often through sandbox environments or controlled testing. Organizations are typically identifying potential use cases, conducting preliminary assessments, and building internal familiarity with the technology, while actively engaging local CSOs and community representatives to shape use cases and assessment criteria. Similar to inception phases in traditional programming, evidence should focus on:

  • Technical feasibility in local contexts
  • Initial user acceptance and basic usability
  • Preliminary cost implications
  • Potential risks and ethical considerations
  • Community perspectives and concerns
  • Local capacity and readiness assessment

Evidence gathering methods here might include structured documentation of test cases, user feedback sessions, and basic cost modeling. The emphasis should be on learning rather than impact measurement, much like rapid assessments in conventional development practice.
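
To make "basic cost modeling" concrete, here is a minimal sketch of a per-query cost estimate for a pilot. All figures (token counts, per-token prices, query volumes) are placeholder assumptions rather than real pricing, and fixed costs such as hosting, connectivity, and staff time would still need to be added separately.

```python
# Minimal sketch of basic cost modeling for a generative AI pilot.
# All numbers below are placeholder assumptions; replace them with your
# provider's actual pricing and your observed usage.

AVG_INPUT_TOKENS = 1_500      # assumed average prompt + retrieved context size
AVG_OUTPUT_TOKENS = 400       # assumed average response length
PRICE_PER_1K_INPUT = 0.0025   # USD per 1,000 input tokens (placeholder)
PRICE_PER_1K_OUTPUT = 0.010   # USD per 1,000 output tokens (placeholder)

def cost_per_query() -> float:
    """Estimated model cost of a single user query in USD."""
    return (AVG_INPUT_TOKENS / 1000) * PRICE_PER_1K_INPUT + \
           (AVG_OUTPUT_TOKENS / 1000) * PRICE_PER_1K_OUTPUT

def monthly_cost(queries_per_user_per_day: float, users: int, days: int = 30) -> float:
    """Scale the per-query estimate to a user group.

    Ignores fixed costs (hosting, connectivity, staff time), which should
    be modeled separately.
    """
    return cost_per_query() * queries_per_user_per_day * users * days

if __name__ == "__main__":
    print(f"Cost per query: ${cost_per_query():.4f}")
    print(f"Monthly cost, 200 users at 5 queries/day: ${monthly_cost(5, 200):,.2f}")
```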

💡

A key consideration in this phase is how users' (AI) digital literacy affects your evidence results: How familiar are they with using AI? How much guidance does the tool provide to fast-track or enable further learning for novice users?

Pilot implementation

In this phase, organizations have identified specific use cases and are implementing controlled pilots with defined user groups. These implementations are structured but limited in scope, allowing for close monitoring and adjustment. The pilot design should incorporate principles of locally-led development and ensure equitable participation of local stakeholders. As organizations move beyond initial testing to these structured pilots, evidence collection should expand to include:

  • Operational efficiency metrics
  • User adoption patterns
  • Process improvement indicators
  • Early indication of benefits and challenges
  • Resource requirements and constraints
  • Local stakeholder feedback and concerns
  • Power dynamics in tool implementation
  • Data sovereignty considerations

Scaled implementation

At this stage, successful pilot implementations are being expanded to broader user groups or additional use cases. The technology is becoming integrated into regular operations and standard workflows. Only at this stage of operational maturity should organizations begin collecting evidence of broader impact:

  • Programmatic outcomes
  • Cost-effectiveness analysis
  • Systemic changes in organizational processes
  • Capacity development impacts
  • Sustainability indicators
  • Community empowerment metrics
  • Equity and inclusion measures
  • Local ownership indicators

While this phase shares many characteristics with traditional impact evaluation, AI implementations often require more frequent assessment cycles due to the technology's rapid evolution.

Interactive evidence gathering

A significant departure from traditional M&E approaches lies in the opportunity to leverage the interactive nature of some AI tools for integrated evidence gathering. Unlike conventional post-intervention surveys or assessments, AI systems with conversational components (e.g., chatbots) can collect valuable data through user interactions in real time.

Benefits of integrated evidence collection

When AI tools engage users in conversation or interactive processes, they create natural opportunities to collect user feedback, track effectiveness, and measure outcomes in real-time. This integration offers several advantages:

  • Immediate user feedback on tool effectiveness
  • Contextual data about information needs and usage patterns
  • Reduced burden on separate monitoring and evaluation processes
  • Higher response rates compared to traditional feedback methods
  • More authentic user insights captured in the moment of interaction
  • Opportunities for participatory data governance
  • Direct community input in tool refinement

Implementation approaches

Organizations can integrate evidence gathering through:

  • Built-in feedback mechanisms within conversational flows
  • User satisfaction measurements at key interaction points
  • Usage pattern analysis
  • Outcome tracking through follow-up queries
  • Interactive refinement of responses based on user input

This approach represents a significant evolution from traditional M&E methods, offering more continuous and granular data collection. It can be as simple as adding a "feedback" tool or function when using function-calling techniques with LLMs, or a dedicated feedback route in query-routing approaches, as sketched below.
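
As an illustration, here is a minimal sketch of such a feedback tool in a function-calling setup. The tool name (record_feedback), its schema fields, and the JSON-lines storage are illustrative assumptions rather than any specific provider's API; the point is simply that the same conversational channel that answers a query can also capture satisfaction and usage evidence.

```python
# Minimal sketch of a "feedback" tool for a function-calling LLM setup.
# The schema follows the common JSON-schema style used by several providers;
# names, fields, and storage are illustrative assumptions.
import json
from datetime import datetime, timezone

# Tool definition the model can call when the user volunteers feedback,
# or when the assistant asks for it at a key interaction point.
FEEDBACK_TOOL = {
    "name": "record_feedback",
    "description": "Record user feedback on whether the answer met their information need.",
    "parameters": {
        "type": "object",
        "properties": {
            "helpful": {"type": "boolean", "description": "Did the answer address the user's need?"},
            "rating": {"type": "integer", "minimum": 1, "maximum": 5},
            "comment": {"type": "string", "description": "Free-text feedback in the user's own words."},
            "topic": {"type": "string", "description": "Short label for the information need."},
        },
        "required": ["helpful"],
    },
}

FEEDBACK_LOG = "feedback_log.jsonl"  # assumed local store; adapt to your data governance setup

def record_feedback(helpful: bool, rating: int = None,
                    comment: str = "", topic: str = "") -> dict:
    """Handler invoked when the model calls the feedback tool.

    Appends one JSON line per feedback event so satisfaction and usage
    patterns can be analyzed later without a separate survey round.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "helpful": helpful,
        "rating": rating,
        "comment": comment,
        "topic": topic,
    }
    with open(FEEDBACK_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return {"status": "recorded"}  # returned to the model so it can acknowledge the user

# Example of what a handler call might look like once the model decides to use the tool.
if __name__ == "__main__":
    record_feedback(helpful=True, rating=4, topic="drought response guidance",
                    comment="Useful, but I needed the answer in Swahili.")
```

The resulting log can then feed the usage-pattern and satisfaction analyses described above, alongside any qualitative review of the free-text comments.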

Methodological considerations

Evidence types

Different contexts require different types of evidence. While traditional evaluation frameworks often emphasize outcome and impact metrics, AI implementations require additional focus on:

  • Quantitative metrics for operational efficiency
  • Qualitative assessments for user experience and adoption
  • Mixed methods for impact evaluation
  • Process documentation for organizational learning
  • Community-defined success metrics
  • Indigenous and local knowledge integration
  • Power dynamics assessment

Contextual factors

Evidence frameworks must consider:

  • Infrastructure limitations in developing and humanitarian contexts
  • Varying levels of (AI) digital literacy
  • Cultural and linguistic appropriateness
  • Local regulatory environments
  • Data sovereignty requirements
  • Existing inequities and power dynamics
  • Traditional knowledge systems and local innovation ecosystems

Key principles

  1. Match evidence requirements to implementation stage
  2. Prioritize learning over immediate impact demonstration
  3. Consider local context in evidence gathering approaches
  4. Balance rigor with practicality
  5. Document both successes and failures(!)
  6. Ensure meaningful local participation and leadership
  7. Protect data sovereignty and indigenous knowledge
  8. Challenge and transform power asymmetries
  9. Support locally-led innovation

Conclusion

The integration of AI in international development requires a thoughtful evolution of evidence-gathering approaches. While sharing foundational principles with traditional development evaluation, AI implementations demand both technical adaptations and deep consideration of local leadership and equity. A staged approach to evidence gathering, aligned with implementation maturity and grounded in local contexts, allows organizations to learn and adapt while building toward meaningful impact assessment. As the field evolves, evidence frameworks must balance rigorous evaluation with flexibility, ensuring that the process remains responsive to local needs and promotes equitable outcomes. Success lies not in rushing to demonstrate impact, but in building sustainable, locally-owned systems for learning and improvement.


We would love to hear your thoughts on this and your experience in building evidence (frameworks) for your AI applications in development and humanitarian contexts. Reach out!