The rapid advancement of generative AI has sparked widespread adoption across sectors, and international development and humanitarian organizations are no exception. As these organizations explore and implement AI solutions, particularly in knowledge management and information access, they face a critical challenge: determining the real value and impact of these technologies in contexts where traditional market metrics like user growth or revenue don't tell the complete story.
Unlike commercial AI applications where user engagement and willingness to pay provide clear signals of value, humanitarian and development initiatives require more nuanced evidence frameworks. Organizations must justify investments not just to donors, but also to their own leadership, partner organizations, and most importantly, to the communities they serve. This evidence needs to demonstrate not only operational efficiency gains, but also meaningful contribution to development or humanitarian outcomes and social impact.
The stakes are particularly high given limited resources and the ethical implications of deploying AI in vulnerable contexts. Organizations need robust evidence to guide decisions about continuing, scaling, or adjusting their AI initiatives, while ensuring these technologies genuinely serve their mission and don't inadvertently exacerbate existing inequities. This framework discussion aims to help organizations at various stages of AI implementation gather meaningful evidence that can inform strategic decisions, improve implementation, and ultimately ensure that AI investments truly advance development goals.
Evidence gathering for generative AI initiatives should align with implementation maturity. Unlike traditional development interventions, where baseline assessments typically precede implementation, fast-evolving technologies such as generative AI call for a more iterative approach to evidence gathering. Organizations often feel pressure to demonstrate impact immediately, but meaningful evidence collection requires a staged approach that matches organizational readiness and implementation phase, while ensuring local ownership and leadership of the evidence-gathering process.
The first phase is characterized by initial experimentation with AI tools, often through sandbox environments or controlled testing. Organizations are typically identifying potential use cases, conducting preliminary assessments, and building internal familiarity with the technology, while actively engaging local CSOs and community representatives to shape use cases and assessment criteria. Similar to inception phases in traditional development programming, evidence should focus on:
Evidence gathering methods here might include structured documentation of test cases, user feedback sessions, and basic cost modeling. The emphasis should be on learning rather than impact measurement, much like rapid assessments in conventional development practice.
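For the cost-modeling piece, even a rough back-of-the-envelope calculation is often enough at this stage. The sketch below is one minimal way to do it in Python; the token prices and usage figures are illustrative assumptions, not real rates, and would need to be replaced with your provider's actual pricing and your own observed pilot usage.

```python
# Rough monthly cost model for a generative AI pilot.
# All figures below are illustrative assumptions; substitute your
# provider's actual token prices and your own observed usage data.

PRICE_PER_1K_INPUT_TOKENS = 0.003   # USD, assumed
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # USD, assumed

def monthly_llm_cost(users: int,
                     queries_per_user_per_month: int,
                     avg_input_tokens: int,
                     avg_output_tokens: int) -> float:
    """Estimate monthly model-usage cost in USD for a pilot."""
    queries = users * queries_per_user_per_month
    input_cost = queries * avg_input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
    output_cost = queries * avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    return input_cost + output_cost

# Example: 50 pilot users, 40 queries each per month,
# ~2,000 input tokens (question plus retrieved context) and ~500 output tokens.
print(f"Estimated monthly cost: ${monthly_llm_cost(50, 40, 2000, 500):,.2f}")
```

Even a model this simple helps surface the questions that matter for later phases: how costs scale with user numbers, and how retrieval context length drives the input-token bill.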
A key consideration in this phase is how users' (AI) digital literacy affects your evidence results: How familiar are they with using AI? How much guidance does the tool provide to fast-track or enable further learning for novice users?
In the next phase, organizations have identified specific use cases and are implementing controlled pilots with defined user groups. These implementations are structured but limited in scope, allowing for close monitoring and adjustment. The pilot design should incorporate principles of locally-led development and ensure equitable participation of local stakeholders. As organizations move beyond initial testing to these structured pilots, evidence collection should expand to include:
At the scaling stage, successful pilot implementations are expanded to broader user groups or additional use cases. The technology is becoming integrated into regular operations and standard workflows. Only at this stage of operational maturity should organizations begin collecting evidence of broader impact:
While this phase shares many characteristics with traditional impact evaluation, AI implementations often require more frequent assessment cycles due to the technology's rapid evolution.
A significant departure from traditional M&E approaches lies in the opportunity to leverage the interactive nature of some AI tools for integrated evidence gathering. Unlike conventional post-intervention surveys or assessments, AI systems with conversational components (e.g., chatbots) can collect valuable data through user interactions in real time.
When AI tools engage users in conversation or interactive processes, they create natural opportunities to collect user feedback, track effectiveness, and measure outcomes in real-time. This integration offers several advantages:
Organizations can integrate evidence gathering through:
This approach represents a significant evolution from traditional M&E methods, offering more continuous and granular data collection. It can be as simple as adding a "feedback" tool or function when using function-calling techniques with LLMs, or a dedicated feedback route in query-routing approaches.
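As an illustration, the sketch below shows one way such a "feedback" function might be wired into a function-calling chatbot. The tool schema, function name, and storage backend are all hypothetical and provider-agnostic; the point is simply that the same mechanism the assistant uses to answer questions can also capture structured feedback during the conversation and route it into your M&E data store.

```python
import json
from datetime import datetime, timezone

# Hypothetical tool definition in the JSON-schema style most
# function-calling LLM APIs expect; adapt to your provider's exact format.
RECORD_FEEDBACK_TOOL = {
    "name": "record_feedback",
    "description": "Store user feedback about the usefulness of the assistant's answer.",
    "parameters": {
        "type": "object",
        "properties": {
            "rating": {
                "type": "integer", "minimum": 1, "maximum": 5,
                "description": "Usefulness rating from 1 (not useful) to 5 (very useful).",
            },
            "comment": {
                "type": "string",
                "description": "Optional free-text comment from the user.",
            },
            "use_case": {
                "type": "string",
                "description": "Which organizational use case the question related to.",
            },
        },
        "required": ["rating"],
    },
}

def record_feedback(rating: int, comment: str = "", use_case: str = "unspecified") -> str:
    """Handler invoked when the model calls the feedback tool.

    Here we simply append to a local JSONL file; in practice this would
    write to whatever M&E data store the organization already uses.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rating": rating,
        "comment": comment,
        "use_case": use_case,
    }
    with open("feedback_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return "Thank you, your feedback has been recorded."
```

A query-routing setup can achieve the same outcome without function calling: a simple classifier or keyword rule detects feedback-like messages ("that wasn't helpful", "great answer") and sends them down a dedicated feedback route instead of the normal retrieval path.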
Different contexts require different types of evidence. While traditional evaluation frameworks often emphasize outcome and impact metrics, AI implementations require additional focus on:
Evidence frameworks must consider:
The integration of AI in international development requires a thoughtful evolution of evidence-gathering approaches. While sharing foundational principles with traditional development evaluation, AI implementations demand both technical adaptations and deep consideration of local leadership and equity. A staged approach to evidence gathering, aligned with implementation maturity and grounded in local contexts, allows organizations to learn and adapt while building toward meaningful impact assessment. As the field evolves, evidence frameworks must balance rigorous evaluation with flexibility, ensuring that the process remains responsive to local needs and promotes equitable outcomes. Success lies not in rushing to demonstrate impact, but in building sustainable, locally-owned systems for learning and improvement.
We would love to hear your thoughts on this and about your experience building evidence (and evidence frameworks) for your AI applications in development and humanitarian contexts. Reach out!