Perspective, Human-centred AI  |  November 6, 2025

Questions we should be asking about AI in our sector

When ideas from conversations and shared experiences converge and start to take shape, I like to bring it all together. Here are some thoughts and insights that have matured over the past few months, triggered by Brent O. Phillips interviewing me for the Humanitarian AI Today (Voices) podcast, by conversations at the NetHope Summit, and by a podcast from the Centre for Global Development (CGD).


Are we forcing organizations to become product companies?

Development Organizations and NPOs aren't product companies

In this recent CGD podcast the speakers lay out the same core tension we presented at NetHope: development and humanitarian organizations are suddenly expected to build AI products, but they fundamentally aren't product companies.

What happens when we apply project thinking to products?

The “project” paradigm transferred to AI

Temina from the Agency Fund points out that nonprofits are rushing into impact evaluations before achieving product stability. They can't show basic user funnels. They're "vibe checking" models instead of using actual evaluation tools like LangFuse or DeepEval.
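To make that contrast concrete, here is a minimal, library-agnostic sketch of what "actual evaluation" means in practice: a fixed, versioned set of test cases scored the same way on every run, instead of ad-hoc prompting in a chat window. Everything here (the `ask_assistant` stub, the keyword scorer, the test cases) is a hypothetical placeholder; tools like DeepEval or LangFuse formalize this kind of workflow with proper metrics and tracing.

```python
# Minimal sketch of a reproducible evaluation run, as opposed to "vibe checking".
# ask_assistant() is a hypothetical stand-in for the real pipeline under test.
TEST_CASES = [  # versioned with the product, not improvised per demo
    {"id": "tc-001",
     "input": "My wheat leaves have orange spots after the rains. What should I do?",
     "must_mention": ["rust", "fungicide"]},
    {"id": "tc-002",
     "input": "Where do I register for the cash assistance programme?",
     "must_mention": ["registration"]},
]

def ask_assistant(prompt: str) -> str:
    """Replace with the real call (LLM + system prompt + retrieval)."""
    return "This looks like wheat rust; ask an agronomist which fungicide is approved locally."

def run_eval(cases: list[dict]) -> float:
    passed = 0
    for case in cases:
        answer = ask_assistant(case["input"]).lower()
        # Crude keyword check; real metrics (relevancy, faithfulness) go here.
        if all(term in answer for term in case["must_mention"]):
            passed += 1
        else:
            print(f"FAIL {case['id']}: expected content missing")
    return passed / len(cases)

print(f"pass rate: {run_eval(TEST_CASES):.0%}")
```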

None of this is surprising. Development and humanitarian organizations have experience running projects. Time-bound interventions. Discrete programs. Anyone who’s been in the humanitarian and development sectors long enough knows this paradigm doesn't solve systemic challenges like poverty or access to education. Yet we're now grafting it onto AI product development.

At the recent NetHope Global Summit, Zineb Bhaby from NRC captured this disconnect: "Big tech solutions keep arriving without understanding how digitally immature humanitarian organizations are. When you don't have basic activity tracking systems or a safe way to handle people's data, what does AI really solve?"

She's right. Organizations lack the data foundations, the technical infrastructure, the product development muscle. But the sector keeps framing this as an "AI adoption" question when it's actually a fundamental capacity gap.

Who actually owns the AI we're building?

We are building a Big Tech dependency

Han from CGD notes that the relationship between major tech companies and local actors remains undefined. Most foundation models come from the Global North and are trained mostly in English. Organizations default to large language models because that's what's available, what's marketed, and what seems accessible.

At NetHope, Luca from Greenpeace candidly and matter-of-factly reminded us, in a great session, of the broader and very high risk of Big Tech dependency (cloud systems and data sovereignty).

But we're at a pivot point with frugal AI, small language models, fine-tuned models, and domain-specific tools. Temina references how Digital Green needs models that understand rust in wheat, not just rust generally (i.e. semantic understanding). This is exactly the domain expertise that should live in modular, composable AI components that organizations can share, own and build on. Lindsey from DevelopMetrics has been using classic NLP for better domain understanding for a while now, well before GPTs.
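As a small illustration of what such a shared building block could look like, here is a sketch of a "classic NLP" domain classifier in the spirit of the rust-in-wheat example. The reports and labels below are invented placeholders; a real component would be trained on expert-labelled field reports, in the relevant languages, and published for other organizations to reuse.

```python
# Minimal sketch of a domain-specific classifier built with classic NLP.
# Toy data only: a real shared asset would use agronomist-labelled reports.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "orange pustules on wheat leaves after rain",        # wheat rust
    "yellow stripes along the leaf veins of my wheat",    # wheat (stripe) rust
    "brown rust stains on the irrigation pump",           # equipment, not a crop disease
    "white powder coating on the maize leaves",           # other crop disease
]
labels = ["wheat_rust", "wheat_rust", "equipment", "other_disease"]

# TF-IDF features plus a linear classifier: cheap to train, easy to inspect and share.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reports, labels)

print(model.predict(["reddish spots spreading on wheat leaves"]))
```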

My push: instead of every nonprofit building complete products from scratch on APIs rented with OpenAI’s, Google’s, Salesforce’s or AWS’ “generous credits”, we need agricultural experts, health workers, and educators working with small tech to create building blocks. Fine-tuned open-source models. Custom datasets. Domain-specific classifiers. Low-resource language support. Ethical evaluation frameworks. This is AI sovereignty in practice.

Mitali Ayyangar from DataKind raised a similar point at NetHope: organizations need to move beyond chatbots to ground AI in real program needs, focusing on enabling access to complex analyses that have historically required high technical barriers.

What are we really investing in?

The ethical contradictions

Temina asks whether it's ethical for companies to push Global South governments to build data centers when clinics lack electricity for safe childbirth. This is AI washing at scale. Investment flowing into AI infrastructure while basic service delivery remains broken.

At multiple humanitarian conferences, Suzy Madigan observes that conversation remains concentrated on "AI as the solution to reaching more people with less money, rather than understanding what is practically required to procure, design, deploy and monitor AI more responsibly and inclusively."

The accelerator program Han and Temina describe revealed this gap. Organizations jump to sophisticated implementations, sometimes without answering fundamental questions: Who uses this? How often? Where do they drop off? What does meaningful engagement look like?
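None of these questions requires sophisticated tooling to start answering. Here is a minimal sketch of a basic user funnel, assuming only that the product logs simple usage events; the event names and counts are hypothetical.

```python
# Minimal sketch of the "basic user funnel" question, assuming the product
# emits simple (user_id, event) records. Event names here are hypothetical.
from collections import defaultdict

events = [
    ("u1", "registered"), ("u1", "first_question"), ("u1", "returned_week_2"),
    ("u2", "registered"), ("u2", "first_question"),
    ("u3", "registered"),
]

users_by_stage = defaultdict(set)
for user, event in events:
    users_by_stage[event].add(user)

funnel = ["registered", "first_question", "returned_week_2"]
total = len(users_by_stage[funnel[0]]) or 1
for stage in funnel:
    count = len(users_by_stage[stage])
    print(f"{stage:>16}: {count} users ({count / total:.0%} of registered)")
```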

In private conversations, colleagues at humanitarian organizations have told me they feel “forced” to use Microsoft Copilot as their CIOs “AI-ify” their organizations. They struggle with the ethical dilemma of doing their work with AI from the same companies that provide cloud compute to state actors that cause conflicts and kill children.

(What) are we even evaluating?

The AI evaluation problem

Han makes a critical point: there's no singular AI. Systems change constantly. Models update, prompts refine, parameters adjust. What exactly are we evaluating? The LLM? The AI product? The development outcomes?
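One practical answer: at minimum, pin every evaluation result to the exact system it measured. A minimal sketch, with illustrative field names and values rather than any particular tool's schema:

```python
# Sketch of recording what an evaluation actually measured, since "the AI"
# is a moving target (model, prompt, and data all change over time).
# Field names and values are illustrative only.
from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass(frozen=True)
class EvalRecord:
    model: str            # the model and version actually called
    prompt_version: str   # tag or hash of the system prompt in use
    dataset_version: str  # which frozen test set was run
    pass_rate: float      # result of that specific combination
    run_date: str

record = EvalRecord(
    model="open-weights-8b-instruct@2025-10",
    prompt_version="triage-prompt-v7",
    dataset_version="field-questions-v3",
    pass_rate=0.82,
    run_date=str(date.today()),
)
print(json.dumps(asdict(record), indent=2))
```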

At the product/flow level, evaluation tools exist: LangFuse, LangSmith, DeepEval, and other model observability platforms. But they require technical expertise that development and non-profit organizations don't have and shouldn't be expected to build alone. And all of this assumes the tooling and application of AI were sound and efficient in the first place, and not just another ChatGPT wrapper with a WhatsApp plugin.

This is why partnerships between domain experts and small tech matter. Not consultants who parachute in and leave. Not Big Tech in-kind expertise that doesn't understand our sector. Collaborative relationships where subject matter experts define quality and technical teams build reproducible datasets, models, approaches and evaluation frameworks that organizations can share, use and replicate.

Karissa Dunbar noted at NetHope the importance of bringing IT teams into design early. Alicia Morrison emphasized grounding AI in actual program needs. Perhaps the humanitarian rethink is also about how to up-skill teams to merge tech and domain expertise.

So what does owning our AI actually look like?

My take from staying tuned at the edge of AI

The technology exists for organizations to own their AI. Small language models are evolving. Fine-tuning is more accessible. Low-resource language support is improving. Building costs 100x less than it did 2 years ago.

What's missing:

  • Modular, composable components rather than complete products built from scratch
  • Domain-specific datasets and models that organizations build together
  • Shared evaluation frameworks that work across similar contexts
  • Technical partnerships that respect domain expertise and aren’t trying to increase the corporate bottom line
  • Realistic expectations about what AI solves versus what foundational capacity is required

We have a choice about what AI infrastructure gets built for humanitarian and development impact. Keep rushing toward impact evaluations of unstable products on rented technology. Or build foundations right, with ownership, proper evaluation, and product capacity where it actually belongs. This starts with bringing humans together.

Reach out if any of this resonates with you!