Opinion, AI for Knowledge Extraction  |  August 3, 2023

Knowledge Platforms - Addressing Bias, Embracing Diversity, and Capturing Tacit Wisdom

Introduction

In an era where knowledge equates to power, the management and dissemination of knowledge play a vital role in shaping discourse in many fields, including the humanitarian and development sectors. The advent of large language models and conversational AI systems presents both opportunities and challenges for knowledge management across industries. The water, sanitation, and hygiene (WASH) sector in particular grapples with longstanding issues of knowledge equity and localization, which new AI systems could either perpetuate or help mitigate. This article examines the limitations of existing knowledge platforms, the risks of bias in AI-augmented systems, and strategies to make knowledge management more inclusive, less biased, and more reflective of on-the-ground expertise.

Key Challenges in Conventional Knowledge Bases

Conventional knowledge bases in the humanitarian and international development sectors are essential repositories, but they are composed mainly of “Northern” expertise. While valuable, they suffer from key limitations: they exclude diverse perspectives, overlook undocumented tacit knowledge, reinforce established views, hold static knowledge, and lack contextualization. These challenges hinder their effectiveness and inclusivity, and may perpetuate ineffective or inappropriate practices.

Missing Perspectives and Epistemic Injustice

The voices of local practitioners, community members, and non-elite actors are largely absent from conventional knowledge platforms, which skews the discourse toward Western paradigms. Traditional knowledge bases such as libraries and digital repositories typically comprise reports written by experts from the Global North, which makes them susceptible to a form of neo-colonization of knowledge. While digitalization has made knowledge more accessible to practitioners in low-resource settings, platform content remains dominated by the Global North. Addressing this imbalance requires not just broader participation, but also new protocols for assessing the value of different types of knowledge.

Undocumented Tacit Knowledge

A substantial amount of valuable knowledge resides in the hands-on experience of practitioners. This "tacit knowledge" often remains undocumented and is overlooked by conventional knowledge bases. The insights of those most directly involved, whether practitioners working on the issues daily or people in the households, schools, clinics, and other places affected by a lack of resources or by social injustice, can be a vital resource. Capturing this tacit knowledge is nonetheless challenging because it rests on personal experience and intuition, which are difficult to articulate or record. This type of knowledge is also frequently not recognized as valid or valuable by those who drive development paradigms. They tend to prioritize formal, documented knowledge and miss the nuanced understanding that those directly engaged in the issues can offer, potentially limiting the effectiveness of interventions.

Reinforcement of Established Views

By aggregating the work of a limited pool of so-called experts, these platforms tend to reinforce established orthodoxies and mainstream opinions rather than capture alternative perspectives. Existing knowledge platforms may prioritize widely accepted theories and practices and inadvertently overlook minority opinions. This can lead to information bias, in which valuable non-mainstream insights are undervalued or ignored entirely. Furthermore, the conventional approach may promote popular but ineffective or even harmful practices, creating a feedback loop that strengthens existing beliefs without critical evaluation.

Static Knowledge

Conventional databases and libraries, the bedrock of traditional knowledge systems, are often confined to fixed sets of information with limited capacity to adapt and evolve. Unlike newer systems that generate content dynamically, these static repositories lack real-time monitoring and updating. When new evidence or insights emerge, their information can become outdated, leading to incorrect or obsolete conclusions. They also rarely offer channels for feedback and interaction, leaving the knowledge inert rather than responsive. This contrasts sharply with approaches in which information is continually processed, integrated, and refreshed, promoting a living, adaptable understanding that keeps pace with rapid change.

Lack of Contextualization

Many conventional knowledge bases cannot adapt to specific local circumstances, which is a significant flaw. They may provide generalized recommendations and best practices, but these often overlook the environmental, economic, political, and socio-cultural dynamics that strongly influence whether initiatives succeed or fail. Without these considerations, the information may lead to misapplied strategies and suboptimal outcomes. Newer knowledge management methods, by contrast, can take into account shifting local social dynamics and cultural nuances. By including dynamic content that adapts to different contexts, these systems can offer more tailored and effective solutions.

Limitations of Simplistic Augmented Large Language Models

The rise of large language models (LLMs), such as OpenAI's GPT-3 and GPT-4 and Anthropic's Claude, has enabled the rapid creation of AI assistants that can generate lengthy summaries, translate documents, answer complex questions, and synthesize insights across thousands of sources. As these AI systems are deployed for knowledge management, they carry both promise and peril.

Non-Augmented Generative LLMs

The "knowledge" or "memory" of interactive generative LLMs such as ChatGPT is confined to the data on which they were trained. Not all of that information is accessible during inference: a model trained on, say, 40 TB of text cannot reproduce all of it verbatim when generating text; it must generalize and may infer details, sometimes incorrectly. Moreover, the base knowledge is limited by the cutoff date of the training data, which is September 2021 for both GPT-3.5 and GPT-4. Information published after that date is not part of the models' "internal" memory. Retraining and continual fine-tuning with new data are possible, but they can be expensive and complex, and in some cases may even degrade the model. Other challenges of simple LLMs include:

Reinforcing Dominant Paradigms: Like any machine learning tool, large language models reflect the biases inherent in their training data. If this data over-represents Western paradigms, AI risks amplifying those limited perspectives.

Devaluing Minority Opinions: Majority views are reflected across more training documents, while dissenting opinions appear less often and have less influence on model outputs. AI could thereby silence minority voices.

False Precision: Lengthy, highly fluent responses can lend AI outputs undue authority, leading users to trust the information without deeper scrutiny.

Flawed Sourcing: Given the black-box nature of model training, it may be unclear which (or how many) sources an AI summary draws on, making its legitimacy hard to assess.

Potential Pitfalls of Simplicity

Using a simple LLM without retrieval augmentation might seem appealing, but such a model inherently misses tacit knowledge, undocumented experience, and minority opinions. Because it relies solely on formal documents, it can inadvertently promote misconceptions. Considerable work remains to develop participatory protocols, interactive interfaces, and customized algorithms that capture knowledge in a more nuanced and accurate way.

Opportunities and Challenges of Using Augmented LLMs

Choice of Source and Base Knowledge

Selecting the key documents and information that form an AI assistant's augmented knowledge base can shift its outputs away from the biases of the model's core trained knowledge. This includes giving weight to diverse opinions and practices from different cultures and sub-national actors. To prevent biases and misunderstandings, human-in-the-loop processes are essential for curating the knowledge sources that underpin AI assistants. This can also allow models to be tuned to boost minority opinions and local voices, fostering a more inclusive understanding.
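To make the idea of curated, weighted sources concrete, here is a minimal retrieval-augmentation sketch. It assumes a small curated corpus in which a human curator attaches a hypothetical `weight` to each document (values above 1.0 boost under-represented voices), and it uses simple word-overlap cosine similarity as a stand-in for the embedding search a production system would more likely use. The `Document` class, the example texts, and the prompt wording are all illustrative, not part of any specific product.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    source_type: str     # hypothetical labels, e.g. "agency report", "practitioner interview"
    weight: float = 1.0  # curator-assigned multiplier; > 1.0 boosts under-represented voices


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a stand-in for a real embedding model.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[tuple[Document, float]]:
    """Rank documents by lexical similarity to the query, scaled by the curator weight."""
    q = Counter(tokenize(query))
    scored = [(doc, cosine(q, Counter(tokenize(doc.text))) * doc.weight) for doc in corpus]
    relevant = [(doc, score) for doc, score in scored if score > 0]  # drop unrelated documents
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:k]


def build_prompt(query: str, hits: list[tuple[Document, float]]) -> str:
    """Prepend retrieved passages, tagged with ID and source type, so the model can cite them."""
    context = "\n".join(f"[{doc.doc_id} | {doc.source_type}] {doc.text}" for doc, _ in hits)
    return f"Answer using only the sources below and cite their IDs.\n{context}\n\nQuestion: {query}"


# Hypothetical curated corpus.
corpus = [
    Document("R1", "Standard guidance recommends pit latrine emptying by vacuum truck.", "agency report"),
    Document("P1", "In our district vacuum trucks cannot reach narrow lanes, so manual pit emptying "
                   "with protective gear is used.", "practitioner interview", weight=1.5),
    Document("R2", "Handwashing stations should be placed near latrines.", "agency report"),
]
hits = retrieve("pit emptying vacuum truck", corpus)
prompt = build_prompt("pit emptying vacuum truck", hits)
print(prompt)
```

The weight multiplies relevance rather than replacing it, so a boosted source still has to be on-topic to be retrieved. Note also that this lexical scoring treats "trucks" and "truck" as different words; an embedding model would catch such near-matches.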

Capturing Unstructured and Tacit Knowledge

AI can be used to capture unstructured knowledge from practitioners, such as sanitation workers describing their experiences in webinars or recorded sessions. The challenge is to ensure that key lessons embedded in the experiences of local practitioners, policymakers, and community members are not lost. Techniques such as clustering and semantic search across conversations can surface key themes from unstructured feedback, integrating these unwritten perspectives and helping combat epistemic injustice.
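The clustering and semantic search mentioned above can be sketched with the standard library alone. This is an illustrative assumption rather than a recommended pipeline: it represents each transcript snippet as a bag of words (a real system would more likely use sentence embeddings and a tuned clustering algorithm), groups snippets with simple leader clustering under an arbitrary similarity threshold of 0.3, and ranks snippets against a question by cosine similarity. The snippets are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, illustrative stop-word list; real systems use fuller lists or embeddings.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "our", "the", "to", "we", "with"}


def vectorize(text: str) -> Counter:
    """Bag-of-words vector (a stand-in for a sentence-embedding model)."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_themes(snippets: list, threshold: float = 0.3) -> list:
    """Leader clustering: a snippet joins the first cluster whose leader it resembles, else starts a new one."""
    clusters = []  # each entry: (leader vector, member snippets)
    for snippet in snippets:
        vec = vectorize(snippet)
        for leader, members in clusters:
            if cosine(leader, vec) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((vec, [snippet]))
    return [members for _, members in clusters]


def semantic_search(query: str, snippets: list, k: int = 1) -> list:
    """Return the k snippets most similar to the query."""
    q = vectorize(query)
    return sorted(snippets, key=lambda s: cosine(q, vectorize(s)), reverse=True)[:k]


# Hypothetical excerpts from transcribed webinar recordings.
snippets = [
    "Pit emptying is dangerous without protective gear.",
    "Workers need protective gear for pit emptying.",
    "Community meetings help households accept new latrines.",
    "Households accept latrines after community meetings.",
]
themes = cluster_themes(snippets)
best = semantic_search("Why do households accept latrines?", snippets)
for i, theme in enumerate(themes, 1):
    print(f"Theme {i}: {theme}")
```

Leader clustering compares each snippet only with each cluster's first member, which keeps the sketch short but makes results depend on input order; that is one reason real deployments tend to use more robust clustering methods.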

Conversational Capabilities and Interfaces

Through conversational interaction in users' native languages, AI can capture their experiences more effectively, and can even accommodate non-elite and non-Western sources such as community radio, oral histories, and interviews. Chatbots and voice-based interfaces allow local experts to share knowledge in their own language, but these interfaces must indicate the provenance and limitations of AI-generated information to ensure transparency.
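One way to keep provenance and limitations visible is to make them part of every rendered answer rather than an optional extra. The sketch below assumes hypothetical `Source` and `Answer` records and a `render` function; the field names and notice wording are illustrative choices, not an established interface.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    title: str
    kind: str      # e.g. "oral history", "community radio", "agency report"
    language: str


@dataclass
class Answer:
    text: str
    sources: list = field(default_factory=list)
    machine_translated: bool = False


def render(answer: Answer) -> str:
    """Format an AI answer so provenance and limitations are always visible to the user."""
    lines = [answer.text, "", "Sources:"]
    if answer.sources:
        lines += [f"- {s.title} ({s.kind}, {s.language})" for s in answer.sources]
    else:
        lines.append("- None retrieved; this answer relies only on the model's training data.")
    if answer.machine_translated:
        lines.append("Note: machine-translated; wording may differ from the original source.")
    lines.append("AI-generated: verify before acting on it.")
    return "\n".join(lines)


# Hypothetical example answer.
example = Answer(
    "Manual emptying is common where trucks cannot reach.",
    [Source("Interview with a pit emptier", "oral history", "Swahili")],
    machine_translated=True,
)
print(render(example))
```

Because the notices are added in the rendering step, no answer can reach the user without them, including answers that drew on no retrieved sources.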

Synthesizing Insights and Weighted Recommendations

Tools such as sentiment analysis, upvoting, and comments can gather user reactions, while ratings and reviews can prioritize knowledge sources that users consistently validate as useful. However, an AI-driven platform risks amplifying existing biases by reinforcing received wisdom without critical analysis. Thoughtful implementation and continual review of automated outputs can help correct, rather than compound, these biases, foregrounding local expertise and challenging unexamined assumptions.
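Ranking sources by raw average rating lets a handful of enthusiastic votes outrank a source that many users have consistently validated. A common remedy is a Bayesian (damped) average, which shrinks each source's mean toward a prior. The sketch below assumes a 1 to 5 rating scale; the `prior_mean` and `prior_weight` values and the ratings themselves are illustrative, not calibrated.

```python
def bayesian_average(ratings: list, prior_mean: float = 3.0, prior_weight: int = 5) -> float:
    """Mean rating shrunk toward prior_mean, as if each source started with
    prior_weight phantom ratings equal to prior_mean (ratings on a 1-5 scale)."""
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


# Hypothetical ratings collected from users for three knowledge sources.
ratings = {
    "A": [5, 5],      # two enthusiastic ratings
    "B": [4] * 20,    # consistently validated by many users
    "C": [],          # newly added, e.g. a fresh practitioner contribution
}
ranked = sorted(ratings, key=lambda src: bayesian_average(ratings[src]), reverse=True)
print(ranked)  # → ['B', 'A', 'C']
```

The same damping also means a newly added practitioner contribution starts at a neutral score and rises slowly, which is one way received wisdom can be reinforced; explicit weighting for under-represented sources or periodic curator review can offset this.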

Empowering Practitioners and Ensuring Knowledge Equity

A collaborative approach that empowers practitioners as active contributors is vital. Their information needs should guide not just what knowledge is presented, but how it is produced, validated, and updated. Knowledge equity must be a keystone, involving all stakeholders in guiding the continued evolution of socio-technical systems.

Conclusion

The humanitarian and development sectors are at a pivotal moment: the traditional constraints of knowledge management are becoming evident, and AI offers an emerging opportunity to transform how knowledge is gathered and shared. Generative AI platforms must be designed and rolled out with care so that they are impartial, all-encompassing, and truly reflective of the diverse expertise in the field.

Emphasizing local perspectives, respecting minority viewpoints, and using AI to harness even unspoken wisdom enables the sector to establish a resilient, comprehensive, fair, and balanced knowledge environment. Using AI and augmented LLMs symbolizes more than mere technological advancement; it embodies a shift towards a more empathetic and discerning appreciation of knowledge.


In line with the recommendations above, we are building WASH AI. We see the integration of AI into knowledge management in the WASH sector as a pathway to results that are more localized, nuanced, inclusive, and efficient. This initiative calls for a careful blend of technological capability and human understanding, along with ongoing assessment, to ensure that the methods employed are both inclusive and ethical. For more information on WASH AI, read this article.