In the rapidly evolving world of AI, Large Language Models (LLMs) have become a cornerstone for many organizations. However, the cost associated with using these models can be a significant barrier, especially for high-throughput applications. This blog post will explore strategies to reduce these costs, using insights from a recent research paper [1].
LLMs such as GPT-4, ChatGPT, and J1-Jumbo have diverse pricing structures, with fees that can differ by two orders of magnitude, and those fees add up quickly when models are run over large collections of queries and text. The cost of using an LLM API typically consists of three components: a prompt cost (proportional to the length of the prompt), a generation cost (proportional to the length of the generation), and sometimes a fixed cost per query.
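To make this cost model concrete, here is a minimal sketch of the per-query arithmetic. The function name, token counts, and per-1,000-token prices below are illustrative assumptions, not quoted rates from any provider.

```python
# Minimal sketch of the three-part cost model: prompt cost + generation
# cost + optional fixed fee per query. All prices here are hypothetical.

def query_cost(prompt_tokens: int, generated_tokens: int,
               prompt_price: float, generation_price: float,
               fixed_cost: float = 0.0) -> float:
    """Return the dollar cost of one API call.

    Prices are expressed in dollars per 1,000 tokens, mirroring how most
    LLM providers quote their rates.
    """
    return (prompt_tokens / 1000 * prompt_price
            + generated_tokens / 1000 * generation_price
            + fixed_cost)

# Example: a 500-token prompt and a 200-token completion at made-up rates.
print(query_cost(prompt_tokens=500, generated_tokens=200,
                 prompt_price=0.03, generation_price=0.06))  # -> 0.027
```

Because the prompt and generation terms scale linearly with token counts, shortening prompts and capping generation length translate directly into savings, which is exactly what the strategies below target.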
The research paper outlines three strategies that users can exploit to reduce the inference cost of LLMs:

1. Prompt adaptation: shrinking prompts, for example by keeping only the few-shot examples most relevant to each query, to cut the prompt cost (a rough sketch follows this list).
2. LLM approximation: standing in for an expensive LLM with a cheaper substitute, such as a cache of previous completions or a smaller model fine-tuned on its outputs.
3. LLM cascade: sending each query to a sequence of LLMs, from cheapest to most expensive, and stopping as soon as an answer is judged reliable.
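As a rough illustration of the first strategy only, the sketch below trims a prompt by ranking few-shot examples against the incoming query. The word-overlap heuristic and the helper names are my own simplifications, not the paper's method, which selects examples more carefully.

```python
# Hedged sketch of prompt adaptation: keep only the k few-shot examples
# most similar to the query, so the prompt (and its cost) stays small.
# Word overlap is a deliberately crude stand-in for a real relevance score.

def overlap(a: str, b: str) -> float:
    """Crude lexical similarity: Jaccard overlap of lowercased words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def adapt_prompt(query: str, examples: list[str], k: int = 2) -> str:
    """Build a prompt from the k examples most relevant to the query."""
    best = sorted(examples, key=lambda ex: overlap(query, ex), reverse=True)[:k]
    return "\n".join(best) + "\nQ: " + query + "\nA:"

examples = [
    "Q: Is the review 'great product' positive? A: yes",
    "Q: Is the review 'broke after a day' positive? A: no",
    "Q: Is the review 'arrived late but works fine' positive? A: yes",
]
print(adapt_prompt("Is the review 'loved it' positive?", examples, k=1))
```

Dropping even one long few-shot example from every query can shave a meaningful fraction off the prompt cost at scale.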
To illustrate these strategies, the researchers propose FrugalGPT, a simple yet flexible instantiation of the LLM cascade strategy. FrugalGPT learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Their experiments show that FrugalGPT can match the performance of the best individual LLM (e.g., GPT-4) with up to a 98% cost reduction, or improve accuracy over GPT-4 by 4% at the same cost.
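The cascade idea can be sketched in a few lines: try models from cheapest to most expensive and stop as soon as a scorer deems an answer reliable. In the sketch below, `call_model` and `score_answer` are hypothetical stand-ins; in FrugalGPT the scorer is a learned model and the model ordering and thresholds are optimized, not hand-picked as they are here.

```python
# Minimal sketch of an LLM cascade in the spirit of FrugalGPT. Models are
# tried cheapest-first; the first answer whose confidence clears that
# model's threshold is returned, so expensive models run only when needed.
from typing import Callable

def cascade(query: str,
            models: list[tuple[str, float]],           # (name, threshold), cheap first
            call_model: Callable[[str, str], str],     # (model_name, query) -> answer
            score_answer: Callable[[str, str], float]  # (query, answer) -> confidence
            ) -> str:
    answer = ""
    for name, threshold in models:
        answer = call_model(name, query)
        if score_answer(query, answer) >= threshold:
            return answer  # confident enough: stop here and save the cost
    return answer          # otherwise fall back to the last (strongest) answer

# Toy usage with stubs standing in for real API calls and a real scorer.
models = [("cheap-model", 0.9), ("mid-model", 0.8), ("gpt-4", 0.0)]
stub_answers = {"cheap-model": "maybe", "mid-model": "yes", "gpt-4": "yes"}
result = cascade("Is 17 prime?", models,
                 call_model=lambda m, q: stub_answers[m],
                 score_answer=lambda q, a: 0.95 if a == "yes" else 0.5)
print(result)  # -> "yes", answered by mid-model without ever invoking gpt-4
```

The savings come from the fact that many queries are easy: if a cheap model's answer scores well, the expensive models are never called at all.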
The strategies outlined in this blog post provide a foundation for using LLMs sustainably and efficiently. By implementing these strategies, organizations can leverage the power of AI in a cost-effective manner, thereby enhancing their work without breaking the bank.
Reference: [1] Lingjiao Chen, Matei Zaharia, and James Zou. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. arXiv:2305.05176, 2023.