
Why Should News Organizations (Not) Build an LLM?

  • Writer: Layla
  • Apr 25, 2024
  • 6 min read

Updated: May 17, 2024



Integrating Large Language Models (LLMs) into the newsroom has the potential to unlock a myriad of opportunities for news organizations, from content creation and editing to news gathering and distribution.


But as newsrooms continue to explore the avenues and prospects for harnessing LLMs, a question arises around the strategic and competitive use of the technology: should news organizations strive to train their own LLMs?


In this post I argue that news organizations (especially those with limited resources) that use prompt engineering, fine-tuning, and retrieval-augmented generation (RAG) to enhance their productivity and offerings will be strategically better off than if they train their own LLMs from scratch.


I will first lay out the cost calculus for engineering and deploying your own model, and then I'll elaborate on the benefits and trade-offs of these other techniques for leveraging the value of generative AI.

Training LLMs from scratch could be a costly decision

Building and training LLMs from scratch is challenging due to the need for large datasets, extensive computing resources, and specialized talent to develop and train these models.


For instance, the computing resources needed to train BloombergGPT, an LLM for finance, are estimated to have cost approximately $1M. While the cost of serving such a model is not public information, the infrastructure required to serve even a moderately sized model like this, irrespective of the number of users, is probably not cheap, and is likely somewhere in the six figures. In addition, the ethical considerations around building a responsible model, such as ensuring fairness, privacy, and transparency while sourcing data ethically, require dedicated attention and resources that could divert news organizations' focus from their core journalistic work.


Optimizing an LLM’s performance with respect to the amount of compute resources required for training remains a work in progress.


Rushing to train an LLM without a detailed cost-benefit analysis is likely to cost news organizations hefty sums that may not yield a high return on investment. Since the release of BloombergGPT in March 2023, smaller and more capable open-source model architectures (such as the Mistral models) have become publicly available, presenting a competitive alternative to large, costly, proprietary models.


News organizations may instead want to consider the strategic advantages of using third-party models that are accessible via API endpoints. This can reduce infrastructure costs while ensuring access to state-of-the-art models, and it makes swapping models quick and easy.


For instance, news organizations could deploy the open-source and quite performant Mistral-7B model via a HuggingFace Inference Endpoint on a single A10G GPU for $1.30 per hour for experimentation purposes. They could later switch to Google's Gemma-7B at no additional licensing cost, while paying the same hourly rate for compute, allowing for rapid iteration and testing of different models.
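
To illustrate how lightweight such swapping can be, here is a minimal sketch using the huggingface_hub client library. The model IDs and prompt are illustrative, and a dedicated Inference Endpoint would expose its own URL in place of a model ID.

```python
from huggingface_hub import InferenceClient

# Point the client at a hosted model; for a dedicated Inference
# Endpoint, pass the endpoint URL instead of a model ID.
client = InferenceClient(model="mistralai/Mistral-7B-Instruct-v0.2")

prompt = "Summarize the key decisions from the city council minutes below:\n..."
print(client.text_generation(prompt, max_new_tokens=200))

# Swapping models is a one-line change, e.g., to Google's Gemma:
client = InferenceClient(model="google/gemma-7b-it")
print(client.text_generation(prompt, max_new_tokens=200))
```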


Accordingly, news organizations exploring prompt engineering, fine-tuning, and retrieval-augmented generation (RAG) may enjoy a cost advantage and greater application-development agility, possibly achieving a faster return on investment by using readily available models (e.g., GPT-4 or Claude) via API or inference endpoints (e.g., Mistral-7B deployed via a HuggingFace Inference Endpoint) for their applications.


What is prompt engineering?

Prompt engineering is the emerging practice of crafting the questions and instructions given to an LLM so as to elicit a desired response.


While prompt engineering appears to be straightforward on the surface, it requires domain expertise in different prompting techniques to fully reap the benefits of LLMs. For instance, this guide lists 17 different approaches to prompting, some of which are rather structured and involved.


And different models may require different prompt formats or tricks to get the best performance. Yet prompt engineering is still the fastest way to get information from a general purpose LLM (at least, one that is already tuned to behave like a chat assistant similar to ChatGPT) without modifying its architecture or retraining it.
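
As a concrete illustration, here is a minimal few-shot prompting sketch using OpenAI's Python client. The model name, system instruction, and example headlines are placeholders, not a recommended house style.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Few-shot prompting: show the model an example headline in the
# desired style before asking it to write a new one.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You write concise, active-voice news headlines."},
        {"role": "user", "content": "Article: The city approved a new transit budget on Tuesday...\nHeadline:"},
        {"role": "assistant", "content": "City Approves Record Transit Budget"},
        {"role": "user", "content": "Article: Researchers found microplastics in rainwater samples...\nHeadline:"},
    ],
)
print(response.choices[0].message.content)
```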


You can refer to Google's Introduction to prompt design guide to learn how to create prompts that elicit the desired output from its LLMs. OpenAI offers a similar prompting guide for using its models effectively. And the Journalist's ToolBox offers prompting resources that are more oriented towards use cases in journalism.


What is Retrieval Augmented Generation (RAG)?

While prompt engineering is a powerful and resource-efficient way to generate desired content, the knowledge of most LLMs is capped by the cut-off date of their training data. For example, GPT-4 Turbo's training data is cut off in December 2023.


In other words, without supplementing GPT-4's knowledge with information available online, the model won't be able to reflect the latest developments in the world. News organizations can build their own cost-efficient RAG systems using externally hosted LLMs (such as GPT-4 or Claude) or internally hosted open-source models (e.g., Mistral-7B) to enable journalists and users to sift through and converse with a large corpus of archival documents, knowledge bases, or reporting material, similar to the Financial Times' AI chatbot. RAG services can also be multi-modal: using multi-modal open-source vector databases such as Weaviate, users can query and retrieve audio, video, and text data in natural language.


Overall, RAG allows LLMs to access real-time information (by connecting to and retrieving information from the internet) or domain-specific knowledge (e.g., archival data) from a specific set of sources.


This capability can enable journalists to generate answers to questions that are grounded in a curated set of factual and up-to-date information, enhancing the accuracy of the LLM's output.
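
To make the retrieve-then-generate loop concrete, here is a minimal RAG sketch over a toy in-memory archive. A production system would use a vector database (such as Weaviate) and a document-chunking strategy; the archive contents and model names here are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Toy archive standing in for a newsroom's document store.
archive = [
    "2024-03-02: The council voted 7-2 to expand the bike lane network.",
    "2024-04-11: The mayor announced a pilot program for free transit passes.",
    "2023-11-20: A report found downtown air quality improved 12% year over year.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(archive, normalize_embeddings=True)

def answer(question: str, top_k: int = 2) -> str:
    # Retrieve: rank archive passages by cosine similarity to the question.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec
    context = "\n".join(archive[i] for i in np.argsort(scores)[::-1][:top_k])

    # Augment and generate: ground the answer in the retrieved passages.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What did the council decide about bike lanes?"))
```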


When to fine-tune an LLM?

News organizations interested in specializing pre-trained LLMs (e.g., GPT-3.5) for specific applications and tasks, such as reflecting a specific writing style in generated text, should consider fine-tuning. Fine-tuning reduces the need for elaborate prompt engineering to get the desired output: instead, you curate a dataset that closely mirrors the target task and train the model further on it.


For instance, perhaps your organization has a specific style or tone used in headlines that you would like a model to be able to mimic. By curating a dataset of articles and headlines, you could then fine-tune a model to produce such headlines without requiring users to know any particular prompting tricks.
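
As a rough sketch of what this looks like with OpenAI's fine-tuning API, the snippet below writes article/headline pairs to a JSONL file and starts a fine-tuning job. The file name, training examples, and system instruction are placeholders; in practice you would curate hundreds of pairs or more.

```python
import json
from openai import OpenAI

# Each training example pairs an article (user turn) with the
# house-style headline we want the model to learn (assistant turn).
examples = [
    {"article": "The city approved a new transit budget on Tuesday...",
     "headline": "City Approves Record Transit Budget"},
    # ...many more pairs in practice...
]

with open("headlines.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps({"messages": [
            {"role": "system", "content": "Write a headline in our house style."},
            {"role": "user", "content": ex["article"]},
            {"role": "assistant", "content": ex["headline"]},
        ]}) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("headlines.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-3.5-turbo")
print(job.id)
```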


While fine-tuning offers the advantage of specialized, tailored responses and reduces the prompting expertise required of end-users, the process can be expensive depending on the compute and data resources a particular task requires. Even so, it is typically far cheaper than training an LLM from scratch.


Fine-tuning LLMs also requires ongoing monitoring and qualitative evaluation to catch potential model drift.

OpenAI offers a great guide that walks through the necessary steps of customizing LLMs for your application. In addition, cloud service providers such as Google and Amazon also offer users the ability to fine-tune LLMs via their platforms Vertex AI and Bedrock, respectively.


Which method should you pick?

Prompt Engineering offers rapid adaptability to tasks in the newsroom with low computational overhead and technical complexity. It requires human expertise to craft prompts, but no in-house compute for inference, especially when using model endpoints offered by providers such as OpenAI and Anthropic.


Retrieval Augmented Generation (RAG) extends an LLM's capacity by incorporating real-time or external data for more factual responses. Although RAG does not require training or fine-tuning LLMs, storing and indexing the knowledge base from which the LLM fetches information may incur growing costs as that knowledge base increases in size.


Fine-tuning, on the other hand, provides high specialization for task-specific responses, requires careful curation of the training dataset, and involves moderate computational and technical resources.

In Closing

Based on these factors, I would generally recommend that news organizations explore the use cases in which LLMs can be integrated into their workflows through prompting, and possibly by fine-tuning third-party models for their tasks. This will often be preferable to grappling with the expensive infrastructure needed to train and deploy models that may become outdated and less efficient in the near future. Until infrastructure costs come down and training LLMs becomes more accessible, I would not recommend that news organizations build their own LLMs.


