Generative AI has been at the forefront of the automation revolution, particularly since the emergence of ChatGPT. Applications such as chatbots, video generators, and coding copilots have helped companies automate processes across numerous industries, including software supply chains, marketing, and information security. However, the full potential of GenAI is yet to be unlocked, and a significant reason is cost. Incorporating Large Language Models (LLMs) into your organization can range from a fixed $20 monthly license, to a few cents per request for on-demand use cases, to millions of USD for hosting an AI application on your own cloud infrastructure. In this blog post, I will break down the costs of AI for organizations, focusing on four domains:
- Cost of AI SaaS solutions (ChatGPT, Gemini Pro, GitHub Copilot, etc.)
- Cloud costs of hosting or consuming AI models (OpenAI, Anthropic, etc.)
- Project costs of building AI applications
- Cost of tuning and training LLMs
LLMstudio cost and performance dashboards
| Feature/Aspect | ChatGPT or Equivalent Subscription | API On-Demand | Hosting Your Own Models |
|---|---|---|---|
| Cost | Fixed monthly fee ($20-$40 per user) | Pay per use (based on the number of tokens or API calls) | High initial costs for hardware and setup, ongoing operational costs |
| Scalability | Limited to individual use, with quotas on usage rate | Highly scalable, managed by the API provider | Depends on your infrastructure; may require additional investment to scale up |
| Ease of Use | High (just subscribe and use), including supporting applications like chat and office-app integrations | High (requires integration with the API) | Low (requires setup, configuration, and maintenance) |
| Maintenance | Handled by the provider | Handled by the API provider | Your responsibility (updates, security, hardware maintenance) |
| Access to Updates | Automatic | Automatic | Manual (you need to update models and dependencies) |
| Customization | Limited to service options | Limited to API capabilities | High (full control over the model and infrastructure) |
| Data Privacy | Dependent on the provider's policy | Dependent on the provider's policy | High (data remains within your infrastructure) |
| Speed/Response Time | Dependent on the provider | Dependent on the API and network speed | Potentially faster (local infrastructure), but depends on setup |
| Internet Dependency | Yes | Yes | No (unless updating models or using cloud resources) |
AI Subscriptions
Companies like OpenAI and Google offer AI subscriptions, such as ChatGPT for individuals or teams, Gemini Pro, and GitHub Copilot. These are not "pure" AI models but applications that leverage AI technologies, such as chat, image generation, or text editing embedded into Google Docs. These services are typically priced between $20 and $40 per user per month, with usage limits, and you cannot integrate them directly with your backend applications. However, users in your organization can still copy and paste text they need to edit, or use AI for specific tasks like writing code through development environments such as VSCode. Use these services to enhance the productivity of your employees. They usually do not, however, allow you to build an offering on top of them. Imagine you want to use AI on your website to give customers a better experience: in that case, you need more fundamental access to AI models, such as LLMs. Let's see how much that would cost you.
AI in production: paying to use LLMs
Assuming you want to develop an AI application of your own and integrate it into your production backend, you will have two payment methods to consider:
- Pay by Token: companies pay based on the amount of data processed by the model service. Costs are determined by the number of tokens (derived from words or symbols) processed, for both inputs and outputs. Below, we show how OpenAI counts tokens as an example.
- Hosting Your Own Model: companies host Large Language Models (LLMs) on their own infrastructure, paying for the computing resources required to run these models, especially GPUs, and potentially a license fee for the LLM itself.
While the pay-by-token model offers simplicity and scalability, hosting your own model provides control over data privacy and operational flexibility. However, this approach demands significant investment in infrastructure and maintenance. Let's review the two options:
Hosting an LLM on your cloud
When it comes to hosting your own model, the main cost, as mentioned, is hardware. Consider, for example, hosting the open-source Llama2 70B on AWS. The default instance recommended by AWS is ml.g5.48xlarge, with a listed on-demand price of almost $16.50 per hour. This means such a deployment would cost at least $11,700 per month, assuming it doesn't scale up or down and no discounts are applied.
Scaling this AWS service up and down requires attention, configuration changes, and optimization work; even so, the costs of such deployments remain very high.
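The monthly figure above follows from simple arithmetic. A minimal sketch, assuming the on-demand list price at the time of writing (actual rates vary by region and change over time):

```python
# Rough monthly cost of one always-on ml.g5.48xlarge instance.
# The hourly rate is an assumption based on AWS's on-demand list
# price at the time of writing; check the pricing page for your region.
HOURLY_RATE_USD = 16.288      # on-demand rate (assumed)
HOURS_PER_MONTH = 24 * 30     # one instance, never scaled down

monthly_cost = HOURLY_RATE_USD * HOURS_PER_MONTH
print(f"${monthly_cost:,.0f} per month")  # roughly $11,727
```

Reserved-capacity discounts or autoscaling can lower this, but the baseline for an always-on GPU instance is what drives the five-figure monthly bill.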
Paying per token
An alternative to hosting an LLM and paying for the hardware is to use SaaS models and pay per token. Tokens are the units vendors use to price calls to their APIs. Different vendors, like OpenAI and Anthropic, have different tokenization methods, and they charge varying prices per token based on whether it's an input token, output token, or related to the model size. In the following example, we demonstrate how OpenAI calculates a token count for a given text. It's evident that using special characters results in higher costs, while words in English consume fewer tokens. If you are using other languages, such as Hebrew, be aware that the costs may be even higher.
OpenAI tokenizer: demonstrates how characters are charged
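To make pay-by-token pricing concrete, here is a minimal per-request cost sketch. The per-token rates below are illustrative assumptions, not any vendor's current list prices; note that output tokens typically cost more than input tokens:

```python
# Illustrative per-request cost under pay-by-token pricing.
# The rates are hypothetical placeholders; real vendors charge
# different prices per model and per token direction.
INPUT_PRICE_PER_1K = 0.01    # USD per 1,000 input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.03   # USD per 1,000 output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single API call."""
    return ((input_tokens / 1000) * INPUT_PRICE_PER_1K
            + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K)

# A prompt of 1,500 tokens that yields a 500-token answer:
print(f"${request_cost(1500, 500):.3f}")  # $0.030
```

Fractions of a cent per call look negligible, which is exactly why per-token costs tend to surprise teams only once traffic scales up.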
Now that we have established how you can pay for an LLM, let's discuss the costs that are often overlooked.
Hidden costs of LLM applications
GPT-For-Work has developed an OpenAI pricing calculator for GPT products. We utilized it to estimate the cost of an AI application that processes 5 requests per minute, and were immediately faced with the question: how many tokens will be sent in such a case? The answer is complex, as this number is influenced by several hidden and unknown factors:
- The size of the user input and the generated output can vary significantly.
- Application prompts (system prompts, few-shot examples) add tokens to every request.
- Agent libraries typically make additional background API calls to LLMs, for example to implement frameworks like ReAct or to summarize data into buffers.
These hidden costs are often the primary cause of bill shock when transitioning from the prototyping phase to production. Therefore, generating visibility into these costs is crucial.
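A back-of-the-envelope sketch shows how these factors compound. All figures below are assumptions for illustration, using the 5-requests-per-minute scenario from above:

```python
# Naive vs. realistic monthly token bill for an app handling
# 5 requests per minute. All figures are illustrative assumptions.
PRICE_PER_1K_TOKENS = 0.02       # blended USD rate (assumed)
REQUESTS_PER_MONTH = 5 * 60 * 24 * 30

VISIBLE_TOKENS = 800             # user input + model output
SYSTEM_PROMPT_TOKENS = 600       # instructions sent with every call
AGENT_CALL_MULTIPLIER = 3        # ReAct-style agents may call the
                                 # LLM several times per user request

def monthly_bill(tokens_per_request: int, multiplier: int = 1) -> float:
    total_tokens = REQUESTS_PER_MONTH * tokens_per_request * multiplier
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

naive = monthly_bill(VISIBLE_TOKENS)
realistic = monthly_bill(VISIBLE_TOKENS + SYSTEM_PROMPT_TOKENS,
                         AGENT_CALL_MULTIPLIER)
print(f"naive: ${naive:,.0f}  realistic: ${realistic:,.0f}")
```

Under these assumptions the realistic bill is more than five times the naive estimate, which is the shape of the bill shock described above.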
Experimenting with various models and recording their associated costs. LLMstudio screen capture.
How to control the cost of LLMs?
Improving the underlying hardware is one key strategy for controlling the cost of LLMs. Investing in faster or more advanced GPUs can significantly increase inference speed, and that performance gain can offset the higher price by reducing the time it takes to generate results, balancing the trade-offs between speed, cost, and accuracy. The discipline of optimizing AI costs in the cloud is called LLMFinOps, and there are advanced approaches to balancing requirements against their associated costs to achieve an optimized deployment of AI applications. You can read more about LLMFinOps in our recent blog post, but let's give you the highlights of what it looks like.
Using an analytic approach
Take a look at the following charts, produced with LLMstudio's SDK:
The charts illustrate why optimizing Large Language Models (LLMs) is a multi-dimensional problem: switching between LLMs shifts accuracy, latency, and cost in different directions. A straightforward approach is to fix two dimensions by setting required performance thresholds (for example, a response time of no more than 120 ms with at least an x% accuracy score) and then selecting the most cost-effective alternative. While this solution is easy to outline, implementing it is complex, since the dimensions are interdependent.
Therefore, a practical solution involves adopting an analytical approach that allows for tracking various scenarios and testing them against your dataset. Tools like LLMstudio serve as a centralized platform for managing all your LLM interactions, facilitating the analytical approach necessary for resolving optimization challenges. LLMstudio enables the testing of different prompts with various LLMs from any provider, while logging the history of requests and responses for subsequent analysis. Moreover, it monitors key metrics related to cost and latency, empowering you to make informed decisions regarding the optimization of your LLM deployment.
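The "fix two dimensions, optimize the third" approach can be sketched in a few lines. The model names and measurements below are hypothetical, standing in for the per-model metrics a tool like LLMstudio would log for you:

```python
# Pick the cheapest model that satisfies latency and accuracy
# constraints. Model names and metrics are hypothetical examples.
candidates = [
    {"name": "model-a", "latency_ms": 90,  "accuracy": 0.92, "usd_per_1k": 0.030},
    {"name": "model-b", "latency_ms": 110, "accuracy": 0.89, "usd_per_1k": 0.012},
    {"name": "model-c", "latency_ms": 150, "accuracy": 0.94, "usd_per_1k": 0.008},
]

def cheapest_model(max_latency_ms: float, min_accuracy: float):
    """Fix latency and accuracy thresholds, then minimize cost."""
    viable = [m for m in candidates
              if m["latency_ms"] <= max_latency_ms
              and m["accuracy"] >= min_accuracy]
    return min(viable, key=lambda m: m["usd_per_1k"]) if viable else None

choice = cheapest_model(max_latency_ms=120, min_accuracy=0.88)
print(choice["name"])  # model-b
```

Note that the cheapest model overall (model-c) is excluded by the latency constraint, which is exactly why the decision cannot be made on cost alone.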
Cost of AI Projects
The cost of AI projects can vary from $30,000 for a Proof of Concept (PoC) to $500,000 for more advanced stages. Let's examine how we arrived at these numbers. A typical AI project starts with a PoC phase. During this time, it is crucial to run through the critical phases of mapping out the idea, validating the business model, and building a prototype. This process typically takes 3-4 weeks, and a professional R&D team could achieve that with 1-2 engineers and a project manager. Assuming you are in Western Europe or the USA, the typical cost for an organization could be between $30K and $60K.
Moving past the PoC, developing a Minimum Viable Product (MVP) involves a slightly larger team and lasts 2-3 months, bringing the cost to between $90K and $120K USD. However, it's important to remember that AI projects require high maintenance, so assume that over the first year, there will be a significant investment in monitoring, bug fixing, and tuning.
Therefore, we conclude that the total first-year cost of an initial AI project is on the order of $500K (roughly $460K-$570K, summing the stages above).
Summary of AI Project Costs:
| Project Stage | Team Composition | Duration | Estimated Cost |
|---|---|---|---|
| Proof of Concept | 1-2 engineers, 1 project manager | 3-4 weeks | $30K - $60K |
| Minimum Viable Product | 2 engineers, 1 project manager | 2-3 months | $90K - $120K |
| Maintenance (first year) | Ongoing support and development team | 1 year | $340K - $390K |
| Total | | First year | $460K - $570K |
This table provides a structured overview of the costs associated with different stages of an AI project.
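The first-year total is just the stage ranges summed; a tiny sketch makes that explicit:

```python
# Sum the low and high ends of each project stage
# (figures from the table above, in USD).
stages = {
    "PoC":         (30_000, 60_000),
    "MVP":         (90_000, 120_000),
    "Maintenance": (340_000, 390_000),
}

low = sum(lo for lo, _ in stages.values())
high = sum(hi for _, hi in stages.values())
print(f"first-year total: ${low:,} - ${high:,}")  # $460,000 - $570,000
```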
Cost of Training LLMs
As of April 2024, HuggingFace hosts hundreds of thousands of models. However, when it comes to foundation models, there are only a few dozen, and perhaps even fewer that are considered the gold standard. These include proprietary models like GPT-4 (by OpenAI) and open-weights models like Grok (by xAI). These models can be steered using prompt engineering with no one-off cost; you simply pay every time you call the AI model, sending specific instructions to get the desired output. However, there are two ways to make a one-off investment in AI models: fine-tuning or training your own. Fine-tuning usually involves a few hundred samples and emphasizes behavior the model has already learned. For example, instead of telling a customer-service AI model to be more polite through prompt engineering, you can fine-tune it to be more polite. Since the model was already trained on massive data and knows what it means to be polite, it will adapt easily. The investment could range from a few hundred to a few thousand USD, leading to more stable results and possibly even long-term cost reductions.
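A rough sketch of such a one-off fine-tuning budget. Every figure here is an assumption for illustration; vendors publish their own per-token training rates, and in practice the API training fee is often dwarfed by the labor of preparing and evaluating the dataset:

```python
# Back-of-the-envelope one-off fine-tuning budget.
# All figures are illustrative assumptions.
TRAINING_PRICE_PER_1K_TOKENS = 0.008   # USD (assumed vendor rate)
SAMPLES = 500                          # a few hundred examples
AVG_TOKENS_PER_SAMPLE = 1_000
EPOCHS = 4                             # passes over the dataset
DATA_PREP_HOURS = 20                   # curating and labeling samples
HOURLY_RATE_USD = 100                  # engineering cost (assumed)

compute_fee = (SAMPLES * AVG_TOKENS_PER_SAMPLE * EPOCHS / 1000
               * TRAINING_PRICE_PER_1K_TOKENS)
labor = DATA_PREP_HOURS * HOURLY_RATE_USD
print(f"compute: ${compute_fee:,.2f}, labor: ${labor:,}, "
      f"total: ${compute_fee + labor:,.2f}")
```

Under these assumptions the compute fee is trivial and the engineering time dominates, which is why fine-tuning budgets land in the hundreds-to-thousands range rather than at the raw API price.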
Training a model is another story: it teaches models new skills and embeds new knowledge. For example, if you have proprietary data that you don't want to share with cloud vendors, you can take a model architecture and train your own model from scratch. Bloomberg did this with BloombergGPT, which reportedly cost close to a million dollars.
Summary Table: Comparing Fine-Tuning and Training Models
| Aspect | Fine-Tuning | Training a Model |
|---|---|---|
| Purpose | Enhance specific behaviors in an existing model | Teach new skills and embed new knowledge |
| Data Requirements | Typically a few hundred samples | Large datasets, often proprietary |
| Cost | Hundreds to thousands of USD | Can be very high (e.g., close to a million USD for BloombergGPT) |
| Long-Term Benefits | Possible long-term cost reduction | Full control over model and data privacy |
This table highlights the primary differences and considerations between fine-tuning and training AI models.
Summary
Allocating budgets for AI in your organization should start with the question: What do you want to achieve? Would you like to make your staff more efficient by leveraging AI SaaS solutions like ChatGPT for the marketing team or GitHub Copilot for the developers? Or are you looking to monetize AI by integrating an AI model from OpenAI or Meta with your own application and selling it as a service or product?
Finally, don't forget the costs of labor. Yes, in the long run, AI will reduce labor costs by increasing automation, but in the short-to-mid term, your organization will need to pay for AI expert services, change management, and engineering to adjust to the new way of operating. Consult experts like TensorOps to build an AI onboarding plan for your organization.