Gad Benram

Cost of AI - What Would an Organization Pay in 2024?

Updated: Jul 19


Generative AI has been at the forefront of the automation revolution, particularly since the launch of ChatGPT. Applications such as chatbots, video makers, and co-pilots have helped companies automate processes across numerous industries, including software supply chains, marketing, and information security. However, the full potential of GenAI is yet to be unlocked, and a significant reason for this is cost. Incorporating Large Language Models (LLMs) into your organization can cost anywhere from a fixed $20 for a monthly license, to a few cents per request for on-demand use cases, to millions of USD for hosting an AI application on your own cloud infrastructure. In this blog post, I will break down the costs of AI for organizations, focusing on four domains:

  1. Cost of AI SaaS solutions (ChatGPT, Gemini Pro, Co-Pilot etc)

  2. Cloud costs of hosting or consuming AI models (OpenAI, Anthropic etc)

  3. Project costs of building AI applications

  4. Cost of tuning and training LLMs


LLMstudio cost and performance dashboards


| Feature/Aspect | ChatGPT or Equivalent Subscription | API On-Demand | Hosting Your Own Models |
|---|---|---|---|
| Cost | Fixed monthly fee (20-40 USD per user) | Pay per use (based on the number of tokens or API calls) | High initial costs for hardware and setup, ongoing operational costs |
| Scalability | Limited to individual use with quota on the use rate | Highly scalable, managed by API provider | Depends on your infrastructure; may require additional investment for scaling up |
| Ease of Use | High (just subscribe and use). Including supporting applications like chat, integration with office apps etc. | High (requires integration with API) | Low (requires setup, configuration, and maintenance) |
| Maintenance | Handled by the provider | Handled by the API provider | Your responsibility (updates, security, hardware maintenance) |
| Access to Updates | Automatic | Automatic | Manual (you need to update models and dependencies) |
| Customization | Limited to service options | Limited to API capabilities | High (full control over the model and infrastructure) |
| Data Privacy | Dependent on provider’s policy | Dependent on provider’s policy | High (data remains within your infrastructure) |
| Speed/Response Time | Dependent on provider | Dependent on API and internet speed | Potentially faster (localized infrastructure) but depends on setup |
| Internet Dependency | Yes | Yes | No (unless updating models or using cloud resources) |


AI Subscriptions

Companies like OpenAI, Google, and GitHub offer AI subscriptions, such as ChatGPT for individuals or teams, Gemini Pro, and GitHub Copilot. These are not "pure" AI models but applications built on top of AI technologies, such as chat, image generation, or text editing embedded into Google Docs. They are typically priced between $20 and $40 per user per month and come with usage limits, so you cannot integrate them directly with your backend applications. Users in your organization can still copy and paste text that they need to edit, or use AI for specific tasks such as writing code in development environments like VSCode. Use these services to enhance the productivity of your employees, but note that they usually do not allow you to build an offering on top of them. Imagine that you want to use AI on your website to give customers a better experience. In such cases, you need more fundamental access to AI models such as LLMs. Let’s see how much that would cost you.


AI in production: Paying for using LLMs

Assuming you want to develop an AI application of your own and integrate it into your production backend, you will have two payment methods to consider:

  1. Pay by Token: Companies pay based on the amount of data processed by the model service. Costs are determined by the number of tokens (derived from words or symbols) processed, both for inputs and outputs. Below, we refer to how OpenAI calculates tokens, for example.

  2. Hosting Your Own Model: Companies host Large Language Models (LLMs) on their own infrastructure and pay for the computing resources required to run them, especially GPUs. They may also have to pay a license fee for the LLM itself.

While the pay-by-token model offers simplicity and scalability, hosting your own model provides control over data privacy and operational flexibility. However, this approach demands significant investment in infrastructure and maintenance. Let's review the two options:


Hosting an LLM on your cloud

When it comes to hosting your own model, the main cost, as mentioned, is hardware. Consider, for example, hosting the open-source Llama 2 70B on AWS. The default instance recommended by AWS is ml.g5.48xlarge, with a listed on-demand price of almost 16.5 USD per hour. Running it around the clock would cost at least $11,700 per month, assuming the deployment doesn't scale up or down and no discounts are applied.
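The monthly figure follows directly from the hourly rate. Here is a minimal sketch of that arithmetic, assuming a 30-day month, a single instance that never scales, and the rounded 16.5 USD per hour quoted above. The function name is illustrative and is not part of any AWS tooling; actual rates depend on region and on the current price list.

```python
HOURS_PER_DAY = 24


def monthly_hosting_cost(hourly_rate_usd: float, instances: int = 1, days: int = 30) -> float:
    """On-demand cost of keeping `instances` machines running around the clock."""
    return hourly_rate_usd * HOURS_PER_DAY * days * instances


# 16.5 USD/hour is the rounded figure from the text, not a quoted AWS price.
single_instance = monthly_hosting_cost(16.5)
print(f"${single_instance:,.0f} per month")  # $11,880 per month
```

Because the listed rate is slightly under 16.5 USD, the result lands a little above the "at least $11,700" estimate. Adding a second instance for redundancy or peak load doubles the bill.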



Scaling this deployment up and down requires attention, configuration changes, and optimization work, and even then the costs remain very high.


Paying per token

An alternative to hosting an LLM and paying for the hardware is to use SaaS models and pay per token. Tokens are the units vendors use to price calls to their APIs. Different vendors, like OpenAI and Anthropic, have different tokenization methods, and they charge varying prices per token based on whether it's an input token, output token, or related to the model size. In the following example, we demonstrate how OpenAI calculates a token count for a given text. It's evident that using special characters results in higher costs, while words in English consume fewer tokens. If you are using other languages, such as Hebrew, be aware that the costs may be even higher.
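To make the pricing mechanics concrete, here is a minimal sketch of how a per-request cost is derived when input and output tokens are priced separately. The prices below are hypothetical placeholders, not any vendor's actual rates, and the token counts would in practice come from the vendor's tokenizer or from the usage data returned with each API response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price per one million tokens, in USD (hypothetical values below)."""
    input_per_million_usd: float
    output_per_million_usd: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single API call given its input and output token counts."""
    total = (input_tokens * price.input_per_million_usd
             + output_tokens * price.output_per_million_usd)
    return total / 1_000_000


# Hypothetical price list: look up the real numbers on your vendor's pricing page.
example_price = TokenPrice(input_per_million_usd=10.0, output_per_million_usd=30.0)
cost = request_cost(example_price, input_tokens=1_000, output_tokens=500)
print(f"${cost:.3f} per request")  # $0.025 per request
```

Since output tokens are often priced higher than input tokens, a short prompt that triggers a long answer can cost more than a long prompt with a brief answer.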


OpenAI tokenizer: demonstrates how characters are charged


Now that we have established how you can pay for an LLM, let's discuss the costs that are often overlooked.


Hidden costs of LLM applications

GPT-For-Work has developed an OpenAI pricing calculator for GPT products. We utilized it to estimate the cost of an AI application that processes 5 requests per minute, and were immediately faced with the question: how many tokens will be sent in such a case? The answer is complex, as this number is influenced by several hidden and unknown factors:

  • The size of the user input and the generated output can vary significantly.

  • There are hidden costs associated with application prompts.

  • Utilizing agent libraries typically incurs additional API calls in the background to LLMs, in order to implement frameworks like ReAct or to summarize data for buffers.

These hidden costs are often the primary cause of bill shock when transitioning from the prototyping phase to production. Therefore, generating visibility into these costs is crucial.
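A rough sketch of how these factors compound for the 5-requests-per-minute scenario is shown below. All token counts and the number of background calls are illustrative assumptions, and the model is deliberately simplified: it treats every background LLM call as resending the full application prompt and user input.

```python
def monthly_tokens(
    requests_per_minute: int,
    user_input_tokens: int,
    output_tokens: int,
    system_prompt_tokens: int = 0,
    llm_calls_per_request: int = 1,
    days: int = 30,
) -> int:
    """Estimate tokens processed per month, including hidden overhead."""
    requests = requests_per_minute * 60 * 24 * days
    tokens_per_call = system_prompt_tokens + user_input_tokens + output_tokens
    return requests * llm_calls_per_request * tokens_per_call


# Naive estimate: only the visible user input and the generated output.
naive = monthly_tokens(5, user_input_tokens=200, output_tokens=300)

# Assumed hidden overhead: an 800-token application prompt and 3 LLM calls
# per user request (e.g. from an agent loop). Both numbers are hypothetical.
realistic = monthly_tokens(
    5, user_input_tokens=200, output_tokens=300,
    system_prompt_tokens=800, llm_calls_per_request=3,
)

print(f"naive:     {naive:,} tokens")      # naive:     108,000,000 tokens
print(f"realistic: {realistic:,} tokens")  # realistic: 842,400,000 tokens
```

Under these assumptions, the bill is almost eight times what the visible traffic suggests, which is the kind of gap that only shows up once usage is logged per call.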


Experimenting with various models and recording their associated costs. LLMstudio screen capture.



How to control the cost of LLMs?

Improving the underlying hardware is a key strategy for controlling the cost of LLMs. By investing in faster or more advanced GPUs, you can significantly increase the speed of inference, making your LLM applications run more efficiently. This improvement in performance can offset higher costs by reducing the time it takes to generate results, thereby balancing the trade-offs between speed, cost, and accuracy. The domain of optimizing AI costs in the cloud is called AI-FinOps, and there are advanced approaches to balancing the requirements and their associated costs in order to achieve an optimized deployment of AI applications. You can read more about LLMFinOps in our recent blog post to further dive into this domain, but let's give you the highlights of what LLMFinOps looks like.


Using an analytic approach

Take a look at the following charts, which we produced using LLMstudio's SDK:



The charts illustrate why optimization of Large Language Models (LLMs) is indeed a multi-dimensional problem. Switching between LLMs can result in variations in accuracy, latency, and cost in different directions. A straightforward approach to address this issue is to fix two dimensions by establishing the required performance metrics (for example, ensuring a response time of no more than 120ms with an x% accuracy score) and then selecting the most cost-effective alternative. While this solution might be easy to outline, its implementation is complex and may involve sub-dependencies.
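The "fix two dimensions, minimize the third" idea can be sketched as a simple filter followed by a minimum. The candidate names and measurements below are invented for illustration; in practice they would come from running each model against your own evaluation dataset.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float                  # fraction of evaluation cases passed
    latency_ms: float                # measured response time
    cost_per_1k_requests_usd: float  # measured or estimated cost


def cheapest_meeting_requirements(
    candidates: Iterable[Candidate],
    min_accuracy: float,
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate satisfying both constraints, or None."""
    eligible = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda c: c.cost_per_1k_requests_usd, default=None)


# Hypothetical measurements, not benchmarks of real models.
measured = [
    Candidate("model-a", accuracy=0.92, latency_ms=300.0, cost_per_1k_requests_usd=25.0),
    Candidate("model-b", accuracy=0.88, latency_ms=110.0, cost_per_1k_requests_usd=4.0),
    Candidate("model-c", accuracy=0.81, latency_ms=90.0, cost_per_1k_requests_usd=1.0),
]

best = cheapest_meeting_requirements(measured, min_accuracy=0.85, max_latency_ms=120.0)
print(best.name if best else "no model meets the requirements")  # model-b
```

Returning `None` when nothing qualifies is deliberate: it signals that the requirements themselves need to be relaxed, rather than silently picking a model that violates them.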


Therefore, a practical solution involves adopting an analytical approach that allows for tracking various scenarios and testing them against your dataset. Tools like LLMstudio serve as a centralized platform for managing all your LLM interactions, facilitating the analytical approach necessary for resolving optimization challenges. LLMstudio enables the testing of different prompts with various LLMs from any provider, while logging the history of requests and responses for subsequent analysis. Moreover, it monitors key metrics related to cost and latency, empowering you to make informed decisions regarding the optimization of your LLM deployment.


Cost of AI Projects

The cost of AI projects can vary from $30,000 for a Proof of Concept (PoC) to $500,000 for more advanced stages. Let's examine how we arrived at these numbers. A typical AI project starts with a PoC phase. During this time, it is crucial to run through the critical phases of mapping out the idea, validating the business model, and building a prototype. This process typically takes 3-4 weeks, and a professional R&D team could achieve that with 1-2 engineers and a project manager. Assuming you are in Western Europe or the USA, the typical cost for an organization could be between $30K and $60K.

Moving past the PoC, developing a Minimum Viable Product (MVP) involves a slightly larger team and lasts 2-3 months, bringing the cost to between $90K and $120K USD. However, it's important to remember that AI projects require high maintenance, so assume that over the first year, there will be a significant investment in monitoring, bug fixing, and tuning.

We therefore estimate that an initial AI project costs around $500K USD or more in its first year.


Summary of AI Project Costs:

| Project Stage | Team Composition | Duration | Estimated Cost |
|---|---|---|---|
| Proof of Concept | 1-2 Engineers, 1 Project Manager | 3-4 weeks | $30K - $60K |
| Minimum Viable Product | 2 Engineers, 1 Project Manager | 2-3 months | $90K - $120K |
| Maintenance (First Year) | Ongoing support and development team | 1 year | $340K - $390K |
| Total | | First Year | >$500K |


This table provides a structured overview of the costs associated with different stages of an AI project.
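As a quick sanity check, the stage ranges can be summed to see how the first-year total comes together. The sketch below uses only the figures from the table; the variable names are illustrative.

```python
# (low, high) cost estimates per stage in USD, taken from the table above.
stage_costs_usd = {
    "Proof of Concept": (30_000, 60_000),
    "Minimum Viable Product": (90_000, 120_000),
    "Maintenance (First Year)": (340_000, 390_000),
}

total_low = sum(low for low, _ in stage_costs_usd.values())
total_high = sum(high for _, high in stage_costs_usd.values())

print(f"First-year total: ${total_low:,} - ${total_high:,}")
# First-year total: $460,000 - $570,000
```

The summed range of roughly $460K to $570K brackets the $500K headline figure, and maintenance accounts for the bulk of it.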


Cost of Training LLMs

As of April 2024, HuggingFace hosts about 16,000 models. However, when it comes to foundation models, there are only a few dozen, and even fewer that are considered the gold standard. These include proprietary models like GPT-4 (by OpenAI) and open-source models like Grok (by xAI). These models can be steered using prompt engineering with no one-off cost: you simply pay every time you call the model, sending specific instructions to get the desired output.

However, there are two ways to make a one-off investment in AI models: fine-tune an existing model or train your own. Fine-tuning usually involves taking a few hundred samples to emphasize some previously learned behavior of the model. For example, instead of telling a customer service AI model to be more polite through prompt engineering, you can fine-tune it to be more polite. Since the model was already trained on massive data and knows what it means to be polite, it will adapt easily. The investment could range from a few hundred to thousands of USD, leading to more stable results and possibly even long-term cost reductions.
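One way to reason about the long-term cost reduction is a break-even estimate. The sketch below assumes the saving comes from no longer sending lengthy instructions with every request, and that the per-token price is the same before and after fine-tuning. Both are simplifying assumptions (vendors may price fine-tuned models differently), and all numbers are hypothetical.

```python
import math


def break_even_requests(
    fine_tuning_cost_usd: float,
    saved_tokens_per_request: int,
    input_price_per_million_usd: float,
) -> int:
    """Number of requests after which a one-off fine-tuning cost pays for itself."""
    saved_per_million_requests = saved_tokens_per_request * input_price_per_million_usd
    if saved_per_million_requests <= 0:
        raise ValueError("fine-tuning must save a positive amount per request")
    return math.ceil(fine_tuning_cost_usd * 1_000_000 / saved_per_million_requests)


# Hypothetical: $1,000 of fine-tuning removes 500 instruction tokens per request,
# with input tokens priced at $10 per million.
requests_needed = break_even_requests(1_000, 500, 10.0)
print(f"{requests_needed:,} requests to break even")  # 200,000 requests to break even
```

At 5 requests per minute, that volume is reached in under a month, while a low-traffic internal tool might never reach it.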



Training a model is another story. It is meant to teach a model new skills and embed new knowledge. For example, if you have proprietary data that you don't want to share with cloud vendors, you can take a model architecture and train your own model from scratch. Bloomberg did this with BloombergGPT, which reportedly cost them close to a million dollars.

Summary Table: Comparing Fine-Tuning and Training Models

| Aspect | Fine-Tuning | Training a Model |
|---|---|---|
| Purpose | Enhance specific behaviors in an existing model | Teach new skills and knowledge |
| Data Requirements | Few hundred samples typically | Large datasets, often proprietary |
| Cost | Hundreds to thousands of USD | Can be very high (e.g., close to a million USD for Bloomberg) |
| Long-Term Benefits | Possible cost reduction in the long term | Full control over model and data privacy |


This table highlights the primary differences and considerations between fine-tuning and training AI models.


Summary

Allocating budgets for AI in your organization should start with the question: What do you want to achieve? Would you like to make your staff more efficient by leveraging AI SaaS solutions like ChatGPT for the marketing team or GitHub Copilot for the developers? Or are you looking to monetize AI by integrating an AI model from OpenAI or Meta with your own application and selling it as a service or product?

Finally, don't forget the costs of labor. Yes, in the long run, AI will reduce labor costs by increasing automation, but in the short-to-mid term, your organization will need to pay for AI expert services, change management, and engineering to adjust to the new way of operating. Consult experts like TensorOps to build an AI onboarding plan for your organization.


