Updated: Aug 13
Co-written with Gad Benram
The sophistication of large language models, like Google's PaLM-2, has redefined the landscape of natural language processing (NLP). These models' ability to generate human-like text has opened up a vast array of applications, including virtual assistants, content generation, and more. To truly leverage these models' potential, an efficient approach is needed: Prompt Engineering. This blog post aims to elucidate key design patterns in prompt engineering, complete with real-world examples to help developers extract maximum benefit from these language models.
Prompts are natural language requests that you give to a language model to generate a particular response. These prompts can be questions, tasks, contextual information, or examples. The model's output depends heavily on the type and structure of the prompt. Effective prompt engineering, therefore, revolves around crafting well-designed prompts that steer the model towards your desired outcome.
How to explain a task to a Large Language Model:
Prompt engineering is a fairly young field, still under heavy research, where new approaches are constantly being tested. In the last few years, however, a few common patterns have emerged that are important to understand fully in order to get the most out of the latest and greatest Large Language Models.
Within these approaches to prompting, there are two well-defined philosophies that are important to distinguish: Zero-Shot and Few-Shot prompting. They differ in how they condition the LLM. Zero-shot prompts try to explain to the model the task to be performed, while few-shot prompts tend to provide shallower explanations and instead include examples of the task being performed.
When the task we are asking our Large Language Model to perform is quite easy to define with natural language, we can simply ask the model to perform this task without adding extra context.
With this generic description of the task, we are essentially relying on the general intelligence of the model being used. For example, when asking an LLM to summarize a document, we are trusting the concept of a summary that the model learned during its training process. Because of this, once we've provided clear instructions on how to perform a task, the only real ways to improve performance are to fine-tune the model or to provide examples of how the task should be performed.
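In practice, a zero-shot prompt is little more than the instruction plus the input. The sketch below is illustrative; the template wording and the summarization task are just examples, not a required format:

```python
def zero_shot_prompt(task: str, document: str) -> str:
    """Build a zero-shot prompt: a plain natural-language instruction and
    the input, with no solved examples to condition the model on."""
    return f"{task}\n\nText:\n{document}\n\nAnswer:"

prompt = zero_shot_prompt(
    "Summarize the following text in one sentence.",
    "Large language models generate text one token at a time, each token "
    "conditioned on everything produced so far.",
)
```

The resulting string would then be sent as-is to whatever model API you are using.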
To provide some more context on how a certain task should be performed or how the output of said task should look, a very common technique is to provide the model with a few examples of already solved tasks.
This type of prompting allows us to explain more complex tasks to a model without having to fine-tune it to perform them. Because of this flexibility, it tends to be an extremely popular way to quickly get good results from LLMs.
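A few-shot prompt can be assembled mechanically from solved examples. This is a minimal sketch (the Q/A layout is one common convention, not the only one):

```python
def few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: a short instruction followed by solved
    question/answer pairs, ending with the unsolved query for the model
    to complete in the same format."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{task}\n\n{shots}\nQ: {query}\nA:"

prompt = few_shot_prompt(
    "Answer the arithmetic question.",
    [("What is 1 + 1?", "2"), ("What is 2 + 3?", "5")],
    "What is 3 + 4?",
)
```

Ending the prompt with a dangling `A:` invites the model to continue in the demonstrated format.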
Making LLMs think like us (and maybe better)
When LLMs try to solve more complex mathematical or logical problems, no matter how big or well trained they are, they tend to answer greedily and can easily be fooled into giving a wrong response even when the question is well defined and explained:
Q: I have 10 apples. If I give my dad 5 apples, he eats one and gives the rest to his only son. How many apples do I have?
A: You start with 10 apples. When you give your dad 5 apples, you have 5 apples remaining. Your dad eats one, so you now have 4 apples. Your dad gives the rest (3 apples) to his only son, which is you. So, after this process, you have received 3 apples from your dad. Therefore, you now have a total of 7 apples.
When shown these kinds of examples, one might jump to the conclusion that LLMs lack the ability to solve such problems, which is true in a sense, as these models are simply predicting the next most likely word iteratively. However, it turns out that, with creative prompting, there are ways to achieve something very similar to human reasoning with LLMs.
To make an LLM "think", we must make its thought process part of its output: in essence, we assume no inner reasoning capabilities and force all thoughts to be written out. Since these models clearly have the ability to write coherent text, they will also have the ability to write coherent thoughts, and their answers will derive directly from them.
This kind of prompting is called Chain-of-Thought, and it can be done in either a zero-shot or few-shot fashion. To understand how, let's look at a few examples:
I have 10 apples. If I give my dad 5 apples, he eats one and gives the rest to his only son. How many apples do I have? Let's think step by step
Let's break down the scenario step by step: You start with 10 apples. You give your dad 5 apples. Your dad eats one apple, so he has 4 left. Your dad gives the remaining 4 apples to his only son (which is you). So, you receive 4 apples back from your dad. Now, let's add up the apples: You originally had 10 apples. You gave 5 apples to your dad, which leaves you with 10 - 5 = 5 apples. You received 4 apples back from your dad, so you have 5 + 4 = 9 apples. Therefore, after giving your dad 5 apples, receiving 1 apple back, and considering your initial count, you have 9 apples.
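The zero-shot variant above boils down to appending a reasoning trigger to the question. As a sketch (the exact trigger wording is just the popular convention, not a requirement):

```python
COT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Turn a plain question into a zero-shot chain-of-thought prompt by
    appending a trigger that nudges the model to write out its steps."""
    return f"{question}\n{COT_TRIGGER}"

prompt = zero_shot_cot(
    "I have 10 apples. If I give my dad 5 apples, he eats one and gives "
    "the rest to his only son. How many apples do I have?"
)
```

The few-shot variant instead demonstrates the step-by-step reasoning inside the solved examples, as shown next.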
Q: I have 100 apples. If I give my brother 3 apples, he eats one and gives the rest to his only brother. How many apples do I have? A: You had 100 apples and gave 3 to your brother, leaving you with 97 and your brother with 3. He then eats one leaving him with 2, and gives them to his only brother (you), leaving you with 97+2=99 apples. Q: I have 10 apples. If I give my dad 5 apples, he eats one and gives the rest to his only son. How many apples do I have? A:
Following the same logic as before, if you have 10 apples and give 5 to your dad, you will have 5 apples left. Your dad eats one, leaving him with 4 apples, and then gives the remaining 4 apples to his only son (you). So, you will end up with a total of 5 + 4 = 9 apples.
Notice that in both cases we get correct results on a task that the model couldn't perform before. This really shows the power of prompting models correctly and with the task in mind.
Temperature and inconsistent results
Even after careful optimization of our prompts, we often obtain different results when running them more than once. These inconsistencies can range from small word differences to an altogether different result. This variability between runs comes from the autoregressive way in which these models generate text: if a single word is chosen differently, that choice will affect all predictions of future words.
To control the randomness of these models, API providers usually expose a few tunable parameters. The most common is temperature, which controls the randomness of token generation. With a temperature of 0 you will usually get (almost) deterministic results, while with a very high temperature you will get pure gibberish and randomness.
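The mechanics are easy to reproduce on a toy scale: temperature divides the model's logits before the softmax, so low temperatures sharpen the distribution and high ones flatten it. A self-contained sketch (the logit table is made up for illustration; real providers apply this inside their decoding loop):

```python
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float,
                            rng: random.Random) -> str:
    """Sample the next token from a toy logit table, scaled by temperature.
    As temperature approaches 0 this becomes greedy (argmax) decoding;
    large temperatures flatten the distribution toward uniform noise."""
    if temperature <= 1e-6:  # treat (near-)zero temperature as greedy decoding
        return max(logits, key=logits.get)
    # Softmax over temperature-scaled logits (max subtracted for stability).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # numerical edge case: fall back to the last token

logits = {"apples": 2.0, "oranges": 1.0, "pears": 0.1}
greedy = sample_with_temperature(logits, 0.0, random.Random(0))  # always "apples"
```

At temperature 0 the most likely token always wins; crank the temperature up and the three tokens become nearly interchangeable.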
When results are inconsistent in a way that drastically affects the outputs, a few prompting techniques let us put this randomness to work for us:
Self-consistency mitigates inconsistencies by running the same prompt several times with a non-zero temperature, collecting the results, and choosing the final answer with a merging strategy: a majority vote for categorical answers, or an average for numerical ones.
Image source: Wang et al. 2023
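For categorical answers, the merging strategy is just a majority vote over sampled completions. A minimal sketch, with a deterministic stub standing in for the repeated LLM calls (in real use, `sample_answer` would send the same chain-of-thought prompt to your provider's API at temperature > 0 and parse out the final answer):

```python
from collections import Counter
from typing import Callable

def self_consistency(sample_answer: Callable[[], str], n: int = 5) -> str:
    """Sample n final answers from the same prompt (run at a non-zero
    temperature) and return the most common one (majority vote)."""
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub: five imagined runs of the apple problem, one of which went wrong.
runs = iter(["9 apples", "7 apples", "9 apples", "9 apples", "9 apples"])
final = self_consistency(lambda: next(runs), n=5)  # majority vote -> "9 apples"
```

The single wrong run is outvoted, which is exactly why the technique tolerates occasional reasoning failures.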
This approach is quite smart and works well for logical reasoning on problems where the chain of thought needed to solve them isn't too long. For problems that require chaining many thoughts, however, a failure in any single thought will most likely lead to a wrong output.
The Tree-of-Thoughts framework looks to address these issues in reasoning for more complex problems by branching every thought generation into multiple possibilities. The strongest thought or step is chosen at every level of the thought tree and executed accordingly.
Since we're dealing with a tree structure, many search techniques can be used to find the most desirable path; however, Breadth-First Search seems to be the simplest and the closest to how humans would tackle the execution of a task.
Image source: Yao et al. 2023
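The BFS variant can be sketched as a beam search over partial chains of thought. In this illustrative sketch, `expand` and `score` are stubs standing in for the two LLM calls the framework relies on (one to propose next thoughts, one to rate partial chains); the function and variable names are ours, not the paper's API:

```python
from typing import Callable

def tree_of_thoughts_bfs(root: str,
                         expand: Callable[[str], list[str]],
                         score: Callable[[str], float],
                         depth: int,
                         beam: int = 2) -> str:
    """Breadth-first search over partial chains of thought: at every level,
    expand each surviving chain into candidate next thoughts, then keep only
    the `beam` highest-scoring chains before going one level deeper."""
    frontier = [root]
    for _ in range(depth):
        candidates = [nxt for chain in frontier for nxt in expand(chain)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

# Toy stubs: each "thought" appends a letter, and the evaluator prefers "a"s.
expand = lambda chain: [chain + "a", chain + "b"]
score = lambda chain: chain.count("a")
best = tree_of_thoughts_bfs("", expand, score, depth=3)  # -> "aaa"
```

Pruning to a small beam is what keeps the tree tractable: without it, the number of candidate chains would double at every level.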
With knowledge of these methods and frameworks, we can much better orient our LLMs towards our goals. Through an iterative process, we can analyze results and gradually improve our prompts for our specific use cases. There are many tools that can help you through the journey of engineering these prompts; for more information about them, you can read our blog post about tools for iterative prompt engineering.
Engineering prompts plays a pivotal role in tailoring the behavior and output of large language models. By employing clear instructions, relevant contextual information, real-world examples, chain-of-thought techniques, and parameter adjustments, developers can guide the model to generate precise and contextually relevant responses. Like any engineering discipline, mastering these patterns requires consistent practice and an iterative approach. Happy engineering!