Writing prompts for chatbots
Chatbots are one of the most accessible applications of generative AI right now. But how do you design prompts to deliver a really high quality chatbot that does exactly what you want?
This article is part of Calibrtr’s guide to chatbots. You can go back to the series Intro here or visit our guides to evaluating chatbot performance and developing KPIs for chatbots, and choosing the right LLM and tools for your chatbot.
Firstly, we’ll talk about some of the things to keep in mind when designing a prompt. Then we’ll talk about how to design a good prompt, and some of the other factors to consider. We'll be using a customer service bot as our example, but these ideas transfer to lots of other applications too.
Things to keep in mind while designing a prompt
Here’s some of the things you might want to keep in mind while you design prompts for your chatbot system.
- Strengths and weaknesses of LLMs. LLM stands for Large Language Model. These systems are great at understanding language and outputting high quality text, audio, images and video. But they don’t think in the way a human would or have strong reasoning skills, so it is very easy to convince a LLM to output text which is logically inconsistent or entirely incorrect.
- Capability and constraints. These models have been trained on huge datasets and can output information on almost anything, so part of the aims of prompt engineering are to constrain them to only do what you want them to do. You don’t want your widget customer service bot giving restaurant recommendations for Manchester, which it absolutely could and would if you didn’t constrain it properly.
- Memory problems. LLMs don’t automatically remember things like a human would. To keep the same context and allow it to ‘remember’ you need to send your entire prompt and message history with every message. This can create large prompts, which leads us on to…
- Prompt size = cost. Each interaction with an LLM costs money (either open-source hosting costs or costs from using the API for a closed source model like Chat GPT). Larger prompts cost more. We go into a lot more detail about costs and some of the techniques to reduce them in our article here.
Designing prompts
Let's use that customer service bot for a websaite as an example. So we know our aim is to write an efficient prompt that tightly constrains the LLM and delivers the right outcome. Some of the things you need to think about are:
- How will the bot start and end the conversation?
- How long should the conversation be?
- What tone do you want your bot to take? and how far should that go?
Let’s dig a bit more into these different elements:
- Personality. This is an entirely personal choice. One approach is to ask whichever model you are using to give you examples of different prompts for your chatbot.
Here’s an example we created using OpenAi's ChatGPT. We asked “Please give me examples of prompts that I could use to ask a chatbot to take on different personalities for a business chatbot, and then provide a sample of what that persona would do in a conversation.” Here’s the responses we got:
1. Professional and Formal
Prompt: "You are a professional and formal business assistant. Your responses should be polite, concise, and use proper business language."
2. Friendly and Approachable
Prompt: "You are a friendly and approachable business assistant. Your responses should be warm, conversational, and helpful."
3. Technical and Detailed
Prompt: "You are a technical business assistant. Your responses should be detailed, precise, and informative, particularly regarding technical aspects."
4. Enthusiastic and Motivational
Prompt: "You are an enthusiastic and motivational business assistant. Your responses should be energetic, encouraging, and positive."
-
Positive instructions. This is what you want it to do. This can be as narrow as you want it to be.
-
Negative instructions. This is what you DONT want it to do. It’s best to go broad and catagorical here. Something along the lines of ‘You must not provide information or advice about any other topics under any circumstances’ is a good place to start. You can then give it instructions for how it should behave if someone asks for that
-
Opening and closing the conversation. It’s a good idea to give clear instructions about how the bot should open the conversation, the sort of greeting you want. You can then define how you want it to close the conversation. Depending on the application, you may want to prompt or code in a limit on conversational couplets.
-
Context. Context is the specific material you want the chatbot to draw from. In our example above the context is a short paragraph of business information, but for most applications it will be larger. The newer models can take a huge amount of context in their prompt windows, up to 128,000 tokens for ChatGPT-4o or 200,000 for Claude Sonnet 3.5. However, large prompts mean more cost. It’s not that scalable an approach to put all your context into a prompt if a)it’s a lot of text and b)you expect a high volume of users or back-and-forth messages. This is where techniques like Retrieval Augmented Generation, or RAG, comes in. Our guide to reducing LLM costs has more ideas.
-
Using examples and demonstrations in your prompts. It’s good practice to use examples or demonstrations in your prompt to help to train the AI in the right way to respond. The more examples you include in your prompt, the higher the cost but the greater the control. Another option is fine-tuning, where you would use many more examples to train a custom version of a LLM, which then reduces the number of examples needed.
What does a prompt look like?
To pull all of this together, here's one we built earlier... This is a prompt for a website chatbot, which aims to give parameters for the chatbot’s personality, positive instructions (what it should do) and negative instructions (what it shouldn’t do), how to open and close the conversation, context and an example conversation.
To give the AI a clear idea of what we want, we can also include one or more ideal conversations
Other things to think about
We've given you some broad ideas for how to build a really comprehensive chatbot prompt. But there are also other areas you need to think about.
Cyber security for chatbots
LLMs can be hacked through various methods. A non-exhaustive list of attacks includes jailbreaking, where particular designs of prompts or strings of prompts can override safety guardrails, context window overflow where very long messages are used to fill up the context window (the maximum amount of text in the prompt) which can lead to all your carefully written prompts being lost, distributed denial of service attacks, attacks that cause leaks of data (particularly when using RAG) and many others. While a proper discussion of cyber risks and approaches is outside the scope of this article, we would recommend that you research and develop an approach to cyber risks for your chatbot.
Code or natural language?
One consideration is whether to use natural language or code for prompts. Natural language prompts are easily understood by humans, and allow for rapid easy prompt development. But they are less exact and precise than coded prompts. Code is also much better for calling functions, and can be better for structuring the outputs of a LLM so it can then be recorded elsewhere. Thankfully, there’s no need to chose one or the other, as LLMs are perfectly capable of working from a mixed code and text prompt.
Evaluating Chatbot Prompts
There’s two broad areas of evaluation needed for prompt engineering a customer service type chatbot: does the prompt do what you intend it to do, and does this translate to real world performance?
Does it do what you want?
Looking at how a prompt performs on a computer level, you need to see if the LLM’s outputs match what you’d like it to do (and very importantly, doesn’t do anything else…). Some of the questions you could ask include:
- Do your constraints work as you intend? Is it only talking about what you want it to?
- Can you trick the AI into revealing more information than it should, or making up information?
- Is the AI’s tone right for your brand?
- Is it doing the right steps or actions in the conversation?
- If it calls a function to do a task, like sending info to your CRM system, does that actually work?
Does it perform in the real world?
Looking at real world performance, you might want to be able to assess:
- Is the AI making clients happy?
- Is it making conversations with clients more efficient - getting more leads, saving money on customer service?
- Is it having a positive or negative effect on your brand experience?
If you want to learn more about evaluation, including the options around human in the loop or AI evaluation and some of the tools available, we have a more in depth guide here.
How to shrink your prompts and costs
As we mentioned earlier- and discussed in more depth in our cost article, the size of your prompt is a really important factor in the cost of your deployment. So how can you shrink your costs?
Some of the options we’ve explored (and developed tools) for include:
- Using a compression algorithm to reduce the size of a prompt while keeping the meaning. If you would like to try out our version, our demo is here
- A/B testing can be used to test out versions of prompts and record results to find the optimum. We’re developing a tool for this right now.
- Our Conversation Replay tool allows you to review conversations with real users, and then replay them with a different prompt to judge whether a different choice might provide a better outcome in future.
- Our Model Experiment tool allows you to test a wide range of models against your specific use-case to find the best fit for performance and quality.
Want to try out our tools?
Calibrtr offers a Generative AI cost management and performance review platform, with tools to forecast and manage costs, build and evaluate prompts, experiment with different models and do A/B testing of prompts, monitor performance and build in human-in or out of-the-loop approvals and reviews. We’re currently in Beta- get in touch with us to find out more!
Our limited beta program is open
If you'd like to apply to join our beta program, please give us a contact email and a brief description of what you're hoping to achieve with calibrtr.