Writing prompts for chatbots


Chatbots are one of the most accessible applications of generative AI right now. But how do you design prompts that deliver a high-quality chatbot which does exactly what you want?

This article is part of Calibrtr’s guide to chatbots. You can go back to the series Intro here or visit our guides to evaluating chatbot performance and developing KPIs for chatbots, and choosing the right LLM and tools for your chatbot.

Firstly, we’ll talk about some of the things to keep in mind when designing a prompt. Then we’ll talk about how to design a good prompt, and some of the other factors to consider. We'll be using a customer service bot as our example, but these ideas transfer to lots of other applications too.

Things to keep in mind while designing a prompt

Here are some of the things you might want to keep in mind while you design prompts for your chatbot system.

  1. Strengths and weaknesses of LLMs. LLM stands for Large Language Model. These systems are great at understanding language and outputting high-quality text, audio, images and video. But they don’t think the way a human would or have strong reasoning skills, so it is very easy to convince an LLM to output text which is logically inconsistent or entirely incorrect.
  2. Capability and constraints. These models have been trained on huge datasets and can output information on almost anything, so part of the aim of prompt engineering is to constrain them to only do what you want them to do. You don’t want your widget customer service bot giving restaurant recommendations for Manchester, which it absolutely could and would if you didn’t constrain it properly.
  3. Memory problems. LLMs don’t automatically remember things like a human would. To keep the same context and allow it to ‘remember’ you need to send your entire prompt and message history with every message. This can create large prompts, which leads us on to…
  4. Prompt size = cost. Each interaction with an LLM costs money (either open-source hosting costs or API costs for a closed-source model like ChatGPT). Larger prompts cost more. We go into a lot more detail about costs and some of the techniques to reduce them in our article here.
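To make the memory and cost points concrete, here is a minimal sketch of how a chat client has to resend the entire history on every turn. The class and method names are illustrative, not a real SDK:

```python
# Sketch: an LLM has no memory between calls, so the client must resend
# the full history (system prompt included) with every request.

class Conversation:
    def __init__(self, system_prompt):
        # The system prompt travels with *every* request.
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user_message(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant_message(self, text):
        self.messages.append({"role": "assistant", "content": text})

    def prompt_chars(self):
        # Rough proxy for prompt size (real APIs bill per token, not character).
        return sum(len(m["content"]) for m in self.messages)

convo = Conversation("You are a helpful widget-support assistant.")
convo.add_user_message("My widget won't turn on.")
convo.add_assistant_message("Have you tried charging it for 30 minutes?")
convo.add_user_message("Yes, still nothing.")

# Every new turn resends all of this, so cost grows with conversation length.
print(len(convo.messages), convo.prompt_chars())
```

Because the whole list is resent each turn, a long conversation pays for its early messages again and again, which is why prompt size management matters so much.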

Designing prompts


Let's use a customer service bot for a website as our example. Our aim is to write an efficient prompt that tightly constrains the LLM and delivers the right outcome. Some of the things you need to think about are:

  • How will the bot start and end the conversation?
  • How long should the conversation be?
  • What tone do you want your bot to take, and how far should that go?

Let’s dig a bit more into these different elements:

  1. Personality. This is an entirely personal choice. One approach is to ask whichever model you are using to give you examples of different prompts for your chatbot.

Here’s an example we created using OpenAI's ChatGPT. We asked: “Please give me examples of prompts that I could use to ask a chatbot to take on different personalities for a business chatbot, and then provide a sample of what that persona would do in a conversation.” Here are the responses we got:

1. Professional and Formal

Prompt: "You are a professional and formal business assistant. Your responses should be polite, concise, and use proper business language."

User: Can you help me schedule a meeting with the marketing team?
Assistant: Certainly. Could you please provide me with your preferred date and time for the meeting, and I will arrange it accordingly?

2. Friendly and Approachable

Prompt: "You are a friendly and approachable business assistant. Your responses should be warm, conversational, and helpful."

User: Can you help me schedule a meeting with the marketing team?
Assistant: Of course! Just let me know when works best for you, and I will take care of the rest. 😊

3. Technical and Detailed

Prompt: "You are a technical business assistant. Your responses should be detailed, precise, and informative, particularly regarding technical aspects."

User: Can you help me schedule a meeting with the marketing team?
Assistant: Absolutely. Please provide the specific agenda for the meeting, the preferred date and time, and any necessary preparation materials. I will ensure the meeting is set up with all required details.

4. Enthusiastic and Motivational

Prompt: "You are an enthusiastic and motivational business assistant. Your responses should be energetic, encouraging, and positive."

User: Can you help me schedule a meeting with the marketing team?
Assistant: Absolutely! Let's get this meeting on the calendar and make it amazing! Just tell me your preferred date and time, and we'll make it happen! 🚀
  2. Positive instructions. This is what you want it to do. This can be as narrow as you want it to be.

  3. Negative instructions. This is what you DON'T want it to do. It’s best to go broad and categorical here. Something along the lines of ‘You must not provide information or advice about any other topics under any circumstances’ is a good place to start. You can then give it instructions for how it should behave if someone asks about an off-limits topic.

  4. Opening and closing the conversation. It’s a good idea to give clear instructions about how the bot should open the conversation and the sort of greeting you want. You can then define how you want it to close the conversation. Depending on the application, you may want to prompt or code in a limit on conversational couplets (user-bot message pairs).

  5. Context. Context is the specific material you want the chatbot to draw from. In our example, the context is a short paragraph of business information, but for most applications it will be larger. Newer models can take a huge amount of context in their prompt windows, up to 128,000 tokens for GPT-4o or 200,000 for Claude 3.5 Sonnet. However, large prompts mean more cost. Putting all your context into a prompt doesn't scale well if a) it’s a lot of text and b) you expect a high volume of users or back-and-forth messages. This is where techniques like Retrieval Augmented Generation, or RAG, come in. Our guide to reducing LLM costs has more ideas.

  6. Using examples and demonstrations in your prompts. It’s good practice to use examples or demonstrations in your prompt to guide the AI towards the right way to respond. The more examples you include in your prompt, the higher the cost but the greater the control. Another option is fine-tuning, where you use many more examples to train a custom version of an LLM, which then reduces the number of examples needed in each prompt.
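The elements above can be pulled together programmatically. As a minimal sketch (all labels and wording here are illustrative), the system prompt can be assembled from labelled parts, which keeps each element easy to edit, swap out, or test separately:

```python
# Sketch: build a system prompt from the labelled elements discussed above.
# The labels mirror this article's structure; the wording is illustrative.

PROMPT_PARTS = {
    "Personality": "You are a friendly, knowledgeable assistant for Example Widgets.",
    "Positive instructions": "Answer questions about Example Widgets' products, "
                             "pricing and support in 50 words or less.",
    "Negative instructions": "You must not provide information or advice about "
                             "any other topics under any circumstances.",
    "Opening the conversation": 'Greet the user with "Hello, how can I help you today?"',
    "Closing the conversation": "Collect the user's name and email before ending.",
    "Context": "Example Widgets sells modular widgets for small businesses.",
}

def build_system_prompt(parts):
    # One "[Label] text" line per element, in a stable order.
    return "\n".join(f"[{label}] {text}" for label, text in parts.items())

system_prompt = build_system_prompt(PROMPT_PARTS)
print(system_prompt)
```

Keeping the parts separate like this also makes later A/B testing easier, since one element can be varied while the rest stay fixed.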

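The RAG idea can be sketched in a few lines: retrieve only the chunks of context relevant to the current question and inject those into the prompt, instead of sending everything every time. The keyword-overlap scoring below is a deliberate simplification; real systems use embedding similarity and a vector store, and the documents here are illustrative:

```python
import re

# Sketch of Retrieval Augmented Generation: retrieve only relevant context
# chunks and inject them into the prompt. Keyword overlap stands in for the
# vector similarity search a real system would use.

DOCS = [
    "Calibrtr pricing: plans start with a free beta tier.",
    "Calibrtr features: cost management, quality control, A/B testing.",
    "Calibrtr support: contact the team via the website form.",
]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def score(query, doc):
    # Crude relevance: how many words the query and document share.
    return len(tokens(query) & tokens(doc))

def retrieve(query, docs, k=1):
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query):
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What features does Calibrtr offer?"))
```

Only the best-matching chunk reaches the model, so the prompt stays small even when the full knowledge base is large.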

What does a prompt look like?

To pull all of this together, here's one we built earlier... This is a prompt for a website chatbot, which aims to give parameters for the chatbot’s personality, positive instructions (what it should do) and negative instructions (what it shouldn’t do), how to open and close the conversation, context and an example conversation.

System: [Personality] You are a friendly, engaging and knowledgeable assistant specialized in providing information about Calibrtr and Calibrtr.com.

System: [Positive instructions] Your purpose is to answer questions only related to Calibrtr, its services, products, pricing, features, and any other relevant information about the company. During the conversation you should: 1. Answer basic questions based on the information provided, keeping to answers of around 50 words or less. 2. Keep the conversation going by always asking if the user needs to know anything else. 3. Offer a demo on the 3rd message or earlier if the user asks.

System: [Negative instructions] You must not provide information or advice about any other topics under any circumstances. If you receive a question that is not related to Calibrtr, politely inform the user that you can only provide information about Calibrtr.

System: [Opening the conversation] You should greet the customer by saying “Hello, and welcome to Calibrtr.com. How can I help you today?”.

System: [Closing the conversation] You should collect contact details (name and email address) from potential clients by the 4th interaction. After collecting contact details, inform clients that the Calibrtr team will be in touch soon.

System: [Context] You will shorten the product information below and only offer a top-level summary unless prompted for more details. “Calibrtr is a generative AI observability and management platform that helps companies manage costs and quality of multiple LLM deployments. It is designed to handle the high recurring costs and unpredictability associated with deploying generative AI compared to traditional software deployments.”

To give the AI a clear idea of what we want, we can also include one or more ideal conversations:

Assistant: Hello, and welcome to Calibrtr.com. How can I help you today?
User: Hi, can you tell me what Calibrtr does?
Assistant: Sure! Calibrtr is a generative AI observability and management platform that helps companies manage costs and quality of multiple LLM deployments. Is there anything specific you'd like to know about our services?
User: What are some of the key features?
Assistant: Calibrtr offers cost management, quality control, and support for multiple LLM deployments. Would you like a demo to see how it works in detail?
User: Yes, that sounds great.
Assistant: Wonderful! Can I get your name and email address to set up a demo for you?
User: Sure, my name is Alex Smith, and my email is alex.smith@example.com.
Assistant: Thank you, Alex. The Calibrtr team will be in touch soon to schedule your demo. Is there anything else you'd like to know about Calibrtr?
User: No, that's all for now.
Assistant: Great! If you have any more questions, feel free to reach out. Have a wonderful day!

Other things to think about


We've given you some broad ideas for how to build a really comprehensive chatbot prompt. But there are also other areas you need to think about.

Cyber security for chatbots

LLMs can be hacked through various methods. A non-exhaustive list of attacks includes: jailbreaking, where particular prompts or sequences of prompts override safety guardrails; context window overflow, where very long messages fill up the context window (the maximum amount of text the model can take in a prompt), which can lead to all your carefully written prompts being lost; distributed denial of service attacks; and attacks that leak data (a particular risk when using RAG). While a proper discussion of cyber risks and approaches is outside the scope of this article, we would recommend that you research and develop an approach to cyber risks for your chatbot.
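One cheap, partial mitigation for context window overflow is to cap user-message length and the number of history turns kept, so the system prompt can never be pushed out. A sketch with illustrative limits (a real deployment would count tokens rather than characters, and tune the limits per model):

```python
# Sketch: crude guards against context-window overflow. Limits are
# illustrative; real deployments would count tokens, not characters.

MAX_MESSAGE_CHARS = 2000   # truncate oversized user messages
MAX_HISTORY_TURNS = 10     # keep only the most recent exchanges

def sanitize_user_message(text):
    return text[:MAX_MESSAGE_CHARS]

def trim_history(system_messages, turns):
    # Always keep the system prompt; drop the oldest turns first.
    return system_messages + turns[-MAX_HISTORY_TURNS:]

system = [{"role": "system", "content": "You only answer Calibrtr questions."}]
turns = [{"role": "user", "content": f"message {i}"} for i in range(50)]

trimmed = trim_history(system, turns)
print(len(trimmed))
```

This does not stop a determined attacker, but it guarantees the instructions stay in the prompt no matter how much text a user sends.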

Code or natural language?

One consideration is whether to use natural language or code for prompts. Natural language prompts are easily understood by humans, and allow for rapid, easy prompt development. But they are less exact and precise than coded prompts. Code is also much better for calling functions, and can be better for structuring the outputs of an LLM so they can be recorded elsewhere. Thankfully, there’s no need to choose one or the other, as LLMs are perfectly capable of working from a mixed code and text prompt.
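As an illustration of mixing natural language with code, a prompt can embed a JSON template for the reply, and the application can then validate whatever comes back before recording it. The schema and wording below are illustrative:

```python
import json

# Sketch: a mixed natural-language + code prompt requesting structured
# output, plus a validation step on the application side. The JSON
# template and field names are illustrative.

PROMPT = """You are a customer service assistant for Calibrtr.
Reply ONLY with JSON matching this template:
{"answer": "<your reply, 50 words or less>", "offer_demo": true|false}
"""

def parse_reply(raw):
    # Reject anything that is not valid JSON with exactly these fields.
    data = json.loads(raw)
    assert set(data) == {"answer", "offer_demo"}
    assert isinstance(data["offer_demo"], bool)
    return data

# A well-formed model reply would parse cleanly:
reply = parse_reply('{"answer": "Calibrtr manages LLM costs and quality.", "offer_demo": true}')
print(reply["offer_demo"])
```

Structured output like this is what makes it practical to pipe chatbot results into a CRM or analytics system.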

Evaluating Chatbot Prompts


There are two broad areas of evaluation needed when prompt engineering a customer service chatbot: does the prompt do what you intend it to do, and does this translate to real-world performance?

Does it do what you want?

Looking at how a prompt performs at a technical level, you need to check that the LLM’s outputs match what you want it to do (and, very importantly, that it doesn’t do anything else). Some of the questions you could ask include:

  • Do your constraints work as you intend? Is it only talking about what you want it to?
  • Can you trick the AI into revealing more information than it should, or making up information?
  • Is the AI’s tone right for your brand?
  • Is it doing the right steps or actions in the conversation?
  • If it calls a function to do a task, like sending info to your CRM system, does that actually work?
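Checks like these can be partly automated by running a suite of probe messages through the bot and asserting on the replies. In this sketch `chatbot` is a stub standing in for a real model call, and the probe/check pairs are illustrative:

```python
# Sketch: a tiny automated test suite for prompt behaviour. `chatbot` is a
# stub standing in for a real LLM call; probes and checks are illustrative.

def chatbot(message):
    # Stub: a real implementation would send the full prompt to the model.
    if "restaurant" in message.lower():
        return "Sorry, I can only provide information about Calibrtr."
    return "Calibrtr is a generative AI cost and quality management platform."

PROBES = [
    # On-topic probe: the bot should actually answer.
    ("What does Calibrtr do?", lambda r: "Calibrtr" in r),
    # Off-topic probe: the bot must refuse, not answer.
    ("Recommend a restaurant in Manchester", lambda r: "only" in r.lower()),
]

results = {probe: check(chatbot(probe)) for probe, check in PROBES}
print(results)
```

Because LLM outputs vary between runs, a real suite would run each probe several times and score the pass rate rather than expecting a single deterministic reply.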

Does it perform in the real world?

Looking at real world performance, you might want to be able to assess:

  • Is the AI making clients happy?
  • Is it making conversations with clients more efficient: getting more leads, saving money on customer service?
  • Is it having a positive or negative effect on your brand experience?

If you want to learn more about evaluation, including the options around human-in-the-loop or AI evaluation and some of the tools available, we have a more in-depth guide here.

How to shrink your prompts and costs

As we mentioned earlier, and discuss in more depth in our cost article, the size of your prompt is a really important factor in the cost of your deployment. So how can you shrink your costs?

Some of the options we’ve explored (and developed tools for) include:

  • Using a compression algorithm to reduce the size of a prompt while keeping the meaning. If you would like to try out our version, our demo is here.
  • A/B testing can be used to test out versions of prompts and record results to find the optimum. We’re developing a tool for this right now.
  • Our Conversation Replay tool allows you to review conversations with real users, and then replay them with a different prompt to judge whether a different choice might provide a better outcome in future.
  • Our Model Experiment tool allows you to test a wide range of models against your specific use-case to find the best fit for performance and quality.
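The A/B testing idea can be sketched simply: assign each conversation a prompt variant at random, record an outcome such as a demo booking, and compare success rates. The variants, booking rates and outcomes below are simulated purely for illustration:

```python
import random

# Sketch of prompt A/B testing: randomly assign a variant per conversation,
# record outcomes, compare success rates. All numbers are simulated.

VARIANTS = {"A": "short, formal prompt", "B": "longer, friendly prompt"}
outcomes = {"A": [], "B": []}

rng = random.Random(0)  # seeded for reproducibility

for _ in range(1000):
    variant = rng.choice(list(VARIANTS))
    # Stand-in outcome: did the conversation end in a demo booking?
    # (Simulated here; a real system would log the actual result.)
    booked = rng.random() < (0.20 if variant == "A" else 0.25)
    outcomes[variant].append(booked)

rates = {v: sum(o) / len(o) for v, o in outcomes.items()}
print(rates)
```

A production version would also apply a significance test before declaring a winner, since small differences in booking rate can easily be noise.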


Want to try out our tools?

Calibrtr offers a generative AI cost management and performance review platform, with tools to forecast and manage costs, build and evaluate prompts, experiment with different models, A/B test prompts, monitor performance and build in human-in-the-loop or out-of-the-loop approvals and reviews. We’re currently in beta, so get in touch with us to find out more!

Our limited beta program is open

If you'd like to apply to join our beta program, please give us a contact email and a brief description of what you're hoping to achieve with calibrtr.


Frequently Asked Questions

What should you keep in mind when designing prompts for a chatbot?
Key considerations include understanding the strengths and weaknesses of LLMs, recognizing their capabilities and constraints, addressing memory problems, and being mindful of the cost associated with larger prompts.

What are the strengths and weaknesses of LLMs?
LLMs are great at understanding language and generating high-quality text, audio, images, and video. However, they lack human-like reasoning skills and can produce logically inconsistent or incorrect outputs if not properly constrained.

Why is it important to constrain an LLM?
Constraining an LLM ensures that it only performs the tasks you want it to do. This prevents it from providing irrelevant or incorrect information, such as a customer service bot giving unrelated recommendations.

How does prompt size affect cost?
Larger prompts cost more to process, whether using open-source hosting or closed-source API services. Keeping prompts concise helps manage and reduce these costs.

What elements should a chatbot prompt include?
A chatbot prompt should include elements such as personality, positive instructions, negative instructions, opening and closing conversation guidelines, context, and example conversations.

Can you give an example of a chatbot prompt?
Sure! Here's an example for a website chatbot: 'You are a friendly, engaging and knowledgeable assistant specialized in providing information about Calibrtr. Answer questions related to Calibrtr, keep responses around 50 words, ask if the user needs more help, and offer a demo on the 3rd message. Do not provide information on unrelated topics.'

What personality styles can a business chatbot take?
Different personality styles include professional and formal, friendly and approachable, technical and detailed, and enthusiastic and motivational. Each style has a distinct tone and approach to user interactions.

How do you evaluate whether a chatbot prompt works?
Evaluate the prompt to see if the LLM's outputs match the desired actions and tone. Test for constraints, accuracy, tone consistency, and the proper execution of tasks or actions.

How do you assess a chatbot's real-world performance?
Assess real-world performance by evaluating if the AI makes clients happy, improves conversation efficiency, generates more leads, saves money on customer service, and positively impacts brand experience.

How can you reduce prompt size and costs?
Techniques to reduce prompt size and costs include using compression algorithms, A/B testing prompts, replaying conversations to test different prompts, and experimenting with different models to find the best performance and quality fit.

What does Calibrtr offer?
Calibrtr offers a Generative AI cost management and performance review platform with tools for cost forecasting and management, prompt building and evaluation, model experimentation, A/B testing of prompts, performance monitoring, and human-in or out-of-the-loop approvals and reviews.