What type of LLM do I need for my chatbot?
What type of LLM is best for a generative AI chatbot? When you're thinking about deploying LLM-based products, like chatbots, co-pilots and analysis tools, there are lots of different options.
In this article, we'll look at a few of the key choices: closed-source vs open-source, model size, and capabilities. Then we'll look at the options available and the sort of use-cases each works best for.
This article is part of Calibrtr’s guide to chatbots. You can go back to the series Intro here or visit our guides to prompting, evaluating, and developing Key Performance Indicators for your chatbot.
What are my options?
When choosing an LLM to work with, you need to weigh a few different factors: the access type you want (closed or open source), what model size you need, and any particular capabilities you need.
- Closed-source AI vs open-source. Closed-source models are provided by companies like OpenAI and Anthropic. You normally pay a subscription for web-based consumer access or a cost-per-use fee for calling their API. All the hosting and updating is handled by the provider. Open-source models are free to download and use, such as Llama by Meta, but still need to be hosted. You can either pay for cloud hosting, or if the model is small enough (or your on-site servers powerful enough) you can host it on your premises (often called ‘on-prem’).
- Model size. Models vary enormously in size. Broadly speaking, open-source models are smaller than closed-source ones. Open-source models range from the smallest of the Apple OpenELM models (between 270 million and 3 billion parameters) all the way up to the largest Llama 2 model from Meta, at 70bn parameters, or Qwen-72B from Alibaba, at 72bn parameters. While there aren't many published stats on the closed-source models, we know that GPT-3.5 has around 175bn parameters, and GPT-4o is rumoured to have trillions. However, the choice comes down to use case: a smaller, more focused model might be a better fit for your task than a very large general-purpose one. Closed-source providers have recognised this and are now releasing smaller, cheaper models like GPT-4o mini.
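To make model size concrete, a rough rule of thumb is that each parameter takes 2 bytes at 16-bit precision, so you can estimate the GPU memory needed just to hold a model's weights. This is a back-of-envelope sketch, not a sizing guide: real deployments need extra headroom for the KV cache and activations, and quantised models use less.

```python
def model_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough GPU memory (GB) to hold the weights alone.

    bytes_per_param=2 assumes fp16/bf16; 4-bit quantisation would be ~0.5.
    Ignores KV cache and activation memory, so treat as a lower bound.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 70bn-parameter model needs roughly 140 GB at fp16,
# while a 3bn-parameter model fits in roughly 6 GB.
```

This is why the small end of the open-source range can run on a single workstation GPU, while the largest models need multi-GPU servers.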
- Capabilities. Models differ in a few areas of capability that might matter for your decision-making. You might need a very large context window (the maximum prompt size), for example, in which case Claude 3.5 Sonnet's 200,000 tokens of context would be a good fit. Most models these days are multi-modal (able to take inputs and produce outputs in several formats, e.g. text, audio, video and images), but some excel at particular formats.
How to choose between closed and open source for your chatbot?
There’s a lot to consider. As a starting point, you may wish to think through what your requirements are, asking yourself:
- How sensitive is my information?
- How complex are the interactions I expect?
- What capabilities do I need?
- How advanced and capable an LLM do I need as a minimum?
- How much volume do I expect?
- How much context do I need per conversation?
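For the context question above, a quick sanity check helps: English prose averages roughly four characters per token, so you can estimate a conversation's token usage before shortlisting models. This is a rough heuristic (real tokenizers vary by model and language), sketched below:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate: ~4 characters per token for English prose.

    A real deployment should use the provider's own tokenizer; this is
    only for ballpark capacity planning.
    """
    return round(len(text) / chars_per_token)

def fits_context(conversation: list[str], context_window: int) -> bool:
    """Check whether an entire conversation history fits a model's context window."""
    return sum(estimate_tokens(turn) for turn in conversation) <= context_window
```

If your typical conversation plus system prompt lands anywhere near a model's limit, you'll need either a larger context window or a strategy for summarising or truncating history.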
With that in mind, we can have a look at your options:
Closed-Source Generative AI APIs (e.g. ChatGPT, Claude, Gemini)
Using a closed-source LLM through an API call is a very easy way to get started with developing generative AI products. To create a basic chatbot, for instance, you just code up the chat window (or copy an existing one), set up your API calls, and engineer a prompt with the system information (rules and context). It might take a little time to tweak the prompt to get the performance you want (see our guides to prompt engineering and evaluating your chatbot), but setting up the API is easy.
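The request you send to a closed-source chat API is little more than a list of messages with roles. Here's a minimal sketch of assembling that payload, assuming an OpenAI-style chat-completions format; the model name and the example system prompt are placeholders, so check your provider's documentation for current values:

```python
# Hypothetical system prompt: in practice this carries your chatbot's
# rules and context information, tuned through prompt engineering.
SYSTEM_PROMPT = "You are a helpful support bot for ExampleCorp. Answer only from the FAQ."

def build_chat_request(history: list[dict], user_message: str,
                       model: str = "gpt-4o-mini") -> dict:
    """Assemble a chat payload: system rules first, then prior turns, then the new message."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += history
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages}

payload = build_chat_request([], "What are your opening hours?")
# You would POST this as JSON to the provider's chat endpoint,
# with your API key in the Authorization header.
```

Everything chatbot-specific lives in the prompt and the message history, which is why switching providers later is mostly a matter of re-pointing the API call.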
Pros
- It’s low commitment. If you want to change models, you can keep your existing code and prompts and just switch your API calls.
- The major model producers are constantly working on improvements and security.
- Most of the models are hugely capable, with billions of parameters and all sorts of multi-modal abilities. They are the luxury SUVs of the LLM world.
- Iterating on prompts is fast.
Cons
- It can be more expensive for longer prompts. Prices vary widely across models but are all based on the length of your prompt; see our guide on LLM pricing.
- Models have limits on their context windows.
- There’s less customisation available, although some closed-source providers do offer fine-tuning.
Best for: short prompts, low commitment deployments, quick iteration on prompts, just starting out on the AI journey.
Some of the most popular closed-source options are:
- OpenAI’s ChatGPT: https://chatgpt.com
- Anthropic’s Claude: https://www.anthropic.com
- Google Gemini: https://ai.google.dev
Open-Source (e.g. Qwen2, Llama)
Open-source models require a little more work to set up and come with different trade-offs to the closed-source models. Here are some of the things you'll want to consider:
Pros
- Data privacy and security can be higher, particularly with on-premises models, which can be air-gapped (kept separate from the internet). This makes them particularly good for secure deployments.
- Costs can be lower, particularly for smaller models used in high-volume applications. At low volumes, the set-up costs of cloud hosting can make open source more expensive than closed source, but the balance shifts as volume grows. For more complex models, or where low latency matters, hosting costs mount up and the trade-off changes again. We talk more about this in our article on LLM costs.
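The volume crossover in the pros above can be sketched as a simple break-even calculation: self-hosting is a fixed monthly cost, while APIs charge per token, so there's a monthly volume above which hosting wins. All prices below are hypothetical; plug in your own quotes, and remember that engineering time, latency and output quality also belong in a real comparison.

```python
def breakeven_tokens_per_month(hosting_cost_per_month: float,
                               api_price_per_million_tokens: float) -> float:
    """Monthly token volume above which fixed-cost self-hosting is cheaper
    than a pay-per-token API. Prices are inputs, not real quotes."""
    return hosting_cost_per_month / api_price_per_million_tokens * 1_000_000

# e.g. a hypothetical $500/month GPU instance vs a hypothetical
# $0.50 per million tokens API: break-even at 1 billion tokens/month.
```

Below the break-even volume the API is cheaper; above it, self-hosting is, which is why small open-source models shine in high-volume deployments.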
Cons
- Being generally smaller, open-source models can have less wide-ranging capabilities or less knowledge (although this can be augmented with techniques like retrieval-augmented generation, or RAG).
- Latency can be an issue, although this is model/deployment specific. You can choose to use more powerful servers to decrease latency, but this will increase cost.
- Unlike the closed-source API options, you will need to install and maintain your own models, and upgrade them yourself.
Best for: Custom applications, high security deployments, small models and high volume applications.
Some of the most popular open source models are:
- Meta’s Llama family of models: https://llama.meta.com
- Mistral’s Mistral and Mixtral models: https://mistral.ai
- Cohere’s Command R+: https://docs.cohere.com/docs/command-r-plus
So what do I need?
As you've seen, there are many options for deploying LLMs, and the right one is a very personal choice based on your circumstances and use-cases.
- For a start-up doing early stage chatbot product development, a cloud AI provider makes a lot of sense for the flexibility.
- For a high-security deployment for an internal knowledge chatbot, which requires a lot of contextual information and controls, an air-gapped on-prem open-source model is likely to be a better fit.
One tool that can help you with your decision is Calibrtr’s Model Experiment tool, which allows you to assess your use-case and prompts against the top models to see what hits the right spot for you in terms of quality and cost.
Want to try out our services?
Calibrtr offers a generative AI cost management and performance review platform, with tools to forecast and manage costs, build and evaluate prompts, experiment with different models, A/B test prompts, monitor performance, and build in human-in-the-loop (or out-of-the-loop) approvals and reviews. We're currently in beta; get in touch with us to find out more!
Our limited beta program is open
If you'd like to apply to join our beta program, please give us a contact email and a brief description of what you're hoping to achieve with Calibrtr.