Building Hermes – our website chatbot: from Cookie Cutter to our Exceeding Expectations

By Francisco Dagnino

Feb 21, 2024

There really isn’t that much effort that needs to go into building a chatbot…unless you want it to do what you other chatbots don’t do. This is the story of how we got there. More specifically, how I personally went from knowing exactly what to do, to designing complex workflows, to adjusting to new commercial tools, to dismissing half a dozen tools, to falling back to the simplest approach you can think of: a single LLM prompt – and a sprinkle of research.

Origins and Objectives

Our objective was straightforward: we need a chatbot on our website for 2 main reasons:

If we say we know how to build these things, it needs to show
We need as many lead channels as we can muster, and the classic website chatbot is one of those

‍

There were some other overarching constraints too, like costs, ability to upgrade to newer LLMs, minimizing maintenance efforts, ease of understanding and improving.

From a business perspective, my plan was also simple: let the chatbot reduce the friction to finding content on the website, get the user to where they want to go, convince them we should talk, schedule a call.

First approach: Botpress

Simple enough, enter the first tool: Botpress. With its highly customizable workflow, LLM connection flexibility, low cost, dead-easy embedding-like behavior (“training” on website content is made incredibly simple).

The workflow I designed was based on 3 expected paths the user could take:

Understand our offerings in a specific area, thus triggering a sub-workflow that would walk the user through our offerings in a particular area.
Understand our experience in a specific area, triggering a sub-workflow that would provide high level summaries of a number of carefully selected case studies – there is a constraint in the volume of information you want to have available before having your chatbot become too generic.
Straight to booking a meeting: this is also the preferred closing of the 2 paths above.

‍

And so the first draft was born: Hermes v0, just enough so I could shar with the other partners and some close friends for testing.

A snapshot of part of the Botpress version of Hermes

‍

Round one: no bueno

The feedback was decent, but I made a crucial mistake: I told the users what I expected, and hence their testing was biased to exploring the experiences mentioned above. We were very close to releasing this version until we tried jumping from one path to the next. And that broke it. We could have arguably fixed that on Botpress, but the learning was far more valuable: we’re pushing the user down a path, but we’re providing little value.

Why would a user even engage with Hermes? Well:

They’re lazy and don’t want to navigate the website. Why would they even try in this case?
They want to validate our experience with building a chatbot? Unlikely, but even so, the UX with Hermes was pretty bad in this scenario.
To book a meeting? There are multiple ways of doing that much faster on the website.

‍

So we’re back to square one: what do we want out of our chatbot?

Rethinking the Strategy

We decided the best thing we could do is to give the user something they can truly use at work: concrete recommendations they can try to solve their challenges. Instead of paraphrasing and summarizing what is already available on the website, let’s give them something as close as what we would recommend over the phone on a fictitious free consultation. We’ll ask some information on their challenges and the context, then match it against our documented case studies.

This simple rethinking of our approach provides 2 benefits:

We’re still engaging with our prospects, but now only those who have already identified needs we may be able to solve. Over time, exploring this data will help us fine tune our offerings as we hear more from our prospects, to some extent establishing a dialogue with our potential audience.
We’re showcasing our experience and ability to solve real problems, rather than the carefully curated text on every section of our website.

‍

In case you haven’t noticed, there’s a virtuous cycle at work here:

We provide some valuable information
We learn more about our prospects’ needs
We improve our offerings
We provide some more valuable information to our prospects

‍

Trying new tools: Chatbase.co

On the development side, it’s back to square one. The complex mesh on Botpress provided great steerability for Hermes’ behavior, but it didn’t need to be as complex for the new requirements. I could have built it from scratch on the same tool, but I decided to give Chatbase.co a try.

Chatbase.co is another low code chatbot development platform, but unlike Botpress it doesn’t provide the customizable workflow, rather a single prompt, some LLM configurations and all the publishing tools you would need to embed it on a website. It also provides a Zapier integration, which can come in handy, but what I liked about it the most is the built-in lead management functionality, which was perfect for our use case.

The testing results were very promising. The prompt had been tuned to what we wanted and it was really combining well the knowledge base we provided along with OpenAI’s GPT-4 model capabilities.

One thing I didn’t notice at the time, unfortunately, was the unreasonable 39 USD/month just to remove their branding. That’s twice the cost of the basic plan (more than enough for our needs), in addition to the plan. No, that wasn’t going to break the bank, it just felt unreasonable.

Just as I had resigned myself to code all the knowledge embedding from scratch, OpenAI releases their Assistants API, with just enough functionality to easily embed our knowledge base – not huge, but rapidly evolving, so this aspect was important to minimize maintenance efforts.

Enter tool 3: Flowise

Flowise is another low-code LLM development tool with a great UI and tons of integrations. Being open source, using it is free, but you still need to create the environment, which Render.com makes incredibly easy at reasonable rates.

In no time, Flowise released it’s OpenAI Assistant component, making it dead easy to incorporate Assistants into its interface – you can actually create them directly from the Flowise UI.

I had initially considered Flowise for Hermes, but the lack of granular workflow control was a show stopper at the time – remember I was initially planning to have the chatbot guide the user through every possible scenario, but now that a detailed workflow was out of the window, Flowise was just perfect, as long as I could put together a prompt that would reduce the chances of ill-intended behavior, prompt injection, etc. I also had to make sure Hermes would provide just enough to be valuable, without hallucinating its way to an unrealistic recommendation or, worse, a legally binding offer (check this example).

The much simpler version of Hermes in Flowise

‍

Protections and Monitoring

The first part was straightforward: an iteration counter plus some basic guardrails were enough to protect it against users trying to extract more than it’s meant to provide. Flowise also provides configurations to limit the number of hits per user. Additionally, I added monitoring through Langsmith, Langfuse and LLMonitor – a complete overkill, for sure, but they all have their nuances, this has helped me learn how to use all 3 of them.

The second part, prompt injection, not so easy. There’s just so much research and effort going into it, it has simply become a cat and mouse game, much like encryption and data security. Meaning Herme’s will need regular maintenance and updates more often that I would like to have – so much for my efforts on minimum maintenance.

I won’t cover an extensive list of protections that are included and there are certainly some vulnerabilities still open which we’re comfortable with, since the impact would be minimal, but here are some of the guides I found useful:

‍

Final Step: Lead Management

One last addition: some good ol’ fashioned RPA to keep track of leads: enter Make.com + MS Teams.

If you’ve read this far, I’m sure you’re familiar with Zapier – the defacto online automation tool with more integrations available than you can count. The reason I’m going for Make, is that it’s slightly cheaper and provides more than enough for our needs, but this step could be covered by a myriad of tools and it wouldn’t be too hard to code your own.

For this, I created a dead simple custom tool (literally thousands of tutorials on this out there) that hits a webhook created on the Make.com workflow I created – again, many tutorials on this out there. To validate the output format of the custom tool I used Postman.com, allowing me to make sure my custom code was being structured as expected by the webhook API.

Through this automation, we’re now keeping track of all our leads on an Excel spreadsheet with alerts managed by a Teams chatbot (separate project in the PowerAutomate domain), so our team gets notifications every time we have a new lead with contact information and some insights on their company.

Conclusion

There are hundreds of AI tools available out there, all with their own features and limitations. The generative AI ecosystem is growing exponentially as foundational models improve and become cheaper. So it is important to continuously explore new tools and features to develop your own utility belt, you’ll never know when one of these will come in handy for your next project. Here are a few I considered briefly for Hermes, but now leveraging for very different projects:

Relevance.ai: low-code agent development platform
Powerdrill.ai: AI-based knowledge access made easy
Perplexity.ai: AI-powered research
Illa-Builder: an open source low-code platform with UI to develop apps.

Incumbents like OpenAI, Microsoft and Google are also regularly releasing updates, improvements, and new features and some could render your project obsolete or non-competitive. Avoid spending too much time working on functionalities that are likely to be incorporated into the major providers and instead focus your efforts on differentiators that are hard to replicate: find that moat.

Finally, as with every new technology, there will be bugs and limitations that can be exploited by malicious players. Stay informed with the latest threats, incorporate as many precautions as you can, and reduce exposure.

‍