Start Talking to your Data: AI Assistants and Autonomous Agents at Work

By Francisco Dagnino

Feb 21, 2024

Picture this: as you leave home for work, you ask your faithful assistant Jarvis (unimaginative, I know, bear with me, there's a tiny twist) for unopened emails. Out of 25 unread emails, Jarvis quickly whips up a summary of the 3 that actually require your attention – you’re just in CC for the rest, mostly FYI stuff you already know. You ask Jarvis to respond to the first and set up 15 min calls for the other 2.

[You]: “Jarvis, pull up the ops dashboard for the production line, what’s interesting?”

[Jarvis]: “There were 3 unscheduled stops last night: 2 were minor accidents from entering the wrong size of plastic sheets; no injuries, no equipment damaged, and a total of 4 minutes of downtime. The third event kept the PVC crusher down for 3 hours. Ultimately the blades were replaced, but there’s still no clear reason for the problem. Overall production stayed within forecast, but a slow downward trend has continued for the past 3 months.”

[You]: “Put together a report with all unscheduled stops in the past 6 months, look for patterns and trends, classify by identified driver and resolution; estimate productivity loss. Ask Engineering to send their crusher specialist to take a look, I want to know everything she discovers. Oh, and set up a meeting with the shift manager, we need to improve training for new hires.”

Agents and Assistants

The concept of autonomous agents or assistants is by no means new – there’s some debate on the definitions and they keep evolving, but you can trace the idea all the way back to the Mechanical Turk in the 1700s.

Three years ago, this was mostly sci-fi (I say mostly, because there are some case studies with similar, albeit much simpler, functionality, beyond your home assistant opening shades and turning on the heater), but as LLMs have matured, the possibility of connecting already existing tools and functions with your data and interacting with them through natural language is becoming not only a reality but increasingly effective.

Why the buzz?

For us sci-fi lovers, there’s something mysterious and exciting about them, but there’s no doubt they can be a game changer in any industry, especially for businesses that can’t afford to hire at scale. Beyond that, they can do repetitive tasks thousands of times per day without sacrificing quality, take on work at any time, any day of the week, and support some of the most mundane daily tasks, like setting up a meeting or, my personal favorite, doing extensive research on a topic and presenting the results at different levels of depth. Let’s call these AI Agents: AI entities that can perform a very specific task. There are good reasons to create AI entities with a narrow field of action, but in a nutshell, they tend to perform better in narrow scopes.

You can now combine multiple AI Agents by chaining their output or influencing each other’s behaviors and you can do so autonomously or semi-autonomously. This is called an AI Assistant – there’s some debate on the definitions, but for the purposes of this article, we’ll keep things this way. In general terms, your AI Assistant can be tasked with complex projects and have a myriad of AI Agents at its disposal to accomplish this. What’s more, some of the most advanced tools available today will allow AI Assistants to create their own AI Agents on-demand and even create multiple layers of delegation.

This powerful tool can now become a strong enhancer of human knowledge by not only delegating mundane tasks, but also acting as a means to augment human intelligence when plugged in to the right knowledge base: imagine a mechanical engineer that can now review the plans for entire systems of machines, able to identify connections and areas of improvement that would be impossible by looking at the components alone. Enter Augmented Intelligence, or the empowerment of knowledge workers through AI-driven solutions (check this blog post for more).

So…where do I get myself an AI Assistant?

The basic AI Assistant architecture makes AI Agents responsible for only the most granular of tasks, then adds a layer of “leadership” or some guideline to delegate those tasks. Finally, a UI component is used to interact with the user.
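To make the three layers concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `Agent` and `Assistant` classes, the keyword-based routing table and the stub handlers are hypothetical, and a real assistant would typically ask an LLM to decide which agent should take a request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    """A narrow worker: one goal, one handler function."""
    name: str
    goal: str
    handle: Callable[[str], str]


class Assistant:
    """The "leadership" layer: picks an agent for each request."""

    def __init__(self, agents: List[Agent], routes: Dict[str, str]):
        self.agents = {agent.name: agent for agent in agents}
        # Maps a trigger keyword to an agent name (hypothetical rule set).
        self.routes = routes

    def delegate(self, request: str) -> str:
        lowered = request.lower()
        for keyword, agent_name in self.routes.items():
            if keyword in lowered:
                return self.agents[agent_name].handle(request)
        return "No agent available for this request."


# Stub handlers standing in for LLM-backed agents.
def summarize_email(request: str) -> str:
    return "email-agent: summary of unread emails"


def schedule_meeting(request: str) -> str:
    return "calendar-agent: meeting scheduled"


assistant = Assistant(
    agents=[
        Agent("email", "Summarize unread emails", summarize_email),
        Agent("calendar", "Schedule meetings", schedule_meeting),
    ],
    routes={"email": "email", "meeting": "calendar"},
)


def ui(user_text: str) -> str:
    """UI layer: here it is just text in, text out."""
    return assistant.delegate(user_text)
```

Calling `ui("Set up a meeting with the shift manager")` hands the request to the calendar agent. Keeping each agent this small is what lets you swap the stubs for real model calls one at a time.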

Beyond that, the sky really is the limit and the number of tools out there with the sole focus of developing the best Autonomous AI Assistant is staggering. OpenAI has clearly made this the target of their next round of solutions, while Facebook, Google, Microsoft and Amazon all have announced some form of investment in the area.

While those pre-packaged solutions may take a while to become commercially available at a competitive cost, you can already start tinkering with the technology and become familiar with what the future may hold. Here are a few making great strides that I've personally tried at different levels, in no particular order:

Autogen
BabyAGI
AutoGPT
AgentGPT
SuperAGI
JARVIS - see what I did there?

Going back to the concept of Augmented Intelligence, here’s how I would get started developing an engine to support, let’s say, a patent researcher:

What you need:

1. A Large Language Model (LLM), of course. This will be the engine behind all user interactions and the link between all your AI entities. Some tools can handle multiple models for added experimentation and versatility.

2. A knowledge base: for our example, this can be anything from a patent database accessible through an API to connections to your existing data, like R&D briefs and projects, a market intelligence database, internet access, and so on. Your use case will define what you need here. Beware of confidentiality considerations and make sure you have an environment that controls what “goes out”.

3. An orchestration tool: there are dozens and probably hundreds out there. What you’re looking for is a tool that will provide a development environment you’re comfortable with – from pure code to no-code, my personal favorite is Flowise, but I’ve been testing Autogen Studio lately and it’s becoming increasingly robust, highly recommended too.

4. Text-to-Speech (TTS) and Speech-to-Text (STT) engines: as has become usual, there are dozens of tools out there for these two functions. I recommend starting with an open source alternative to keep things simple and costs to a minimum. As your use case matures and you start getting user feedback, it might be a good idea to try some of the paid models, which are incredibly sophisticated, like ElevenLabs. Just make sure to monitor costs and ensure the investment is justified.
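On the point in item 2 about controlling what “goes out”, one simple option is to mask sensitive strings before any text reaches an external model. The sketch below is only an illustration under stated assumptions: the `redact` function is hypothetical, and so is the `RD-1234` style project code format. A real deployment would need patterns and policies that match your own data.

```python
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical internal project code format, e.g. "RD-2041".
PROJECT_CODE_PATTERN = re.compile(r"\bRD-\d{4}\b")


def redact(text: str) -> str:
    """Mask sensitive substrings before text is sent to an external service."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PROJECT_CODE_PATTERN.sub("[PROJECT]", text)
    return text


prompt = "Compare RD-2041 with prior art; send findings to ana@example.com."
safe_prompt = redact(prompt)
print(safe_prompt)
```

Pattern matching like this only catches what you thought to list, so treat it as a first line of defense rather than a guarantee.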

Getting Started

Your workflow will vary depending on the orchestration tool you choose, but you’ll inevitably need to define your agents and how they interact (typically another superseding agent with the task of coordinating others).

LLMs: in all cases, you’ll need to define the LLMs your Assistant will be able to work with. In some cases it will make sense to use different models, whether that means different versions, local versus cloud hosting, and so on. In all cases, be wary of the costs of using these models. Once you run your Assistant, the amount of inference it generates can be significant.
Tools: some call them skills, but they’re the same thing, i.e. snippets of code that accomplish a single task, expect a defined data structure as input and return a defined output structure, often in the form of JSON. Examples:

- Web scraper (single URL)

- Entity finder from body of text

- SQL code interpreter: converts a question in natural language into SQL code for a defined database

- Database query engine: runs a SQL query on a connected database

- API connector: interact with a specific API
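As an example of the JSON-in, JSON-out contract described above, here is a rough sketch of the “database query engine” tool using Python's built-in `sqlite3` module. The function name `run_query_tool`, the `stops` table and its columns are all made up for illustration, and the SELECT-only check is a deliberately simple guard, not a complete safety measure.

```python
import json
import sqlite3


def run_query_tool(payload: str, connection: sqlite3.Connection) -> str:
    """Tool contract: JSON string in ({"query": ...}), JSON string out."""
    request = json.loads(payload)
    query = request["query"].strip()
    # Read-only guard: this sketch only accepts SELECT statements.
    if not query.lower().startswith("select"):
        return json.dumps({"ok": False, "error": "only SELECT is allowed"})
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    rows = [list(row) for row in cursor.fetchall()]
    return json.dumps({"ok": True, "columns": columns, "rows": rows})


# Illustrative in-memory data; table and column names are made up.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE stops (machine TEXT, minutes INTEGER)")
connection.executemany(
    "INSERT INTO stops VALUES (?, ?)",
    [("sheet feeder", 2), ("sheet feeder", 2), ("PVC crusher", 180)],
)

sql = (
    "SELECT machine, SUM(minutes) AS total "
    "FROM stops GROUP BY machine ORDER BY total DESC"
)
result = json.loads(run_query_tool(json.dumps({"query": sql}), connection))
print(result["rows"])
```

Because both input and output are plain JSON, the same tool can be handed to any agent or orchestration tool that can pass strings around.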


Agents: you’ll need to define their goals and the tools they can access. Be as specific as possible about each Agent's main objectives and clearly define how it will use tools and their outputs. When applicable, describe its interdependence with other Agents. This is all about good prompting, so make sure you identify the best prompting technique applicable to each Agent.
TTS/STT: once again, this will depend on the tools you select. I would leave this for last, though. Make sure you have a sound MVP before adding this layer of complexity.
Workflows: this is where you put it all together, connecting some form of UI to an Assistant, to its Agents, to their tools. You’re now ready to test your Assistant!
Bonus: a monitoring tool. LLM monitoring tools are designed to keep track of everything going to and from an LLM API, in some cases including layers that are abstracted by the Agent, which significantly improves visibility into the different models’ behaviors. Some also include cost estimates, which are very practical for keeping your overall costs under control and assessing ROI. As I've mentioned before, I'm currently using LangSmith, Langfuse and LLMonitor, which is complete overkill, but it's been a great learning experience so far.
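To give a feel for what a monitoring layer does, the sketch below wraps a model call so that every prompt and response is logged with an estimated cost. It does not use the API of any of the tools named above. The price per 1,000 tokens is a made-up figure, the word count is a crude stand-in for a real tokenizer, and `fake_llm` is a placeholder so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical price in USD per 1,000 tokens; real pricing varies by model.
PRICE_PER_1K_TOKENS = 0.002


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


@dataclass
class CallLog:
    records: List[dict] = field(default_factory=list)

    def total_cost(self) -> float:
        return sum(record["cost_usd"] for record in self.records)


def monitored(llm: Callable[[str], str], log: CallLog) -> Callable[[str], str]:
    """Wrap an LLM callable so every call is recorded with an estimated cost."""
    def wrapper(prompt: str) -> str:
        response = llm(prompt)
        tokens = rough_token_count(prompt) + rough_token_count(response)
        log.records.append({
            "prompt": prompt,
            "response": response,
            "tokens": tokens,
            "cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS,
        })
        return response
    return wrapper


def fake_llm(prompt: str) -> str:
    """Placeholder model so the sketch runs without network access."""
    return "three unscheduled stops last night"


log = CallLog()
ask = monitored(fake_llm, log)
ask("what is interesting on the ops dashboard")
print(len(log.records), log.total_cost())
```

Since an Assistant can fan a single request out into many model calls, summing a log like this per user request is a quick way to see where the inference budget actually goes.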


Conclusion

So much effort and so many resources are being poured into Autonomous AI entities, be they agents, assistants or chatbots, that it can seem pointless to build even a moderately competitive tool of your own. However, the likelihood of any of these tools solving your particular use case spot on is slim at best, assuming your use case is not a generic email assistant or similar.

At the same time, just like any technology, this one has its own quirks. Becoming familiar with its strengths and weaknesses is highly valuable in an environment where, though the quality of tools being published improves exponentially, the underlying technology supporting today's LLMs has fundamentally remained the same since 2017. Many limitations have been reduced or removed altogether, but the models still behave in the same basic way.

Bottom line, try this technology, don't be afraid of coding (much can be done without a single line of code), explore how you can tweak it to your own needs, become familiar with its limitations, aim for highly catered use cases, embrace it as a new paradigm, check in with us for support...:)