Building a Multi-Turn, Multi-Agent, Data Insight Chat System Part 4: Architecture, Pitfalls, Problems & Solutions

Introduction

In the previous parts of this series I’ve been talking about how I came to settle on a multi-agent approach, and talked a bit about context window limitations and the unexpected benefits of the approach. I haven’t got too technical previously (armed with Google, a layperson can probably follow what I’ve been getting at) but that’s going to be unavoidable in this post. Still, I’ll try and keep things as high concept as possible.

Building prompts for each agent

To do anything with an LLM you need a prompt. At its core LLM technology is all about completions: you feed in some text and the LLM “completes” it. In a multi-turn chat scenario this is a feedback loop. In a one-to-one situation (one human, one agent) the human and the agent take ‘turns’ adding to this text, which in this case is a chat log.

What should the prompt look like to achieve our split-context, multi-agent, multi-turn system? Let’s look at the anatomy of the prompt for each agent.

Prompt anatomy

Common knowledge
- All the agents need some degree of common knowledge: the organization’s size, structure, key people and goals, plus the basic project details such as start date, end date and overall goals.
Agent Specific Knowledge
- The knowledge specific to the agent, directly relevant to its own domain: Risks, Issues, Tasks or Finances, depending on the role the agent is playing. Over the course of this feature being developed the differences between each agent’s knowledge narrowed as the available context size increased. This is presented in the prompt as an (often very large) lump of JSON which most models will readily understand.
Agent Role & Personality
- An instruction to the agent of the role it should play, its expertise and its personality. The personality may vary from chat to chat or within a chat. I gave an example of what this might look like in my previous post:
  - You are a professional project manager with a high level of expertise focusing on the PRINCE2 methodology. You are pessimistic and risk averse. You are quick to point out downsides and risks and generally assume things will go wrong and plan contingencies.
Date & Time awareness
- To make up-to-date judgements about overdue tasks, understand when meetings and deadlines are due etc., the agent needs to know the exact date and time. This can be as simple as:
  - For your information the current date and time is [Date in ISO 8601 format]
Conversation Thread
- The conversation thread, again in JSON list format. The JSON objects in the list contain the message, the date and time it was sent and which agent or human sent the message.
Instructions
- Now the agent has an idea of the role it should play and its knowledge, it needs to know what to do. This part of the prompt varies based on the exact approach, but a good starting point might look something like:
  - Based on the conversation thread above, taking into account your role and expertise respond to the previous message.
Reinforcement Prompt
- Many LLMs have a recency bias. Depending on the length of the conversation thread, the Role & Personality section of the prompt may be several thousand tokens or more before this section. A reinforcement prompt can be very useful for fine tuning output if you find the model is ignoring instructions or not giving them as much weight as you’d like.
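
To make the anatomy above concrete, here is a minimal sketch of how the sections might be stitched together in that order. The function name, the section labels and the parameter names are all assumptions made for illustration; they are not taken from the real system.

```python
import json
from datetime import datetime


def build_prompt(common, agent_knowledge, role, now, thread,
                 instructions, reinforcement):
    """Join the prompt sections in the order listed above (hypothetical labels)."""
    sections = [
        # sort_keys keeps the JSON text identical from turn to turn
        "Common knowledge:\n" + json.dumps(common, sort_keys=True),
        "Agent specific knowledge:\n" + json.dumps(agent_knowledge, sort_keys=True),
        "Role and personality:\n" + role,
        "For your information the current date and time is " + now.isoformat(),
        "Conversation thread:\n" + json.dumps(thread),
        "Instructions:\n" + instructions,
        "Reminder:\n" + reinforcement,
    ]
    return "\n\n".join(sections)
```

Serialising the knowledge sections with `sort_keys=True` is a small design choice rather than a requirement: it keeps the text stable between turns when the underlying data has not changed.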

The order here is not arbitrary; it’s important for efficient inference, which translates to lower costs and quicker response times. Almost all LLM inference providers (Azure, Nebius, OpenAI etc.) charge separately for input, cached input and output tokens.

To keep the input costs down it’s important to maximize the number of cached tokens. Recent input tokens are cached from one turn to the next up to the point that they differ. This is why the order of the elements in the prompt is so important: assuming no changes to the data during the chat, the prompt remains consistent up until the date & time are injected. If the date and time were the first element of the prompt, none of the prompt would be cached; by leaving that as late as possible the majority of the prompt can be cached, significantly reducing costs and speeding up inference.
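
The effect of ordering can be illustrated with a rough sketch that measures how much leading text two consecutive prompts share. It uses characters as a stand-in for tokens, which is an assumption: real providers cache on tokens and have their own rules about what qualifies.

```python
import os


def shared_prefix_length(previous_prompt: str, next_prompt: str) -> int:
    """Length of the leading text that two consecutive prompts have in common."""
    return len(os.path.commonprefix([previous_prompt, next_prompt]))


static_sections = "S" * 1000  # stands in for knowledge, role and personality

# Date and time first: the two prompts diverge almost immediately.
early_a = "time=09:00\n" + static_sections
early_b = "time=09:05\n" + static_sections

# Date and time late: the large static block is shared between turns.
late_a = static_sections + "\ntime=09:00"
late_b = static_sections + "\ntime=09:05"

print(shared_prefix_length(early_a, early_b))  # 9
print(shared_prefix_length(late_a, late_b))    # 1010
```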

Expected Response

The exact desired response from the model varies quite a bit depending on the exact approach. Obviously it has to include the message or tool calls, but the actual format can lead to a big difference in the quality of the output. I’ll get into that later; for now I’ll just say that if the model supports it a forced JSON response is a must. It ensures that the response can be parsed and understood before being processed and presented to users.

Getting The Agents talking

Now we’ve got an idea of the prompt each agent will operate with, we have the question of how we get them talking to the human conversation members, and to each other. Going back to our initial concept, a decent result might look something like this:

(Human) Bob: I’d like to move the deadline for this project and have it finish a month earlier, what do we need to make this happen?

(AI) Project Manager Agent: The end date of this project is currently 01/10/2026, to move it to 01/09/2026 we will need to add more resource to task X or have the construction crew work Y hours of overtime.

(AI) Finance Manager Agent: Adding additional personnel to the project or spending on overtime could have significant budget impacts. The project is already 10% over budget so additional funding may be required, or resources will need to be diverted from other aspects of the project.

(AI) Legal Agent: Since all staff require a DBS check before starting work there are those delays and expenses to consider if adding personnel. Paying overtime avoids these so might be the preferred option.

(AI) Project Manager Agent: Bob, the team believe an earlier completion date is possible but it will have budgetary impacts. Of the two options (adding personnel or paying overtime) paying overtime avoids even more costs and possible delays. Would you like to discuss funding this or talk about another approach to speeding up delivery, e.g. de-scoping some aspects of the project?

This short exchange has a starting point (triggered by a human), a short discussion where each agent adds their thoughts and an end point where the results of the exchange are summarised and presented to the human conversation member. It’s not the only situation we need to consider though. If the initial question is something like “what is the project end date?” there is no need for all the agents to add their point of view. So what’s the best way to get all these agents talking to the humans and, where necessary, to each other?

Approach 1: Polling Everyone

The earliest prototypes of this feature followed a poll everyone approach. You can think of this like all the agents standing in a circle ready to add their comments to a conversation thread. Each agent has an instruction that looks something like this:

Based on the conversation thread above, taking into account your role and expertise respond to the previous message if you have something meaningful to add. If the question is outside your area of expertise or if you have nothing meaningful to add to the conversation respond with “No Comment”.

Then you give the conversation thread to the first agent in the circle, they add either their meaningful contribution or “No Comment” and then hand off to the agent to their left. When a full circle has been made with all agents responding with “No Comment” we break out of the loop.
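
The loop described above can be sketched roughly as follows. Each agent is modelled as a plain function that takes the thread and returns text, standing in for an LLM call; the `max_rounds` safety cap and all names are assumptions added for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical agent signature: takes the thread, returns the reply text.
Agent = Callable[[List[dict]], str]

NO_COMMENT = "No Comment"


def poll_everyone(agents: Dict[str, Agent], thread: List[dict],
                  max_rounds: int = 5) -> List[dict]:
    """Go round the circle until a full round produces nothing but No Comment."""
    for _ in range(max_rounds):
        anyone_spoke = False
        for name, respond in agents.items():
            reply = respond(thread).strip()  # every agent is called every round
            if reply.lower() == NO_COMMENT.lower():
                continue
            thread.append({"sender": name, "message": reply})
            anyone_spoke = True
        if not anyone_spoke:
            break
    return thread
```

Note that even a simple question answered by one agent costs at least two full rounds of calls: one in which the answer is given and one in which everybody declines to comment.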

While this approach does work and can produce some decent results the downsides ultimately outweigh the upsides.

Upsides:

Guaranteed results
All agents get a chance to respond
Long, sometimes spirited discussion between agents

Downsides:

It can take several loops around before the agents are finished talking
Agents may get stuck in a loop repeating each other’s points or in an argument based on their personality traits
All agents must assess the conversation thread before responding, even if they don’t say anything (i.e. respond with no comment)
- This introduces lag and eats up tokens
Messages not sent by agents cost almost as much as the messages they do send.
LLMs aren’t really designed to output nothing. It’s against their nature: they “want” to reply, so they can ramble.

Taken together the downsides can make this approach laggy and very expensive. I spent a lot of time refining this approach, making it cheaper, less laggy etc., but in the end settled on a different approach entirely…

Approach 2: The “Hand off” system

The hand off system avoids a lot of the downsides of the poll everyone approach while maintaining many of the benefits, but the implementation and prompts take a bit more effort. If we take the previous example of all our agents standing in a circle, the first agent reads the thread, responds if they need to, then decides who should speak next and “hands off” the conversation to that agent. Each hand off is accompanied by a “secret” message explaining why the agent handed off and what they expect from the next agent. This process might look something like this:

(Human) Bob: Are we missing any obvious risks to the project which haven't been recorded?

(AI) Project Manager Agent: I will consult the team and get back to you.

Hidden message from Project Manager Agent to Risk Manager: “Risk Manager I am handing off to you because Bob needs to know if there are any obvious missing risks, this is within your role’s responsibilities”

(AI) Risk Manager Agent: We are missing a Risk around how a new global pandemic might impact staff availability and project deadlines. Would you like me to add it to the project for you?

Hidden message from Risk Manager Agent to Finance Manager: “Finance Manager I am handing off to you because there is a financial aspect to Bob’s question which I cannot address myself”

(AI) Finance Manager Agent: There are no financial risks missing, we already have several financial risks logged which are comprehensive and provide good coverage.

As you can see, in this approach the agents have the ability to send hidden messages to one another that they can see but the humans can’t. This helps them keep on topic and understand why the conversation has been handed off to them and what they are expected to contribute.

The prompt to achieve this behavior is considerably more complicated.
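
Leaving the prompt itself aside, the control flow of the hand off approach might be sketched like this. Each agent is again a stand-in function; the reply keys (`message`, `hand_off_to`, `hand_off_message`), the `hidden` flag and the `max_hand_offs` cap are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical agent signature: takes the full thread (hidden notes included)
# and returns a dict with "message", "hand_off_to" and "hand_off_message".
Agent = Callable[[List[dict]], dict]


def run_hand_offs(agents: Dict[str, Agent], first_agent: str,
                  thread: List[dict], max_hand_offs: int = 10) -> List[dict]:
    """Let each agent speak, then pass control to whoever it nominates."""
    current = first_agent
    for _ in range(max_hand_offs):
        if current not in agents:
            break  # handed back to a human (or to an unknown name)
        reply = agents[current](thread)
        if reply.get("message"):
            thread.append({"sender": current, "message": reply["message"],
                           "hidden": False})
        next_agent = reply.get("hand_off_to", "human")
        note = reply.get("hand_off_message")
        if note:
            # Stored in the thread so agents can read it, flagged so humans can't.
            thread.append({"sender": current, "to": next_agent,
                           "message": note, "hidden": True})
        current = next_agent
    return thread


def visible_to_humans(thread: List[dict]) -> List[dict]:
    """Filter out the hidden hand off notes before display."""
    return [m for m in thread if not m.get("hidden")]
```

Only the nominated agent is called on each step, which is where the savings over polling everyone come from.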

Upsides:

Every agent handed off to talks
Every agent is set clear expectations and responds on topic
Relevant agents respond
Less laggy since only the agents that should respond do respond
More token efficient for the same reason
Agents tend to ramble and repeat each other much less

Downsides:

Agents will sometimes hand off to humans early and not solicit opinions from all relevant agents.

This is a lot less wasteful in terms of tokens and speedier since each agent only contributes to the conversation as required.

Stopping The Agents talking

With either approach, agents without controls in place and carefully constructed prompts have a tendency to ramble, repeat themselves and so on. This is expensive in terms of tokens and real costs and can be off-putting to users. There are a few steps we can take to reduce this over-talkativeness.

Setting Boundaries within the prompt

The vast majority of excess messaging by agents can be overcome with careful prompting both in the Agent Role & Personality sections and the reinforcement prompt. Agents can be instructed to only reply with things that are relevant, within their area of expertise and to carefully consider which agent or human should be the next to reply.

Controlling Expected Output

Key to controlling the hand-off process is forcing the model to return JSON in a format that defines several key fields:

Tool use confidence
Hand off Justification
Hand off Message
Conversation message

By including these justifications prior to the message, the model is forced to “think” things through before composing a message, leading to much more consistent outputs than you might otherwise get. Your mileage may vary here depending on whether the model driving the agents is a reasoning model or not; if it is, this can be a wasted step and wasted tokens.
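
As a rough illustration, the forced JSON reply could be validated before anything is shown to users. The field names below are snake_case guesses at the four items listed above, and treating the tool use confidence as a number is an assumption; the real format is not shown in this post.

```python
import json
from dataclasses import dataclass

# Hypothetical field names, in the order the model is asked to produce them:
# the justification fields come before the user-facing message.
REQUIRED_FIELDS = (
    "tool_use_confidence",
    "hand_off_justification",
    "hand_off_message",
    "conversation_message",
)


@dataclass
class AgentReply:
    tool_use_confidence: float
    hand_off_justification: str
    hand_off_message: str
    conversation_message: str


def parse_reply(raw: str) -> AgentReply:
    """Parse a model reply, raising ValueError if it isn't usable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError("reply is missing fields: " + ", ".join(missing))
    return AgentReply(
        tool_use_confidence=float(data["tool_use_confidence"]),
        hand_off_justification=str(data["hand_off_justification"]),
        hand_off_message=str(data["hand_off_message"]),
        conversation_message=str(data["conversation_message"]),
    )
```

A reply that fails to parse can then be retried or dropped rather than leaking malformed output into the chat.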

Intervening in hand-offs

Some fairly simple oversight code can be used to monitor the hand off process and return control to humans in certain situations, for example:

Agent handing off to itself
Agents in an elongated back and forth
Agent directly repeating itself or another agent
Adding a fall back hand off to a coordinator role if the system seems confused
Hand offs exceeding some fixed limit, indicating an overly long series of messages
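
A minimal sketch of this kind of oversight code is below. It covers four of the checks (self hand off, repetition, back and forth, fixed limit) and returns a reason for handing control back to the humans. The thresholds, the exact-match repetition test and the function name are assumptions for illustration.

```python
from typing import List, Optional


def check_hand_off(agent_turns: List[dict], next_agent: str,
                   max_hand_offs: int = 8,
                   max_back_and_forth: int = 4) -> Optional[str]:
    """Return a reason to stop the hand off sequence, or None to carry on.

    agent_turns holds the agent messages sent since the last human message,
    each as {"sender": ..., "message": ...}.
    """
    if not agent_turns:
        return None
    if next_agent == agent_turns[-1]["sender"]:
        return "agent handed off to itself"

    # Compare messages ignoring case and spacing differences.
    normalised = [" ".join(t["message"].lower().split()) for t in agent_turns]
    if normalised[-1] in normalised[:-1]:
        return "agent repeated an earlier message"

    # Two agents strictly alternating, and about to do so again.
    recent = [t["sender"] for t in agent_turns[-max_back_and_forth:]]
    alternating = all(a != b for a, b in zip(recent, recent[1:]))
    if (len(recent) == max_back_and_forth and len(set(recent)) == 2
            and alternating and next_agent in recent):
        return "agents stuck in a back and forth"

    if len(agent_turns) >= max_hand_offs:
        return "hand off limit exceeded"
    return None
```

In practice a fuzzier similarity test would probably be needed to catch agents paraphrasing one another; exact matching is only the simplest starting point.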

User Override

Finally the user can be given some agency: if they feel the responses are going on too long or aren’t useful they can press a button and end the hand-off sequence.

The importance of logging activity within the chat thread

LLMs are inherently stateless; everything they “know” comes from a combination of their training data and the information in their context window. As that context changes (in particular the project data and conversation membership) the model has no sense of history and no understanding that the data is changing over time unless that too is in the context.

Supplying this history dramatically improves the responses and decision making of the model. Simple messages added to the chat history like these…

“New conversation member added: Finance Manager Agent”

“New Risk added ‘Covid-19 resurgence’”

… help agents to understand how the project and conversation are evolving over the course of the chat. The importance of this isn’t immediately obvious, but consider the situation where a chat contains a project manager agent, a risk manager agent and a human discussing project financing together. The human then adds a finance manager agent to the chat. Without the status message “New conversation member added: Finance Manager Agent” the agents assume that the finance manager agent has always been part of the chat but is deciding not to respond, and so may never hand off to it. With the status message in place the agents understand that new expertise is now available and are more likely to hand off properly.
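
Status messages like these are cheap to produce. A minimal sketch, assuming the thread is a list of JSON-style objects as described earlier (the `sender`, `sent_at` and `message` keys and both function names are hypothetical):

```python
from datetime import datetime, timezone
from typing import List, Optional


def log_event(thread: List[dict], text: str,
              now: Optional[datetime] = None) -> dict:
    """Append a status message from the system role to the conversation thread."""
    now = now or datetime.now(timezone.utc)
    entry = {"sender": "System", "sent_at": now.isoformat(), "message": text}
    thread.append(entry)
    return entry


def add_member(members: List[str], thread: List[dict], name: str,
               now: Optional[datetime] = None) -> None:
    """Add a conversation member and record the change where agents can see it."""
    if name not in members:
        members.append(name)
        log_event(thread, f"New conversation member added: {name}", now)
```

Because the entry lives in the thread itself, every agent sees the change the next time it is prompted, with no extra plumbing.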

The coordinator role

One of the roles (agents) available in the system is that of coordinator. They have very general project knowledge and a prompt instructing them to solicit opinions of other agents, keep the conversation flowing and summarise agent contributions when a natural stopping point is reached. The other agents are aware of the coordinator role and will often hand off to it when they decide that there is nothing more to add.

The ‘System’ Role

The system role is built in to all chats and cannot be removed. It delivers the status messages and takes agentic actions like changing the project status, adding new tasks, risks, issues etc. The Verto system is not directly addressable by human chat members; instead it responds only to hidden hand off messages from the other agents asking it to take actions on their behalf.

You can only load up each agent with so many rules and complications before getting diminishing returns, and the domain expert agents have enough of that already. By implementing an agent representing the system itself, whose sole responsibility is taking agentic actions, you can split that complexity up and get better results, increasing the chances that the right tool is called at the right time.

When an agentic action is called it is not taken without confirmation from a human chat member. This keeps humans in charge, maintains an accountable audit trail and ensures no actions are taken in error automatically.
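
The confirmation step can be sketched as a small gate that holds requested actions until a human approves them. The class, its method names and the audit log format are all hypothetical; the point is only that nothing runs at request time.

```python
from typing import Callable, Dict, List, Tuple


class ActionGate:
    """Holds agentic actions until a human confirms or rejects them."""

    def __init__(self) -> None:
        self._pending: Dict[int, Tuple[str, Callable[[], None]]] = {}
        self.audit_log: List[str] = []
        self._next_id = 1

    def request(self, description: str, action: Callable[[], None],
                requested_by: str) -> int:
        """Queue an action without running it and return its id."""
        action_id = self._next_id
        self._next_id += 1
        self._pending[action_id] = (description, action)
        self.audit_log.append(f"{requested_by} requested: {description}")
        return action_id

    def confirm(self, action_id: int, human: str) -> None:
        """Run a queued action once a human has approved it."""
        description, action = self._pending.pop(action_id)
        self.audit_log.append(f"{human} confirmed: {description}")
        action()

    def reject(self, action_id: int, human: str) -> None:
        """Discard a queued action without running it."""
        description, _ = self._pending.pop(action_id)
        self.audit_log.append(f"{human} rejected: {description}")
```

Every request, confirmation and rejection leaves a line in the log, which is one simple way to get the audit trail mentioned above.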

Putting it all together

Taken together this gives us a multi agent chat system that is capable of answering questions about data, pulling out interesting insights and acting as advisors and surrogate project members.

Key Points

Each agent plays a specific role, with its own goals, interests and knowledge
Agents respond to human questions and requests and collaborate with each other to give the best response possible
Agents take turns and hand off control to one another based on their expertise and the flow of the conversation
Agents send hidden messages to each other stating their reasoning for handing off control and their expectations
A team coordinator agent keeps the other agents on track and summarises their responses
Events and History are logged in the conversation by a system role
Agentic actions are taken by a system role

Conclusions

In this longer than normal post I’ve gone over the architecture of a multi-agent, multi-turn chat system designed to provide data insights, take agentic actions and act as a virtual project team. There is a huge amount I’ve had to miss out for brevity and to keep this all somewhat readable. In my next post, which will be much shorter, I’ll talk a bit about the huge difference supplying the right context to each agent makes.
