Finalizing Multi-Agent Chat System Insights

Wrapping Things Up

Over the previous 5 parts I’ve gone over the journey from concept to a concrete multi-agent chat system. The system as it stands today is something I’m proud of: it enhances the product and delivers real value to its users.

In this post I’ll talk about where I see this feature going next, including the limitations of the system and some ideas to overcome them. I’ll also use this post as a bit of a catch-all: there are a lot of topics I just didn’t get around to talking about, or that didn’t fit easily into any of my other posts. I’ll briefly touch on those too.

Where the system is going next

The system still has some limitations. The biggest has been a constant thread running through all these posts: the size of the context window. The solutions already discussed, together with the march of AI technologies and the models available, are sufficient to handle even fairly large projects, but large programmes and portfolios are a different story. They can be between 10 and 100 times as large and overwhelm the context window.

Research Depth Control

With the present state of the art the only way to deal with very large programmes and portfolios is to shrink what goes into the context window. That means either summarising data or cutting it out wholesale. The next feature to be added to the system will give users control over exactly which data is included in the context (and thus known to the agents), allowing users to focus on one research area or another at any given time.

In practice this means letting users pick the fields and projects included in the dataset added to the context window, and set a cutoff date used to exclude records for those entities where dates are important.

This will come with several presets that can be tweaked by users to fit their needs from conversation to conversation. Research depth controls will kick in dynamically when the context window exceeds a preset size limit. This should allow for larger data sets to be interrogated while focusing on specific areas.
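As a rough illustration of the idea above, here is a minimal Python sketch of how a preset might be applied. It is not the production code: the `DepthPreset` class, the record shape (`project_id`, `date` and so on), the 4 characters per token estimate and the choice to drop records dated before the cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
import json


@dataclass
class DepthPreset:
    """Hypothetical preset: which projects and fields to keep, plus a cutoff date."""
    name: str
    project_ids: Optional[set] = None  # None means "all projects"
    fields: Optional[set] = None       # None means "all fields"
    cutoff: Optional[date] = None      # assumption: drop dated records before this


def estimate_tokens(records):
    # Crude assumption: roughly 4 characters per token.
    return len(json.dumps(records, default=str)) // 4


def apply_preset(records, preset):
    """Return only the records and fields the preset allows."""
    kept = []
    for rec in records:
        if preset.project_ids is not None and rec["project_id"] not in preset.project_ids:
            continue
        rec_date = rec.get("date")
        # Records without a date are never excluded by the cutoff.
        if preset.cutoff is not None and rec_date is not None and rec_date < preset.cutoff:
            continue
        if preset.fields is not None:
            # project_id is always kept so records stay attributable.
            rec = {k: v for k, v in rec.items()
                   if k in preset.fields or k == "project_id"}
        kept.append(rec)
    return kept


def build_context(records, preset, token_limit):
    """Apply the preset only when the full dataset exceeds the size limit."""
    if estimate_tokens(records) <= token_limit:
        return records
    return apply_preset(records, preset)
```

A small dataset passes through `build_context` untouched, while one that exceeds `token_limit` is trimmed to the projects, fields and date range named in the preset. A real implementation would use the model's own tokenizer rather than a character count.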

More Agentic actions

At present the system has several tools it is able to call autonomously (although there is an approval layer allowing for human intervention). These tools allow for adding Risks, Issues, Tasks etc., as well as changing project statuses. There are no tools for updating arbitrary data. I attempted this when the system was powered by earlier models, and updating arbitrary data proved too complex for the model to handle. With the latest models I think it’s worth trying this again.
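The pattern of autonomous tool calls behind an approval layer can be sketched roughly as follows. This is an illustrative outline only: the `ToolCall` shape, the `add_risk` tool, the in-memory `RISKS` list and the `approve` callback are hypothetical stand-ins, not the system's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    """A tool invocation requested by the model (hypothetical shape)."""
    name: str
    arguments: Dict[str, Any]


# Hypothetical in-memory store standing in for the product's database.
RISKS = []


def add_risk(project_id: int, title: str) -> str:
    RISKS.append({"project_id": project_id, "title": title})
    return f"Risk '{title}' added to project {project_id}"


# Registry of the narrow, purpose-built tools the model may request.
TOOLS: Dict[str, Callable[..., str]] = {"add_risk": add_risk}


def execute(call: ToolCall, approve: Callable[[ToolCall], bool]) -> str:
    """Run a model-requested tool only if it is registered and approved."""
    tool = TOOLS.get(call.name)
    if tool is None:
        return f"Unknown tool: {call.name}"
    # The approval callback is where a human gets the chance to intervene.
    if not approve(call):
        return f"Tool call {call.name} was rejected by the user"
    return tool(**call.arguments)
```

For example, `execute(ToolCall("add_risk", {"project_id": 7, "title": "Supplier delay"}), approve=lambda c: True)` adds the risk, while an `approve` callback that returns `False` leaves the data untouched. Keeping each tool narrow, as here, is what makes the approval prompt easy for a user to judge; a tool for updating arbitrary data would need a much richer description of what is about to change.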

Memories

Copilot and some other LLM systems have a feature called “Memories” which looks for consistent user trends (topics users bring up frequently, their preferred communication style, their job roles etc.). Responses would improve if the model knew a bit more about individual users and their roles, interests and goals.

Learning from user feedback

There is a basic feedback mechanism built into the system which allows users to approve or disapprove of individual messages. Our beta test group also provides detailed feedback. Listening to this feedback and reacting to it will be key to keeping the system relevant to users going forward.

Assorted Other topics

LLM Hosting Options

Finding the right option to host the models and provide inference was a bit of a journey in itself. Ultimately the model the system relies on today is hosted via Azure Foundry, which helps with assurance for customers who might be nervous about trusting smaller suppliers. Several other suppliers were tried but either didn’t have the range of models available or had reliability or speed issues. Azure provided the best combination of assurance, reliability, speed and cost. A common API also makes it quick to switch between models as they become available.

An interesting avenue was self-hosting a model on a very powerful, rather expensive server provided by AWS. This ran vLLM and worked fairly well. It was limited to hosting open source models available on Hugging Face and suffered throughput issues; it likely couldn’t keep up with the number of concurrent requests expected in a production system. It was an interesting approach which provided valuable experience, and it would have given clients the ultimate assurance that their data never left networks under our control, but unfortunately it just wasn’t cost effective.

Engaging Naming

When talking about the agents in previous posts I’ve referred to them as “Project Manager Agent”, “Risk Manager Agent” and so on. In the production system the agents are given more engaging, alliterative names like “Peter Project”, “Ruby Risk” and “Carol Coordinator”. It’s surprising the difference this makes. Agents are quickly anthropomorphized, which makes them easier to interact with and easier to accept as members of the team.

The lighthearted alliterative approach to naming has gone down well with the users.

Rate Limits and Costs

Inference can be expensive, and without careful monitoring and controls costs can spiral quickly. It’s important to track token usage and costs on as many levels as possible. At the highest level, tokens and costs can be tracked on a daily basis in Azure Foundry, and cost alerts can be put in place to spot and react to anomalies.

It’s possible to track tokens in, cached tokens in and tokens out on a per completion basis. In the chat system we built, these values are recorded and the costs are calculated on a per message basis. This lets us keep an eye on how much each customer is using the system and how much they cost. At present there is no rate limit built in except for the limit imposed by Microsoft (which is 1 million tokens per minute), but tracking at this level leaves the option open to impose rate limits on those customers who are overburdening the system.
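To make the arithmetic concrete, here is a small Python sketch of per-completion costing and of the kind of per-customer limit that this tracking would make possible. Everything in it is an assumption for illustration: the `Pricing` values are placeholders rather than real rates, cached input tokens are assumed to be reported as a subset of the input tokens, and `TokenRateLimiter` is a hypothetical helper, not something the system has today.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Placeholder prices in USD per 1 million tokens, not real rates."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def completion_cost(pricing, tokens_in, cached_tokens_in, tokens_out):
    """Cost of one completion. Assumes cached_tokens_in is a subset of tokens_in."""
    uncached = tokens_in - cached_tokens_in
    total = (uncached * pricing.input_per_m
             + cached_tokens_in * pricing.cached_input_per_m
             + tokens_out * pricing.output_per_m)
    return total / 1_000_000


class TokenRateLimiter:
    """Sliding 60 second tokens-per-minute check, tracked per customer."""

    def __init__(self, tokens_per_minute):
        self.limit = tokens_per_minute
        # customer_id -> deque of (timestamp_seconds, tokens)
        self.events = defaultdict(deque)

    def allow(self, customer_id, tokens, now):
        window = self.events[customer_id]
        # Discard usage that has fallen out of the last 60 seconds.
        while window and window[0][0] <= now - 60:
            window.popleft()
        used = sum(t for _, t in window)
        if used + tokens > self.limit:
            return False
        window.append((now, tokens))
        return True
```

Summing `completion_cost` over a customer's messages gives the per-customer figure described above. The limiter takes `now` as a parameter (for instance from `time.monotonic()`) so that it stays easy to test; a customer who exceeds their allowance in one minute is refused until older usage ages out of the window.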

Conclusion

That’s about all for now. I might make another post if the system evolves in interesting ways. If you’ve stuck with me and read all 6 parts of this then thanks! I hope you’ve got something useful out of it and maybe some ideas for building a system of your own.

I’ll soon be slowing the pace of development of this system and handing it off to others while I focus on bringing the reporting solutions in the product up to date. I’ll be integrating Power BI as an ISV and giving users the option to edit and author their own reports. Watch out for a post on that soon.

Building a Multi-Turn, Multi-Agent, Data Insight Chat System Part 6: Wrapping Up. Results, Limitations & Refinement
