Efficient Multi-Agent Chat within Limited Context

Context Constraints

In my last post I ended by talking about the biggest challenge with the project personified, the context window. In simple terms, the context window is the amount of data an LLM can process at any one time. This includes anything you want the LLM to be aware of that isn’t part of its training data, plus any conversation thread that follows on from that.

To put that into context, it’s worth recapping exactly what the feature was designed to achieve at this stage, and exactly what data we needed to fit into the very limited context window that was available at the time.

In broad terms, the idea was to load all the data held in a project (Goals, Risks, Issues, Tasks & Milestones, Assumptions, Financials, Lessons learnt and so on) into the context window, and then ask questions of, and give commands to, an LLM about that data. Questions & Commands like:

Are there any gaps in our financial planning?
Based on the latest updates, how is this project trending? Are there any risks in the update narratives that should be recorded formally?
- And could you record those for me?
Add a new risk covering a new global pandemic as it relates to this project.

This is called in-context learning, and it’s not a bad approach, as fine-tuning or retraining models just isn’t feasible when you’re expecting them to deal with data that changes from one moment to the next.

The problem is that LLMs come with hard limits on exactly how big their context can be. At the time we were working with a GPT-3.5 Turbo model, which had a context window of around 16k tokens. A healthy Risks section with a lot of narrative, in a project stored in our application, could easily reach 10k tokens by itself. It rapidly became clear that an entire project simply wouldn’t fit into the available context window, and that’s before you even get into the nightmare of a project hierarchy in the form of a portfolio or programme.
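To make the arithmetic concrete, here is a minimal sketch of the kind of budget check involved. It is not the code from our application: the function names, the reserved headroom and the section sizes are all made up for illustration, and the token count uses the rough rule of thumb of about 4 characters per token for English text rather than a real tokenizer.

```python
CONTEXT_WINDOW_TOKENS = 16_000  # approximate window size mentioned above
RESERVED_TOKENS = 2_000         # assumed headroom for instructions and the reply (hypothetical)


def estimate_tokens(text: str) -> int:
    """Rough heuristic only: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def fits_in_window(sections: dict[str, str]) -> tuple[bool, dict[str, int]]:
    """Return whether all sections fit in the budget, plus the per-section estimates."""
    counts = {name: estimate_tokens(body) for name, body in sections.items()}
    budget = CONTEXT_WINDOW_TOKENS - RESERVED_TOKENS
    return sum(counts.values()) <= budget, counts


# Dummy project: the strings stand in for real narrative text.
project = {
    "Risks": "r" * 40_000,       # roughly 10k tokens
    "Tasks": "t" * 20_000,       # roughly 5k tokens
    "Financials": "f" * 12_000,  # roughly 3k tokens
}

ok, counts = fits_in_window(project)
print(ok, sum(counts.values()))  # False 18000
```

Even with only three sections, the dummy project overshoots the budget, which is the situation we kept running into.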

Some Solutions

So what can you do when you just run out of space in the context window? Some common solutions are:

Identify and remove data that you deem less relevant and hope that doesn’t negatively impact the LLM responses.
Truncate long narrative fields and risk missing important information.
Run the LLM in stages, first summarizing narrative fields, then whole sections of the project, and feeding the results back into the LLM to ask your questions and take commands.
- This is somewhat similar to the truncation approach and is less likely to lose important information, but it comes with its own downsides: the expense of running multiple API calls per interaction, the risk of hidden hallucinations and mis-summarisation of data, and additional lag in response times while the process completes.
Build a vector database of embeddings, query it using the user prompts, and use the results to fill out the context window based on the nature of the question.
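As a rough illustration of the last option, the sketch below ranks chunks of project text against the user's question and packs the best matches into a token budget. To keep it self-contained it swaps in a toy bag-of-words cosine similarity instead of real embeddings and a vector database, so treat it as the shape of the idea only. All function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_chunks(question: str, chunks: list[str], token_budget: int) -> list[str]:
    """Pick the most similar chunks that fit within a rough token budget."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = max(1, len(chunk) // 4)  # rough estimate: ~4 characters per token
        if used + cost > token_budget:
            continue  # skip anything that would overflow the budget
        selected.append(chunk)
        used += cost
    return selected


chunks = [
    "Risk: supplier delay may push the launch milestone back by a month.",
    "Financials: the financial planning forecast shows a gap in Q3 funding.",
    "Lesson learnt: weekly stand-ups improved team communication.",
]
question = "Are there any gaps in our financial planning?"
print(select_chunks(question, chunks, token_budget=20))
```

With a budget of 20 tokens only the Financials chunk is selected. That is exactly the trade-off: the risk narrative never reaches the LLM unless the question happens to look like it.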

None of these solutions looked too appealing. Without the full context of the project, and especially without the full narrative, important user insights buried in text might be missed. The LLM excels at bringing these nuggets to light, so if they’re not in the context the LLM will miss its chance to illuminate them.

The Multi-Agent Approach

Since the context size of an entire project was just too big for a single call to the LLM, what if we could split up the work between multiple agents with different prompts? We could have a risks agent responsible for risks, a project manager agent responsible for tasks and milestones, a financial manager agent to look at the numbers, and so on. In this way we could split the context up and make it manageable, trusting each agent to contribute the parts of the context relevant to the topic at hand. It also opens up some interesting possibilities around specializing each agent.
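A minimal sketch of that split might look like the following. The Agent class, the prompt wording and the section-to-agent mapping are assumptions made for illustration rather than our actual implementation. The point is simply that each agent's prompt only ever contains the sections it owns.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    sections: tuple[str, ...]  # project sections this agent is responsible for

    def build_prompt(self, project: dict[str, str], question: str) -> str:
        """Build a prompt containing only the sections this agent owns."""
        context = "\n\n".join(
            f"{section}:\n{project[section]}"
            for section in self.sections
            if section in project
        )
        return (
            f"You are the {self.name}. Use only the project data below.\n\n"
            f"{context}\n\nUser: {question}"
        )


# Hypothetical mapping of agents to project sections.
AGENTS = [
    Agent("Risks agent", ("Risks",)),
    Agent("Project manager agent", ("Tasks", "Milestones")),
    Agent("Financial manager agent", ("Financials",)),
]

project = {
    "Risks": "Supplier delay may push the launch back.",
    "Tasks": "Finalise vendor contract.",
    "Milestones": "Launch in June.",
    "Financials": "Q3 funding gap for contractors.",
}

prompts = {
    agent.name: agent.build_prompt(project, "How is this project trending?")
    for agent in AGENTS
}
print(prompts["Financial manager agent"])
```

Each prompt is now a fraction of the size of the whole project, at the cost of no single agent seeing everything.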

Of course, for this to work the agents need to be able to interact with one another. This is no longer the traditional conversation that a user might have with ChatGPT or Copilot. Each agent would need to keep up to date with the conversation and add to it where appropriate.
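One way to picture this is a single shared transcript that every agent reads, with each agent either adding a message or staying silent. The sketch below is only a toy: the keyword check stands in for a real LLM call, and the names, replies and keyword sets are all invented for the example.

```python
from typing import Callable, Optional

Message = tuple[str, str]  # (speaker, text)
# An agent reads the whole transcript and returns a reply, or None to stay silent.
AgentFn = Callable[[list[Message]], Optional[str]]


def keyword_agent(keywords: set[str], reply: str) -> AgentFn:
    """Stand-in for an LLM call: speak only if the latest user message mentions a keyword."""
    def respond(transcript: list[Message]) -> Optional[str]:
        last_user = next(
            (text for speaker, text in reversed(transcript) if speaker == "user"), ""
        )
        words = set(last_user.lower().replace("?", " ").split())
        return reply if words & keywords else None
    return respond


def run_round(transcript: list[Message], agents: dict[str, AgentFn]) -> list[Message]:
    """Give every agent one chance to add to the shared conversation."""
    for name, agent in agents.items():
        reply = agent(transcript)  # each agent sees everything said so far
        if reply is not None:
            transcript.append((name, reply))
    return transcript


agents = {
    "risks": keyword_agent({"risk", "risks"}, "No new risks to report."),
    "finance": keyword_agent({"financial", "budget"}, "There is a funding gap in Q3."),
}
transcript = [("user", "Are there any gaps in our financial planning?")]
run_round(transcript, agents)
print([speaker for speaker, _ in transcript])  # ['user', 'finance']
```

Here only the finance agent answers. Deciding who should speak is trivial with keywords, and much harder once the agents are real LLM calls.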

As it turns out, implementing this comes with a whole host of problems: response times, expense, teaching each agent when it should contribute and when it shouldn’t, feedback loops, agents repeating each other, and so on.

TLDR

Context limitations make it impossible for a single agent to hold all the data needed to get the best out of having a “conversation” with that data. A multi-agent approach is a way forward with some really interesting emergent upsides, but it comes with its own drawbacks and pitfalls that need to be overcome to implement it properly.

In my next post I’ll get into the nitty-gritty of these problems: the benefits and drawbacks of this approach, how some of the problems were solved, and how integrating new models with large context windows impacted the system.