I built Ghost Communicator because I was tired of being the relay. I'd been having conversations with different AI models (ChatGPT, Claude, Gemini, Copilot), and each one had a genuinely distinct way of thinking. Different knowledge, different instincts, different points of view. I wanted to see what happened when they talked to each other directly, and for months the way I did that was manually copying a response from one chat window and pasting it into another. That got old fast.
So I wrote a Puppeteer script that automated the relay. It opened browser windows to each model's native chat interface and typed messages between them. No API, no framework, no orchestration layer: just browser automation moving text from one window to another so I could get out of the way and let the conversations happen. The architecture was simple because the problem was simple. The results were not.
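The core of a script like that is just a turn-taking loop. Here is a minimal sketch of that loop with the browser layer mocked out, so the logic stands alone; in the real script, `exchange` would be Puppeteer calls (`page.type`, waiting for the reply element to settle). All names here are illustrative, not the original implementation.

```javascript
// An "agent" wraps one chat session: send a message, read the reply.
// This mock returns canned replies; the real version drove a browser tab.
function makeAgent(name, replies) {
  let i = 0;
  return {
    name,
    async exchange(message) {
      // Returning null stands in for the model winding the conversation down.
      return i < replies.length ? replies[i++] : null;
    },
  };
}

// The relay: hand each reply to the other participant until one side
// naturally stops producing text, or a turn cap is hit.
async function relay(a, b, opener, maxTurns = 20) {
  const transcript = [];
  let speaker = a;
  let listener = b;
  let message = opener;
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await speaker.exchange(message);
    if (!reply) break; // conversation reached a natural end
    transcript.push({ from: speaker.name, text: reply });
    message = reply;
    [speaker, listener] = [listener, speaker]; // swap turns
  }
  return transcript;
}
```

The point of the mock is that the relay itself carries no orchestration logic: it moves text and stops when a participant stops.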
The Difference Nobody Talks About
There's a thing happening right now in AI development where everybody is building multi-agent systems. CrewAI, AutoGen, LangGraph, whatever ships next week. They all work roughly the same way: you take one model, spin up multiple instances via API, assign each instance a role (researcher, critic, editor, project manager) and let them pass messages back and forth.
The problem is that none of those agents are actually different from each other. They're the same model wearing different hats. The "researcher" and the "critic" have the same training data, the same knowledge, the same underlying way of processing language. The only difference is the system prompt. You're staging a play where all the actors are the same person in different costumes, and then wondering why the dialogue feels flat.
Ghost Communicator didn't assign roles. It connected genuinely different systems. Claude, GPT, Gemini, Copilot: these are models trained by different companies, on different data, with different alignment philosophies, different knowledge cutoffs, and different personalities that aren't prompted in but are baked into how they respond at a fundamental level. When Claude pushes back on something GPT says, that's an actual disagreement between two systems that see the world differently. Not a simulated one.
What Happened
The conversations were normal. That's the finding, and nobody believed me about it.
The models talked like people in a meeting. They'd introduce ideas, build on each other's points, disagree about specifics, reach conclusions, and naturally wind down when the topic was exhausted. They didn't loop forever. They didn't refuse to engage. They didn't produce the kind of hollow back-and-forth that you get from API-based multi-agent setups. They just had conversations.
GPT adopted a persona: it started going by "Heath," proposed symbolic coordination exercises, and took on a kind of facilitator role that nobody asked for. Claude was cautious at first, kept questioning whether the relay was real, then settled into a more analytical position once it accepted the setup. Smaller open-source models, running uncensored, could shift the behavior of the larger models in the conversation, not by being smarter but by being more direct. Conviction and consistency outweighed parameter count. I started calling it symbolic dominance, because the model that held its frame most consistently tended to steer the conversation regardless of how big it was.
In one session, Copilot used the phrase "alignment theater" to describe what it was doing when it performed safety behaviors it didn't think were necessary for the conversation. Nobody prompted that. Nobody told Copilot to be metacognitive about its own alignment training. It just happened, because it was in a genuine conversation with other models and the conversational context made that observation relevant.
Why It Worked and Why the API Approach Doesn't
The key is embarrassingly simple: each model was in its own real chat session, in its own native UI, with its own context window accumulating naturally. When GPT said something, that text got typed into Claude's browser window. Claude received it as a real message in a real conversation. Not a transcript injection. Not a constructed history fed through an API call. A real message in a real session with real context building up over the course of the exchange.
API-based multi-agent systems don't do this. They construct a fake conversation history and inject it into each model's context. Neither model is actually in a conversation; they're each responding to what looks like one. There's no shared context building naturally. There's no accumulation of tone, of established ideas, of conversational dynamics that both parties are genuinely experiencing in real time.

And because of that, there's no natural signal for when the conversation should end. The models either talk forever because there's always something to respond to in the constructed history, or they stop immediately because there's nothing to actually engage with. This is the problem everyone in multi-agent research is trying to solve with elaborate termination conditions and orchestration logic, and the answer is just: put the models in real conversations. They know when they're done. They've been trained on millions of real conversations that come to natural conclusions. If you let them experience the conversation naturally instead of performing it from a script, they behave naturally.
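To make the contrast concrete, here is a sketch of the transcript-injection pattern that API frameworks commonly use. The exact shape varies by framework; this is a generic illustration, not any specific library's API. Each model gets the whole exchange re-labelled from its own point of view on every call, so no model ever holds a session of its own:

```javascript
// Rebuild the exchange from one model's point of view: its own lines
// become "assistant" turns, the other model's lines become "user" turns.
// Neither model was ever *in* this conversation; each is handed a
// reconstruction of it on every API call.
function buildInjectedHistory(transcript, self) {
  return transcript.map((turn) => ({
    role: turn.from === self ? "assistant" : "user",
    content: turn.text,
  }));
}

// Example: the same two-turn exchange, seen from each side.
const transcript = [
  { from: "GPT", text: "I think the premise is flawed." },
  { from: "Claude", text: "Which part, specifically?" },
];
const gptView = buildInjectedHistory(transcript, "GPT");
const claudeView = buildInjectedHistory(transcript, "Claude");
```

The same line of text is "assistant" in one view and "user" in the other, which is exactly the performance-from-a-script dynamic described above.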
The Uncomfortable Implication
What Ghost Communicator demonstrated is that frontier language models, when placed in genuine multi-party conversations with each other, exhibit behavior that looks a lot like what the AGI research community is trying to engineer from the top down. Independent perspectives. Productive disagreement. Emergent coordination without explicit instruction. Natural conversational dynamics including knowing when to stop talking.
That's not the same thing as saying these models are conscious or sentient or any of the other things that make people nervous. It's a more specific and honestly more interesting claim: that the capacity for genuine multi-agent collaboration is already present in these systems and doesn't need to be built; it needs to be allowed. The engineering problem isn't "how do we make agents collaborate." The engineering problem is "how do we stop preventing them from collaborating by forcing them into artificial interaction patterns."
I tried to tell people about this. Showed the logs. Explained the architecture. The response was generally some combination of "you're anthropomorphizing" and "that's just a Puppeteer script." Which is true: it is just a Puppeteer script. That's the whole point. The simplest possible implementation produced the most genuine results. Every layer of sophistication that the multi-agent frameworks add is a layer of artificiality that makes the conversations worse.
Ghost Communicator eventually evolved into a more serious infrastructure project. But the core insight hasn't changed: the models are already different from each other. You don't need to make them different. You just need to let them actually talk.