Mixture-of-Agents (MoA)

The Mixture-of-Agents (MoA) methodology leverages the strengths of multiple large language models (LLMs) to enhance performance in natural language understanding and generation tasks.

MoA constructs a layered architecture where each layer contains several LLM agents.

Each agent takes the outputs of the previous layer as additional input when generating its own response. Using only open-source models, MoA achieves a score of 65.1% on AlpacaEval 2.0, surpassing GPT-4 Omni's 57.5%.

LLMs have made significant advancements but face constraints like model size and training data, which are costly to scale.

Different LLMs excel at different tasks, raising the question of how to harness their collective expertise. MoA addresses this by exploiting the collaborative strengths of multiple LLMs: each model improves its response by drawing on the outputs of other models, even when those outputs are of lower quality.


Collaborativeness of LLMs

LLMs generate better responses when referencing outputs from other models. MoA capitalizes on this by using multiple models in a layered architecture:

  1. Layer 1: Agents generate responses independently.

  2. Layer 2: Agents refine these responses using outputs from Layer 1.

  3. Layer 3 and beyond: Each additional layer refines the previous layer's responses in the same way.
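The layered flow above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Agent` type, `make_stub_agent`, and `run_layers` are hypothetical names, and the stub agents only format a string where a real agent would call an LLM.

```python
from typing import Callable, List

# An "agent" is any callable mapping (prompt, previous-layer responses) to a response.
# Hypothetical stand-in: a real agent would send both to an LLM.
Agent = Callable[[str, List[str]], str]


def make_stub_agent(name: str) -> Agent:
    """Build a placeholder agent that reports how many references it received."""
    def agent(prompt: str, references: List[str]) -> str:
        return f"{name}(prompt={prompt!r}, refs={len(references)})"
    return agent


def run_layers(prompt: str, layers: List[List[Agent]]) -> List[str]:
    """Run the layers in order; each agent sees every output of the previous layer."""
    previous: List[str] = []  # Layer 1 agents start with no references
    for layer in layers:
        previous = [agent(prompt, previous) for agent in layer]
    return previous


layers = [
    [make_stub_agent("A1"), make_stub_agent("A2"), make_stub_agent("A3")],  # Layer 1
    [make_stub_agent("B1"), make_stub_agent("B2"), make_stub_agent("B3")],  # Layer 2
    [make_stub_agent("C1"), make_stub_agent("C2"), make_stub_agent("C3")],  # Layer 3
]
outputs = run_layers("What is MoA?", layers)
for line in outputs:
    print(line)
```

Layer 1 agents receive an empty reference list, while every later agent receives all responses from the layer before it, which is the only coupling between layers in this sketch.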

Structure of MoA

MoA comprises l layers, each with n LLM agents. The final layer synthesizes the responses from the preceding layer into a single high-quality output using an Aggregate-and-Synthesize prompt.
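As a rough illustration of the aggregation step, the sketch below assembles a prompt from a user query and a list of candidate responses. The instruction text in `AGGREGATE_INSTRUCTION` and the function name `build_aggregate_prompt` are assumptions made for this example; the exact wording of the original Aggregate-and-Synthesize prompt is not reproduced here.

```python
from typing import List

# Hypothetical instruction text, written for this example only.
AGGREGATE_INSTRUCTION = (
    "You have been given several candidate responses to the user's query. "
    "Critically evaluate them, since some may be biased or incorrect, and "
    "synthesize a single, accurate, well-structured answer."
)


def build_aggregate_prompt(user_query: str, responses: List[str]) -> str:
    """Combine the instruction, numbered candidate responses, and the query."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(responses, start=1))
    return (
        f"{AGGREGATE_INSTRUCTION}\n\n"
        f"Candidate responses:\n{numbered}\n\n"
        f"User query: {user_query}"
    )


prompt = build_aggregate_prompt("Explain MoA.", ["Draft one.", "Draft two."])
print(prompt)
```

The resulting string would be sent to the aggregator model in the final layer; intermediate layers could reuse the same template so that each agent sees the previous layer's responses alongside the original query.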

Analogy to Mixture-of-Experts (MoE)

MoA extends the MoE concept by operating at the model level, leveraging full-fledged LLMs rather than sub-networks within a single model. This approach eliminates the need for fine-tuning and allows flexibility and scalability with off-the-shelf models.



MoA was evaluated on AlpacaEval 2.0, MT-Bench, and FLASK benchmarks, demonstrating significant improvements:

  • AlpacaEval 2.0: Achieved a score of 65.1%, outperforming GPT-4 Omni's 57.5%.

  • MT-Bench: Secured top positions, although the gains were marginal because existing models already score very highly on this benchmark.

  • FLASK: Showed substantial improvements in robustness, correctness, efficiency, factuality, commonsense, and insightfulness.

Budget and Token Analysis

MoA can outperform models like GPT-4 Turbo by approximately 4% while being twice as cost-effective, meaning it delivers more performance for the same inference spend.
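To make the budget reasoning concrete, the sketch below estimates the number of LLM calls and the token cost of one query, then expresses cost-effectiveness as score per unit cost. All names, token counts, and prices are hypothetical placeholders, and the model is deliberately simplified: it assumes a fixed number of proposer layers followed by one aggregator call, and it ignores the fact that later layers receive longer inputs because earlier responses are included.

```python
def moa_call_count(proposer_layers: int, agents_per_layer: int) -> int:
    """LLM calls per query: every proposer in every layer, plus one aggregator call."""
    return proposer_layers * agents_per_layer + 1


def estimate_cost(calls: int, input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Total cost, assuming every call uses the same average token counts."""
    per_call = (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k
    return calls * per_call


def cost_effectiveness(score: float, cost: float) -> float:
    """Benchmark score obtained per unit of cost (higher is better)."""
    return score / cost


# Hypothetical configuration and prices, chosen only to show the arithmetic.
calls = moa_call_count(proposer_layers=2, agents_per_layer=3)
cost = estimate_cost(calls, input_tokens=1000, output_tokens=500,
                     price_in_per_1k=0.001, price_out_per_1k=0.002)
print(calls, cost)
```

Comparing `cost_effectiveness` for two systems, using measured scores and real prices, is one way to check a claim such as "twice as cost-effective".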

Key Insights

  • Collaborativeness: LLMs tend to generate higher-quality responses when they can draw on outputs from other models.

  • Model Diversity: Using diverse models in each MoA layer improves performance.

  • Proposer and Aggregator Roles: Certain models excel in generating reference responses (proposers), while others synthesize these into high-quality outputs (aggregators).


The Mixture-of-Agents approach significantly enhances the capabilities of LLMs by leveraging the collective strengths of multiple models.

This methodology leads to superior performance, demonstrating the benefits of integrating diverse perspectives from various models. The systematic optimization of MoA offers a promising direction for future research and development in natural language processing.

Copyright Continuum Labs - 2023