Large Language Models as Agents
The Rise and Potential of Large Language Model Based Agents: A Survey
In artificial intelligence (AI) research, the question of why Large Language Models (LLMs) are well suited to serve as the main component of an AI agent's "brain" is both a technical question and a philosophical and scientific one.
This piece reviews the properties that define agents in AI and examines how LLMs align with each of them, to illustrate why LLMs are suitable as the central cognitive component of AI agents.
Autonomy: The Essence of Independent Action
Autonomy signifies the capacity for self-governance and independent operation without human or external intervention.
This quality requires that an agent not only execute tasks based on explicit instructions but also be capable of self-initiated action. LLMs, which can generate human-like text, engage in dialogue, and complete tasks with limited step-by-step guidance, exhibit this kind of autonomy.
They can adjust their outputs dynamically in response to environmental inputs, showcasing adaptive autonomy. The creative potential of LLMs, as seen in generating novel ideas and solutions, further emphasises their self-directed exploratory and decision-making capabilities.
Reactivity: The Ability to Respond to the World
Reactivity in AI agents pertains to their capacity to react to changes and stimuli in their environment.
This property allows agents to perceive changes in their surroundings and respond appropriately and in a timely manner.
At present, LLMs are largely confined to the virtual world and cannot directly perceive the physical world. This is likely to change over the next few years as models are integrated with physical devices that provide external perception, for example through computer vision.
LLM-based agents may need an intermediate step that translates their reasoning or planned tool use into concrete actions. This mirrors human behavioural patterns, in which deliberation precedes action, and so aligns with the natural process of a thoughtful response.
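The intermediate step can be pictured as a small parser that turns the model's planned tool use, written as text, into an executable call. The sketch below is illustrative only: the `Action: tool_name[argument]` output convention, the `TOOLS` registry, and the two toy tools are hypothetical assumptions, not something the survey specifies, and real agent frameworks define their own formats.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical convention: the model ends its reasoning with a line such as
#   Action: word_count[some text]
ACTION_PATTERN = re.compile(r"^Action: *(\w+)\[(.*)\] *$", re.MULTILINE)

# Hypothetical toy tools standing in for real external tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "upper": lambda text: text.upper(),
}


def parse_action(model_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument) from the last Action line, or None."""
    matches = ACTION_PATTERN.findall(model_output)
    return matches[-1] if matches else None


def act(model_output: str) -> str:
    """Translate the model's planned tool use into an actual tool call."""
    parsed = parse_action(model_output)
    if parsed is None:
        return "no action"  # the model only reasoned; nothing to execute
    name, argument = parsed
    tool = TOOLS.get(name)
    if tool is None:
        return f"unknown tool: {name}"
    return tool(argument)


if __name__ == "__main__":
    output = "Thought: I should count the words first.\nAction: word_count[agents perceive and act]"
    print(act(output))
```

Keeping the "thinking" text and the executable action separate means the reasoning can be free-form while only the final, well-delimited line is acted on.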
Pro-activeness: Beyond Mere Reaction
Pro-activeness in AI agents denotes the propensity to pursue goal-oriented actions on their own initiative rather than only reacting to environmental stimuli.
This aspect highlights the agents' ability to reason, plan, and take initiative in their actions to achieve specific objectives or adapt to changes.
LLMs show a capacity for generalised reasoning and planning: with suitable prompting, such as chain-of-thought prompting, they can carry out complex multi-step reasoning and planning.
Their ability to reformulate goals, decompose tasks, and adjust plans in response to environmental changes indicates a substantial degree of pro-active engagement with their tasks and objectives.
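Goal decomposition and plan adjustment can be sketched as a loop that asks a planner for subtasks, executes them in order, and requests a revised plan when a step fails. This is a minimal sketch under stated assumptions: `toy_planner` and `toy_executor` are hypothetical deterministic stand-ins for an LLM planner and a real environment, and the retry budget `max_replans` is an illustrative parameter.

```python
from typing import Callable, List

# Planner: (goal, feedback about a failed step) -> ordered subtasks.
Planner = Callable[[str, str], List[str]]
# Executor: subtask -> True if it succeeded.
Executor = Callable[[str], bool]


def pursue_goal(goal: str, plan: Planner, execute: Executor, max_replans: int = 3) -> List[str]:
    """Decompose a goal, execute subtasks in order, and replan on failure.

    Returns the subtasks that were completed successfully.
    """
    completed: List[str] = []
    feedback = ""
    for _ in range(max_replans + 1):
        for step in plan(goal, feedback):
            if step in completed:
                continue  # already done during an earlier attempt
            if execute(step):
                completed.append(step)
            else:
                feedback = f"step failed: {step}"
                break  # abandon this plan and ask for a revised one
        else:
            return completed  # every step of the current plan succeeded
    return completed


def toy_planner(goal: str, feedback: str) -> List[str]:
    # Hypothetical stand-in for an LLM: revises the plan when booking fails.
    if "book venue" in feedback:
        return ["choose date", "find alternative venue", "send invitations"]
    return ["choose date", "book venue", "send invitations"]


def toy_executor(step: str) -> bool:
    return step != "book venue"  # simulate the venue being unavailable


if __name__ == "__main__":
    print(pursue_goal("organise a workshop", toy_planner, toy_executor))
```

Feeding the failure back to the planner, rather than simply retrying the same step, is what lets the plan itself change in response to the environment.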
Social Ability: The Art of Interaction
The social ability of an agent refers to its capacity to interact with other agents, including humans, through a communicative language.
LLMs excel in this domain because of their natural language interaction capabilities, which enable them to understand messages and generate human-like responses. Building on this social ability, LLM-based agents can improve task performance through collaborative and competitive social behaviors.
By simulating different roles and enabling a social division of labor, multiple LLM-based agents can form a dynamic, interactive social environment from which emergent social phenomena may arise.
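A division of labour between role-playing agents can be sketched as agents taking turns on a shared transcript, each contributing only from its own role. In this sketch the `RoleAgent` class, the round-robin turn order, and the `programmer` and `reviewer` roles are hypothetical choices; each `respond` function is a deterministic placeholder for an LLM prompted with that role.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker role, content)


@dataclass
class RoleAgent:
    """An agent defined by its role; `respond` stands in for a role-prompted LLM."""
    role: str
    respond: Callable[[List[Message]], str]


def run_conversation(agents: List[RoleAgent], task: str, rounds: int) -> List[Message]:
    """Let agents take turns on a shared transcript for a fixed number of rounds."""
    transcript: List[Message] = [("user", task)]
    for _ in range(rounds):
        for agent in agents:
            transcript.append((agent.role, agent.respond(transcript)))
    return transcript


def programmer(transcript: List[Message]) -> str:
    # Placeholder: produces a new numbered draft each turn.
    drafts = sum(1 for role, _ in transcript if role == "programmer")
    return f"draft v{drafts + 1}"


def reviewer(transcript: List[Message]) -> str:
    # Placeholder: reviews the most recent programmer draft.
    last_draft = next(text for role, text in reversed(transcript) if role == "programmer")
    return f"review of {last_draft}"


if __name__ == "__main__":
    team = [RoleAgent("programmer", programmer), RoleAgent("reviewer", reviewer)]
    for role, text in run_conversation(team, "Write a sort function", rounds=2):
        print(f"{role}: {text}")
```

Because every agent reads the same transcript, each role can build on the others' contributions, which is the basic mechanism behind collaborative multi-agent setups.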
The Birth of an Agent: A Conceptual Framework
The conceptual framework for constructing LLM-based agents comprises three key components: brain, perception, and action.
In this framework, the brain provides the core functions of memorising, thinking, and decision-making, while the perception module extends the agent's perceptual capabilities beyond text to multimodal information.
The action module, in turn, enables the agent to adapt to and influence its environment.
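The three components can be wired together as a perceive, decide, act step. The skeleton below only illustrates that structure: the class names, the observation keys `text` and `image_caption`, and the rule-based `Brain.decide` are hypothetical placeholders; in an actual LLM-based agent, `decide` would prompt a language model with the percept and relevant memories.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class Perception:
    """Turns raw observations (text or other modalities) into a textual percept."""

    def perceive(self, observation: Dict[str, Any]) -> str:
        parts = []
        if "text" in observation:
            parts.append(f"text: {observation['text']}")
        if "image_caption" in observation:
            # Assumption: a vision model has already captioned the image.
            parts.append(f"image: {observation['image_caption']}")
        return "; ".join(parts)


@dataclass
class Brain:
    """Memorises percepts and decides on an action (an LLM would sit here)."""

    memory: List[str] = field(default_factory=list)

    def decide(self, percept: str) -> str:
        self.memory.append(percept)  # memorising
        # Placeholder for thinking and decision-making: a toy rule, not an LLM.
        if "door is closed" in percept:
            return "open_door"
        return "wait"


class Action:
    """Applies the chosen action to a toy environment state."""

    def act(self, decision: str, environment: Dict[str, Any]) -> Dict[str, Any]:
        if decision == "open_door":
            return {**environment, "door": "open"}
        return environment


@dataclass
class Agent:
    perception: Perception = field(default_factory=Perception)
    brain: Brain = field(default_factory=Brain)
    action: Action = field(default_factory=Action)

    def step(self, observation: Dict[str, Any], environment: Dict[str, Any]) -> Dict[str, Any]:
        percept = self.perception.perceive(observation)
        decision = self.brain.decide(percept)
        return self.action.act(decision, environment)


if __name__ == "__main__":
    agent = Agent()
    env = agent.step({"image_caption": "the door is closed"}, {"door": "closed"})
    print(env, agent.brain.memory)
```

Keeping perception, decision-making, and action behind separate interfaces means a text-only perception module could later be swapped for a multimodal one without changing the brain or action code.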