Continuum - Models and Applications
Copyright Continuum Labs - 2023


Can Large Language Models Reason and Plan?

The answer according to this research is 'no'

This March 2024 paper examines whether large language models (LLMs) can perform planning and reasoning tasks that are traditionally associated with higher cognitive functions.

Despite LLMs' impressive linguistic capabilities, the author argues they are essentially sophisticated n-gram models that perform approximate retrieval rather than principled reasoning.

This paper reinforces our view that generative AI is an augmentation to human work and endeavour, not a replacement.

The study tested LLMs such as GPT-3, GPT-3.5 and GPT-4 on planning instances drawn from the International Planning Competition. Although the accuracy of generated plans improved across model versions, the results were not definitive evidence of genuine planning capability.

The paper distinguishes between LLMs generating correct responses through memorisation or pattern recognition and performing actual reasoning. To probe this distinction, the study employed obfuscation techniques, disguising the names of objects and actions in the planning problems so that surface patterns from training data no longer apply. This significantly reduced GPT-4's performance, suggesting reliance on retrieval rather than planning.
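The obfuscation idea can be sketched as a simple renaming pass over a planning problem. This is a minimal illustration of the general technique, not the paper's actual benchmark; the alias vocabulary and the example task below are invented for demonstration.

```python
import re

def obfuscate(problem: str, mapping: dict[str, str]) -> str:
    """Replace each domain term with its opaque alias (whole words only)."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], problem)

# Illustrative aliases: the task is logically unchanged, but the familiar
# Blocksworld-style vocabulary is replaced with meaningless tokens.
aliases = {"block": "feper", "stack": "attach", "table": "wozer"}
task = "stack the red block on the table"
print(obfuscate(task, aliases))
# -> attach the red feper on the wozer
```

A solver that genuinely plans is unaffected by such renaming, whereas a system leaning on pattern retrieval from training data degrades sharply, which is the effect the study reports for GPT-4.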

Two methods were explored to potentially enhance LLMs' planning performance: fine-tuning on planning data and prompting with hints or external verifiers. Fine-tuning showed no significant improvement, suggesting that it may merely yield better approximate retrieval rather than genuine planning.

The paper advocates for a framework where LLMs' generative capabilities are combined with external verifiers to ensure the correctness and soundness of the planning and reasoning outputs, a setup the author refers to as the LLM-Modulo framework.

The study concludes that while LLMs exhibit some level of problem-solving capability, their performance in planning and reasoning tasks is largely based on approximate retrieval and memorisation, not genuine reasoning or planning as traditionally understood in AI.

The paper critiques the common practice of iterative prompting by humans in the loop, which may lead to a Clever Hans effect, where the human's input, rather than the LLM's reasoning, guides the outcome.

This approach is contrasted with self-improvement methods in which LLMs critique and refine their own outputs. However, the author finds that such self-verification can actually worsen performance, because LLMs produce both false positives and false negatives when critiquing their own output.

The author instead suggests an LLM-Modulo framework in which LLMs generate potential solutions that are vetted by external verifiers or expert humans, ensuring a sound outcome. The paper also reflects on the broader implications of LLMs in AI, suggesting they can serve as knowledge sources for domain-specific information, a role previously filled by human knowledge engineers.

In summary, while LLMs demonstrate some level of problem-solving ability, their effectiveness in planning and reasoning is largely attributed to their retrieval capabilities rather than genuine reasoning or planning.

The LLM-Modulo framework is proposed as a principled way to leverage LLMs' idea generation for reasoning tasks, supported by external verification to ensure accuracy and soundness.
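The generate-test loop at the heart of this setup can be sketched as follows. The `generate` and `verify` callables here are hypothetical stand-ins (an LLM wrapper and a sound external plan checker), not an API from the paper; the point is that the LLM only proposes candidates, while acceptance is decided solely by the verifier.

```python
from typing import Callable, Optional

def llm_modulo(
    generate: Callable[[str], str],             # LLM: prompt -> candidate plan
    verify: Callable[[str], tuple[bool, str]],  # verifier: plan -> (ok, critique)
    task: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Loop: LLM proposes, external verifier accepts or back-prompts."""
    prompt = task
    for _ in range(max_rounds):
        candidate = generate(prompt)
        ok, critique = verify(candidate)
        if ok:
            return candidate  # accepted: soundness comes from the verifier
        # Feed the critique back and let the LLM propose again.
        prompt = f"{task}\nA previous attempt failed: {critique}\nTry again."
    return None  # no verified plan within the round budget
```

Because the verifier, not the model, decides acceptance, this arrangement sidesteps the self-verification failure mode described above while still exploiting the LLM's strength at generating candidate ideas.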

Source: Can Large Language Models Reason and Plan? (arXiv.org)