# Can Large Language Models Reason and Plan?

This <mark style="color:blue;">**March 2024**</mark> paper by Subbarao Kambhampati examines whether Large Language Models (LLMs) can perform planning and reasoning, tasks traditionally associated with higher cognitive functions.

Despite LLMs' impressive linguistic capabilities, the author argues they are essentially sophisticated n-gram models that *<mark style="color:yellow;">**perform approximate retrieval rather than principled reasoning**</mark>*.

This paper reinforces our view that generative AI is an augmentation to human work and endeavour, not a replacement.

{% embed url="<https://arxiv.org/abs/2403.04121>" %}
Can Large Language Models Reason and Plan?
{% endembed %}

The study tested LLMs such as GPT3, GPT3.5, and GPT4 on planning instances from the International Planning Competition. The accuracy of generated plans improved across versions, but the results were <mark style="color:blue;">**not definitive evidence of genuine planning capability**</mark>.

The paper distinguishes between an LLM producing a correct response through memorisation or pattern recognition and an LLM actually reasoning its way to that response. To separate the two, the study *<mark style="color:yellow;">**employed obfuscation techniques**</mark>*, which significantly reduced GPT4's performance, suggesting reliance on retrieval rather than planning.
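
One way to picture the obfuscation test is as a consistent renaming of the action and object names in a planning problem, so that the logical structure is untouched but familiar surface patterns disappear. The sketch below is a minimal illustration under that assumption; the `obfuscate` helper, the `sym0`-style tokens, and the toy Blocks World string are hypothetical and are not taken from the paper.

```python
import re


def obfuscate(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace every name in `names` with an opaque token, consistently."""
    mapping = {name: f"sym{i}" for i, name in enumerate(names)}
    # Try longer names first and match whole identifiers only, so that
    # "stack" is never rewritten inside "unstack".
    alternatives = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![\w-])(" + alternatives + r")(?![\w-])")
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping


# Hypothetical toy plan in a Blocks World style notation.
problem = "(pick-up a) (stack a b) (unstack c d) (put-down c)"
names = ["pick-up", "put-down", "stack", "unstack", "a", "b", "c", "d"]

obfuscated, mapping = obfuscate(problem, names)
print(obfuscated)
```

A planner that reasons over the problem structure should be indifferent to this renaming, since the mapping is one-to-one. A system that leans on retrieval of familiar names has much less to match against.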

Two methods were explored to potentially enhance LLMs' planning performance: *<mark style="color:yellow;">**fine-tuning with planning data and prompting with hints or external verifiers**</mark>*. Fine-tuning did not show significant improvement, and the author suggests that any gains it does bring amount to better approximate retrieval rather than genuine planning.

The paper advocates for a framework where *<mark style="color:yellow;">**LLMs' generative capabilities are combined with external verifiers**</mark>* to ensure the correctness and soundness of the planning and reasoning outputs, a setup referred to as <mark style="color:green;">**LLM-Modulo frameworks**</mark>.
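
The generate-and-verify idea can be sketched as a loop in which a generator proposes candidate plans and an external verifier either accepts one or returns a critique that is fed back into the next attempt. The code below is a minimal sketch under stated assumptions, not the paper's implementation: `verify`, `llm_modulo`, and `stub_generator` are hypothetical names, the domain is a simplified Blocks World where a step moves one clear block onto another clear block or the table, and the stub stands in for a real LLM call.

```python
from typing import Callable, Optional

Step = tuple[str, str]  # (block to move, destination block or "table")


def verify(init: dict[str, str], goal: dict[str, str],
           plan: list[Step]) -> Optional[str]:
    """Simulate the plan. Return None if it is valid, else a critique."""
    on = dict(init)  # block -> what it currently sits on
    for i, (block, dest) in enumerate(plan):
        if block not in on:
            return f"step {i}: unknown block {block}"
        if block in on.values():
            return f"step {i}: {block} is not clear"
        if dest != "table" and (
            dest not in on or dest == block or dest in on.values()
        ):
            return f"step {i}: cannot place {block} on {dest}"
        on[block] = dest
    unmet = {b: s for b, s in goal.items() if on.get(b) != s}
    return None if not unmet else f"goal not reached: {unmet}"


Generator = Callable[[dict[str, str], dict[str, str], list[str]], list[Step]]


def llm_modulo(generate: Generator, init: dict[str, str],
               goal: dict[str, str], max_rounds: int = 5
               ) -> Optional[list[Step]]:
    """Return the first candidate the verifier accepts, or None."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        plan = generate(init, goal, feedback)
        critique = verify(init, goal, plan)
        if critique is None:
            return plan  # only verified plans ever leave the loop
        feedback.append(critique)
    return None


def stub_generator(init, goal, feedback):
    """Hypothetical stand-in for an LLM: wrong first, right after feedback."""
    candidates = [[("a", "b")], [("c", "table"), ("a", "b")]]
    return candidates[min(len(feedback), len(candidates) - 1)]


init = {"a": "table", "b": "table", "c": "a"}  # c sits on a
goal = {"a": "b"}
plan = llm_modulo(stub_generator, init, goal)
print(plan)
```

The design point is that soundness comes entirely from `verify`: the generator is free to guess, because nothing it proposes is returned unless the verifier has simulated it successfully.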

The study concludes that while LLMs exhibit some level of problem-solving capability, their performance in planning and reasoning tasks is largely based on approximate retrieval and memorisation, not genuine reasoning or planning as traditionally understood in AI.

The paper critiques the common practice of iterative prompting with a human in the loop, which may lead to a Clever Hans effect, where the human's input, rather than the LLM's reasoning, guides the outcome.

This approach is contrasted with self-improvement methods, where LLMs critique and refine their own outputs. However, the author finds that such self-verification can worsen performance, because an LLM acting as its own verifier produces both false positives and false negatives.
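
A short worked calculation shows why verifier errors matter. By Bayes' rule, the share of accepted plans that are actually correct depends on the verifier's false-positive and false-negative rates. The numbers below are purely hypothetical and chosen for illustration; they are not results from the paper, and `accepted_precision` is an illustrative helper name.

```python
def accepted_precision(p_correct: float, false_pos: float,
                       false_neg: float) -> float:
    """Probability that a plan the verifier accepts is actually correct."""
    accept_good = p_correct * (1 - false_neg)   # correct and accepted
    accept_bad = (1 - p_correct) * false_pos    # incorrect but accepted
    return accept_good / (accept_good + accept_bad)


# Hypothetical rates: 30% of generated plans are correct.
noisy = accepted_precision(0.3, false_pos=0.4, false_neg=0.2)
sound = accepted_precision(0.3, false_pos=0.0, false_neg=0.0)
print(round(noisy, 2), sound)
```

Under these assumed rates, fewer than half of the plans accepted by the noisy verifier are correct (about 0.46), whereas a sound verifier accepts only correct plans.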

The author suggests an LLM-Modulo framework *<mark style="color:yellow;">**where LLMs generate potential solutions vetted by external verifiers or expert humans**</mark>*, ensuring a sound outcome. The paper also reflects on the broader implications of LLMs in AI, suggesting they can serve as knowledge sources for domain-specific information, a role previously filled by human knowledge engineers.

In summary, while LLMs demonstrate some level of problem-solving ability, their effectiveness in planning and reasoning is largely attributed to their retrieval capabilities rather than genuine reasoning or planning.

The LLM-Modulo framework is proposed as a principled way to leverage LLMs' idea generation for reasoning tasks, supported by external verification to ensure accuracy and soundness.
