Recursive language models (RLMs) are a simple idea: let a model call itself (or another model step) to break harder tasks into smaller tasks, then combine the results.
Instead of one long prompt and one final answer, you get a structured loop:
- Plan the next subproblem.
- Solve it with a model/tool call.
- Evaluate the result.
- Recurse until the stopping condition is met.
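The loop above can be sketched in a few lines of TypeScript. Everything here is illustrative rather than a real implementation: `solveWithModel` and `plan` are deterministic stubs standing in for model calls, and all names are hypothetical.

```ts
// Minimal sketch of the plan/solve/evaluate/recurse loop.
// `solveWithModel` stands in for a real model call; here it is a stub
// that only "solves" small tasks, so the control flow is runnable.

type Task = { description: string; depth: number };
type Result = { text: string; solved: boolean };

// Stub "model call": succeeds once the task is small enough.
function solveWithModel(task: Task): Result {
  const small = task.description.length <= 8;
  return { text: `solved(${task.description})`, solved: small };
}

// Stub planner: splits a task into two smaller subtasks.
function plan(task: Task): Task[] {
  const mid = Math.ceil(task.description.length / 2);
  return [
    { description: task.description.slice(0, mid), depth: task.depth + 1 },
    { description: task.description.slice(mid), depth: task.depth + 1 },
  ];
}

const MAX_DEPTH = 4; // explicit stopping condition

function recurse(task: Task): string {
  const attempt = solveWithModel(task);      // solve
  if (attempt.solved || task.depth >= MAX_DEPTH) {
    return attempt.text;                     // evaluate: accept or give up
  }
  // plan: break into subproblems, recurse, combine the results
  return plan(task).map(recurse).join(' + ');
}

console.log(recurse({ description: 'a long hard task', depth: 0 }));
```

The depth check is what keeps the recursion from running away; a real system would also evaluate quality before accepting a result.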
This pattern helps with tasks that benefit from decomposition, verification, or iterative refinement.
## Why recursion helps
Many model failures happen when a task is too broad for one pass. Recursion gives the system more chances to reason in smaller contexts. In practice, this can improve reliability for multi-step work like code generation, synthesis across sources, and constrained decision making.
## ai-rlm in practice
I built ai-rlm to explore this pattern in a concrete, programmable way.
The project focuses on:
- A recursive control loop with clear stop conditions to keep decomposition useful and bounded.
- Inspectable intermediate state and agent-oriented APIs so recursive runs are observable and practical in real apps.
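The "bounded" part comes down to a stop rule every recursive step consults. A minimal sketch, assuming a shared budget object (the names here are illustrative, not the actual ai-rlm API):

```ts
// Illustrative stop rule: every recursive step checks a shared budget
// before making another model call, so decomposition stays bounded.

interface Budget {
  llmCalls: number;    // calls made so far (shared, mutable)
  maxLLMCalls: number; // hard cap on total model calls
  maxDepth: number;    // hard cap on recursion depth
}

function shouldStop(budget: Budget, depth: number): boolean {
  // Stop when either the call budget or the depth limit is exhausted.
  return budget.llmCalls >= budget.maxLLMCalls || depth >= budget.maxDepth;
}

const budget: Budget = { llmCalls: 0, maxLLMCalls: 24, maxDepth: 4 };
console.log(shouldStop(budget, 0)); // false: budget and depth both available
budget.llmCalls = 24;
console.log(shouldStop(budget, 0)); // true: call budget exhausted
```

Because the budget is shared across the whole tree rather than per branch, one expensive subtree cannot silently starve the rest of the run without tripping the cap.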
## Using the AI SDK for an Agent
ai-rlm uses the AI SDK as the foundation for model calls and Agent behavior.
The SDK makes it easier to wire up:
- Cross-provider model calls and structured tool execution.
- Streaming, typed outputs, and a clean interface for building and operating agent loops.
At a high level, the Agent in ai-rlm:
- Receives a root task.
- Decides whether to solve it directly or split it into children.
- Executes each child with model/tool calls.
- Merges child outputs into a parent result.
- Returns when confidence or depth limits are reached.
The key point is not recursion for its own sake. It is recursion with constraints, observability, and explicit stop rules.
Here is a minimal RLMAgent example:
```ts
import { RLMAgent } from 'ai-rlm'
import { openai } from '@ai-sdk/openai'

const agent = new RLMAgent({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  maxIterations: 12,
  maxLLMCalls: 24,
})

const result = await agent.generate({
  context: `
    Ticket #1842: checkout failures increased 17% after deploy.
    Errors spike when promo codes and guest checkout are used together.
  `,
  query: 'What is the most likely root cause and first fix to ship?',
})

console.log(result.text)
console.log(result.iterations, result.llmCallCount)
```
And here is an example with streaming and execution trace output:
```ts
import { RLMAgent } from 'ai-rlm'
import { openai } from '@ai-sdk/openai'

const agent = new RLMAgent({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  maxIterations: 20,
  maxLLMCalls: 40,
  verbose: false,
})

const stream = await agent.stream({
  context: largeCodebaseSummary,
  query: 'Find the highest-risk auth bug and propose a patch plan.',
})

for await (const delta of stream.textStream) {
  process.stdout.write(delta)
}

// Full recursive trajectory (reasoning/code/output per step)
console.log(stream.steps)
```
## What I am exploring next
- More end-to-end examples and evaluations of RLM behavior across realistic agent tasks.
- Practical CLI tools built on ai-rlm for debugging, analysis, and workflow automation.
If you are working on production agents, this approach is worth testing where tasks naturally branch and recombine.