Cloudflare Agent Memory Beta 2026: How to Build AI Agents That Remember Across Sessions [Code Tutorial]

Step-by-step guide to building persistent memory AI agents using Cloudflare's Agent Memory service — code examples, five operations, and real-world implementation
May 5, 2026, 18:01 Eastern Daylight Time

Cloudflare Agent Memory is now in private beta. This managed service extracts information from agent conversations and makes it available when needed — without filling up your context window. You get persistent, retrievable memory across sessions with just five operations.

What You Will Learn

  • The five core operations: ingest, remember, recall, list, and forget
  • How to integrate Agent Memory with Cloudflare Workers
  • Four memory types: facts, events, instructions, and tasks
  • Building an agent that remembers user preferences across sessions

Why AI Agents Need Memory

Agents, as they exist today, are ephemeral. They run for a session, tied to a single process, and then they are gone. A coding agent forgets what you asked it to build. A customer service bot forgets the user's preferences. Every conversation starts from zero.

This problem — called context rot — happens because traditional agents store everything in the context window. As conversations grow longer, you hit token limits, and the agent starts losing track of important details. The solution isn't more context; it's smarter memory.

Cloudflare Agent Memory solves this by moving memory out of the prompt entirely. Instead of keeping everything in context, it extracts useful information from conversations and stores it separately, making it available when needed without filling up the model's working window.

Professional Recommendation

Agent Memory is currently in private beta. You can request access through Cloudflare's developer documentation. The service integrates with the Cloudflare Agents SDK and is accessible via Worker bindings or REST API.

The Five Core Operations

Agent Memory exposes a simple API with five operations. Each operation handles a specific memory management task:

ingest

Extract and store memories from conversation turns automatically

remember

Store a single memory explicitly (direct tool use by the model)

recall

Retrieve relevant memories based on a query

list

List all stored memories for a session or user

forget

Remove specific memories that are no longer relevant
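
To make the five operations concrete, here is a minimal in-memory stand-in that could be used in unit tests. The `InMemoryStore` class, the `MemoryRecord` and `Turn` shapes, and every method signature are assumptions modeled on the calls used later in this tutorial; they are not the service's published types. Its `recall` is naive word matching and its `ingest` keeps user turns verbatim instead of extracting anything.

```typescript
// Hypothetical shapes, modeled on the calls used in this tutorial.
type MemoryType = "facts" | "events" | "instructions" | "tasks";

interface MemoryRecord {
  id: string;
  sessionId: string;
  memoryType: MemoryType;
  content: string;
  createdAt: number; // epoch milliseconds
}

interface Turn {
  role: "user" | "assistant";
  content: string;
}

class InMemoryStore {
  private records: MemoryRecord[] = [];
  private nextId = 1;

  // remember: store one memory explicitly.
  async remember(input: { sessionId: string; memoryType: MemoryType; content: string }): Promise<MemoryRecord> {
    const record: MemoryRecord = { id: `mem-${this.nextId++}`, createdAt: Date.now(), ...input };
    this.records.push(record);
    return record;
  }

  // ingest: the real service extracts memories; this stand-in only keeps user turns as facts.
  async ingest(input: { sessionId: string; messages: Turn[] }): Promise<MemoryRecord[]> {
    const stored: MemoryRecord[] = [];
    for (const turn of input.messages) {
      if (turn.role === "user") {
        stored.push(await this.remember({ sessionId: input.sessionId, memoryType: "facts", content: turn.content }));
      }
    }
    return stored;
  }

  // recall: return memories that contain at least one word (3+ characters) of the query.
  async recall(input: { sessionId: string; query: string; limit?: number }): Promise<MemoryRecord[]> {
    const words = input.query.toLowerCase().split(/\W+/).filter((w) => w.length > 2);
    return this.records
      .filter((r) => r.sessionId === input.sessionId)
      .filter((r) => words.some((w) => r.content.toLowerCase().includes(w)))
      .slice(0, input.limit ?? 5);
  }

  // list: every memory stored for the session.
  async list(input: { sessionId: string }): Promise<MemoryRecord[]> {
    return this.records.filter((r) => r.sessionId === input.sessionId);
  }

  // forget: remove one memory by id; resolves to true if something was removed.
  async forget(input: { sessionId: string; memoryId: string }): Promise<boolean> {
    const before = this.records.length;
    this.records = this.records.filter((r) => !(r.sessionId === input.sessionId && r.id === input.memoryId));
    return this.records.length < before;
  }
}
```

A stand-in like this lets you test the control flow around memory (what gets recalled, what gets forgotten) without beta access; it says nothing about how the managed service ranks or extracts.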

Four Memory Types

Agent Memory organizes information into four distinct types, each suited to different kinds of knowledge:

Memory Type  | Description                                             | Example
Facts        | User preferences, personal details, learned information | "User prefers dark mode"
Events       | Past interactions, completed tasks, significant moments | "User completed onboarding"
Instructions | Custom rules, preferences, behavior guidelines          | "Always use TypeScript"
Tasks        | Pending actions, goals, todo items                      | "Review pull request"
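
In TypeScript, the four types can be modeled as a string union so that a typo such as "fact" fails at compile time or is rejected at runtime. This is a sketch: the literal values mirror the `memoryTypes` strings used in this tutorial, while `MEMORY_TYPES`, `isMemoryType`, `TypedMemory`, and `groupByType` are hypothetical helper names.

```typescript
// The four type names as used in this tutorial's calls.
const MEMORY_TYPES = ["facts", "events", "instructions", "tasks"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

interface TypedMemory {
  memoryType: MemoryType;
  content: string;
}

// Runtime guard for values that arrive as plain strings (for example, from a request body).
function isMemoryType(value: string): value is MemoryType {
  return (MEMORY_TYPES as readonly string[]).includes(value);
}

// Group memory contents by type, for example to render each type in its own prompt section.
function groupByType(memories: TypedMemory[]): Record<MemoryType, string[]> {
  const groups: Record<MemoryType, string[]> = { facts: [], events: [], instructions: [], tasks: [] };
  for (const memory of memories) {
    groups[memory.memoryType].push(memory.content);
  }
  return groups;
}
```

Grouping is useful when instructions should go into the system prompt while facts are appended as background context.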

Code Tutorial: Building a Remembering Agent

Let's build a customer service agent that remembers user preferences across sessions. This example shows how to integrate Agent Memory with a Cloudflare Worker.

Step 1: Set Up the Worker with Agent Memory Binding

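# wrangler.toml
name = "remembering-agent"
main = "src/index.ts"
compatibility_date = "2026-05-06"

[observability]
enabled = true

# Add the Agent Memory binding here. The binding name must match the
# AGENT_MEMORY field declared on Env in Step 2; take the exact table and
# key names from the private beta documentation.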

Step 2: Configure the Agent Memory Binding

// In your Worker's type definitions (for example, a .d.ts file)
interface Env {
  AGENT_MEMORY: AgentMemory;
}

// Agent Memory binding is automatically available
// when you enable the feature in your Cloudflare account

Step 3: Implement the Remembering Agent

import { Agent } from "@cloudflare/agents";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    
    // Handle API requests
    if (url.pathname === "/chat") {
      return handleChat(request, env);
    }
    
    return new Response("Not Found", { status: 404 });
  }
};

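async function handleChat(request: Request, env: Env): Promise<Response> {
  // Parse the JSON body; both fields are expected to be strings
  const { message, sessionId } = (await request.json()) as { message: string; sessionId: string };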
  const memory = env.AGENT_MEMORY;
  
  // 1. RECALL: Get relevant memories before processing
  const relevantMemories = await memory.recall({
    sessionId,
    query: message,
    limit: 5,
    memoryTypes: ["facts", "instructions"]
  });
  
  // 2. Build context from memories
  const contextFromMemory = relevantMemories.length > 0 
    ? `\nUser context from previous sessions:\n${relevantMemories.map(m => `- ${m.content}`).join('\n')}`
    : "";
  
  // 3. Process with the agent
  const agent = new Agent({
    model: "claude-3.7-sonnet",
    systemPrompt: `You are a helpful customer service agent.${contextFromMemory}`,
  });
  
  const response = await agent.run(message);
  
  // 4. INGEST: Automatically extract and store new memories
  await memory.ingest({
    sessionId,
    messages: [
      { role: "user", content: message },
      { role: "assistant", content: response }
    ]
  });
  
  // 5. EXPLICIT REMEMBER: Store specific important facts
  if (message.includes("I prefer")) {
    const preference = message.match(/I prefer (.+)/)?.[1];
    if (preference) {
      await memory.remember({
        sessionId,
        memoryType: "facts",
        content: `User prefers ${preference}`,
        importance: 0.8
      });
    }
  }
  
  return Response.json({ 
    response,
    memoriesRetrieved: relevantMemories.length 
  });
}
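
One detail in step 5 is worth tightening. The pattern `/I prefer (.+)/` captures everything up to the end of the line, so "I prefer email. Thanks!" would be stored as "email. Thanks!". A minimal sketch of a stricter extractor follows; `extractPreference` is a hypothetical helper, and stopping at the first sentence terminator is an assumption about what counts as the preference.

```typescript
// Extract the phrase after "I prefer", stopping at the first '.', '!', '?' or newline.
// Returns null when the message states no preference.
function extractPreference(message: string): string | null {
  const match = message.match(/\bI prefer\s+([^.!?\n]+)/i);
  if (!match) return null;
  const preference = match[1].trim();
  return preference.length > 0 ? preference : null;
}
```

The result can be passed to `memory.remember` in place of the raw regex capture. A pattern this simple still misses phrasings such as "I'd rather", which is one reason to rely on `ingest` for most extraction.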

Step 4: Using the List and Forget Operations

// List all memories for a user (admin functionality)
async function listUserMemories(sessionId: string, env: Env) {
  const memory = env.AGENT_MEMORY;
  const memories = await memory.list({
    sessionId,
    memoryTypes: ["facts", "events", "instructions", "tasks"],
    limit: 100
  });
  
  return memories;
}

// Forget specific outdated information
async function clearOutdatedMemories(sessionId: string, env: Env) {
  const memory = env.AGENT_MEMORY;
  // Get all memories first
  const memories = await memory.list({ sessionId, limit: 50 });
  
  // Find and remove outdated ones
  for (const mem of memories) {
    if (mem.createdAt && Date.now() - mem.createdAt > 90 * 24 * 60 * 60 * 1000) {
      // Older than 90 days
      await memory.forget({
        sessionId,
        memoryId: mem.id
      });
    }
  }
}

// Task management with memory
async function completeTask(taskId: string, sessionId: string, env: Env) {
  const memory = env.AGENT_MEMORY;
  // Mark task as completed
  await memory.remember({
    sessionId,
    memoryType: "events",
    content: `Task ${taskId} completed`,
    importance: 0.6
  });
  
  // Remove from active tasks
  await memory.forget({
    sessionId,
    memoryId: taskId
  });
}
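
The 90-day check in `clearOutdatedMemories` subtracts `mem.createdAt` from `Date.now()`, which only works if `createdAt` is a number of epoch milliseconds. The tutorial does not state the timestamp format, so the sketch below accepts a number, an ISO 8601 string, or a `Date`. The names `isStale`, `toEpochMs`, and `DAY_MS` are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Convert the supported timestamp shapes to epoch milliseconds.
function toEpochMs(createdAt: number | string | Date): number {
  if (createdAt instanceof Date) return createdAt.getTime();
  if (typeof createdAt === "number") return createdAt;
  return Date.parse(createdAt); // NaN when the string is not a valid date
}

// True when the memory is older than maxAgeDays. Memories with an
// unparseable timestamp are treated as not stale, so they are kept.
function isStale(createdAt: number | string | Date, maxAgeDays: number, now: number = Date.now()): boolean {
  const created = toEpochMs(createdAt);
  if (Number.isNaN(created)) return false;
  return now - created > maxAgeDays * DAY_MS;
}
```

Passing `now` as a parameter keeps the function deterministic in tests. Keeping memories whose age is unknown is a conservative choice; a cleanup job that should err toward deletion would return `true` instead.
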

Common Mistake to Avoid

Don't store everything in memory. Agent Memory works best when you let the system automatically extract important information (via ingest) and only explicitly store critical facts (via remember). Over-stuffing memory leads to retrieval noise and slower performance.
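
One way to limit that noise is to check a candidate memory against what `list` already returned before calling `remember`. The sketch below is an assumption about how you might do this on the client side, not a feature of the service, and `normalize` and `isDuplicate` are hypothetical helpers. It only catches exact matches after normalization, not paraphrases.

```typescript
// Lowercase, drop punctuation, and collapse whitespace so trivially
// different spellings compare equal. Non-ASCII letters are stripped too,
// which is acceptable for this sketch but too lossy for multilingual content.
function normalize(content: string): string {
  return content
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// True when the candidate matches any existing memory after normalization.
function isDuplicate(candidate: string, existing: string[]): boolean {
  const key = normalize(candidate);
  return existing.some((content) => normalize(content) === key);
}
```

In the chat handler, `isDuplicate` would sit in front of the explicit `remember` call, using the `content` fields of the memories already stored for the session.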

How Retrieval Works

When you call the recall operation, Agent Memory doesn't just do simple keyword matching. Behind the scenes, five parallel channels fetch what's relevant from different angles:

  • Semantic search — vector-based similarity matching
  • Keyword matching — traditional full-text search
  • Temporal weighting — recent memories ranked higher
  • Importance scoring — explicitly remembered facts ranked higher
  • Type filtering — memories from the right type (facts vs tasks)

A Reciprocal Rank Fusion algorithm combines the results from all five channels: a memory that ranks well in several channels ends up ahead of one that ranks well in only one. This multi-channel approach is meant to surface relevant context without manual tuning.
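
For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the standard formulation: each memory's fused score is the sum of 1 / (k + rank) over every channel that returned it, with rank starting at 1. The constant k = 60 is the value commonly used in the literature; whether Agent Memory uses that constant, or weights channels differently, is not stated in the source. The function name and input shape are illustrative.

```typescript
// Fuse several ranked lists of memory IDs into one list, best first.
// rankings[i] is channel i's result, ordered from most to least relevant.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Three hypothetical channels returning overlapping memory IDs.
const semantic = ["m1", "m2", "m3"];
const keyword = ["m2", "m4"];
const temporal = ["m3", "m2"];
const fused = reciprocalRankFusion([semantic, keyword, temporal]);
```

Here `m2` comes first because it appears in all three lists, even though it tops only one of them. RRF uses ranks rather than raw scores, so channels with incomparable score scales (cosine similarity versus a text-search score) can be combined without normalization.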

Shared Memory for Teams

One powerful feature of Agent Memory is shared memory capability. This allows teams to share a profile so knowledge learned by one engineer's coding agent is available to all. Imagine a team where everyone's coding assistant knows the team's coding standards, preferred libraries, and project architecture — without each agent having to learn it independently.

To enable shared memory, you configure a team or organization-level session ID instead of individual user IDs. All agents in the team then query against the same memory store.

Example: Team Coding Standards Memory

// Store team coding standards (done once, shared across all agents)
await memory.remember({
  sessionId: "team-engineering",  // Team-level session
  memoryType: "instructions",
  content: "Use TypeScript for all new projects. Prefer functional components in React. Use 2-space indentation.",
  importance: 1.0  // Maximum importance
});

// All agents can now recall these standards
const standards = await memory.recall({
  sessionId: "team-engineering",
  query: "What are the team's React patterns?",
  limit: 3
});
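
An agent will often want both the team's shared memories and the individual user's. One possible approach, sketched below under the assumption that you issue one `recall` per session ID and combine the results yourself, is to merge the two lists and drop repeated content. `Recalled` and `mergeScopes` are hypothetical names, and placing team memories first is a design choice, not documented service behavior.

```typescript
interface Recalled {
  id: string;
  content: string;
}

// Merge team-level and personal memories, team first, skipping entries
// whose content (ignoring case and surrounding whitespace) was already seen.
function mergeScopes(personal: Recalled[], team: Recalled[], limit: number): Recalled[] {
  const seen = new Set<string>();
  const merged: Recalled[] = [];
  for (const memory of [...team, ...personal]) {
    const key = memory.content.trim().toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(memory);
  }
  return merged.slice(0, limit);
}
```

Listing team memories first means shared standards survive when `limit` truncates the list; reverse the order if personal preferences should win instead.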

Final Verdict

Cloudflare Agent Memory solves the biggest problem with AI agents today: they forget everything between sessions. With just five operations (ingest, remember, recall, list, forget), four memory types, and smart retrieval, you can build agents that actually remember what they learned. The private beta is open — request access and start building.

Last Updated: May 06, 2026 | Source: Cloudflare Blog & Developer Documentation (Official Website)

Frequently Asked Questions

What is Cloudflare Agent Memory?
Cloudflare Agent Memory is a managed service that gives AI agents persistent, retrievable memory across sessions. It extracts information from conversations and makes it available when needed, without filling up the context window. Currently in private beta, it provides five core operations: ingest, remember, recall, list, and forget.

What are the five operations?
The five operations are: 1) ingest, which automatically extracts and stores memories from conversation turns, 2) remember, which lets you explicitly store a single memory, 3) recall, which retrieves relevant memories based on a query, 4) list, which shows all stored memories, and 5) forget, which removes specific memories that are no longer relevant.

What are the four memory types?
Agent Memory organizes information into four types: Facts (user preferences, personal details), Events (past interactions, completed tasks), Instructions (custom rules, behavior guidelines), and Tasks (pending actions, goals). Each type serves different retrieval purposes.

Can I use Agent Memory from Cloudflare Workers and from agents running elsewhere?
Yes. Agent Memory can be accessed via a binding from any Cloudflare Worker, or via REST API for agents running outside of Workers. This follows the same pattern as other Cloudflare developer platform APIs.

How does retrieval work?
Agent Memory uses five parallel channels to fetch relevant memories: semantic search (vector-based), keyword matching, temporal weighting (recent memories ranked higher), importance scoring, and type filtering. A Reciprocal Rank Fusion algorithm combines the results into a single ranked list.

Does Agent Memory support shared memory for teams?
Yes. Teams can configure a team or organization-level session ID instead of individual user IDs, so knowledge learned by one engineer's coding agent is available to all agents in the team.