Cloudflare Agent Memory is now in private beta. This managed service extracts information from agent conversations and makes it available when needed — without filling up your context window. You get persistent, retrievable memory across sessions with just five operations.
What You Will Learn
- The five core operations: ingest, remember, recall, list, and forget
- How to integrate Agent Memory with Cloudflare Workers
- Four memory types: facts, events, instructions, and tasks
- Building an agent that remembers user preferences across sessions
Why AI Agents Need Memory
Agents, as they exist today, are ephemeral. They run for a session, tied to a single process, and then they are gone. A coding agent forgets what you asked it to build. A customer service bot forgets the user's preferences. Every conversation starts from zero.
This problem — called context rot — happens because traditional agents store everything in the context window. As conversations grow longer, you hit token limits, and the agent starts losing track of important details. The solution isn't more context; it's smarter memory.
Cloudflare Agent Memory solves this by moving memory out of the prompt entirely. Instead of keeping everything in context, it extracts useful information from conversations and stores it separately, making it available when needed without filling up the model's working window.
Agent Memory is currently in private beta. You can request access through Cloudflare's developer documentation. The service integrates with the Cloudflare Agents SDK and is accessible via Worker bindings or REST API.
The Five Core Operations
Agent Memory exposes a simple API with five operations. Each operation handles a specific memory management task:
ingest
Extract and store memories from conversation turns automatically
remember
Store a single memory explicitly (direct tool use by the model)
recall
Retrieve relevant memories based on a query
list
List all stored memories for a session or user
forget
Remove specific memories that are no longer relevant
Four Memory Types
Agent Memory organizes information into four distinct types, each suited to different kinds of knowledge:
Code Tutorial: Building a Remembering Agent
Let's build a customer service agent that remembers user preferences across sessions. This example shows how to integrate Agent Memory with a Cloudflare Worker.
Step 1: Set Up the Worker with Agent Memory Binding
// wrangler.toml name = "remembering-agent" main = "src/index.ts" compatibility_date = "2026-05-06" [observability] enabled = true # Add Agent Memory binding [[observability.logs drains]] destination = "stdout"
Step 2: Configure the Agent Memory Binding
// In your Worker's tsconfig or type definitions
interface Env {
AGENT_MEMORY: AgentMemory;
}
// Agent Memory binding is automatically available
// when you enable the feature in your Cloudflare account
Step 3: Implement the Remembering Agent
import { Agent } from "@cloudflare/agents";
import type { AIEvent } from "@cloudflare/agents";
export default {
async fetch(request: Request, env: Env): Promise {
const url = new URL(request.url);
// Handle API requests
if (url.pathname === "/chat") {
return handleChat(request, env);
}
return new Response("Not Found", { status: 404 });
}
};
async function handleChat(request: Request, env: Env): Promise {
const { message, sessionId } = await request.json();
const memory = env.AGENT_MEMORY;
// 1. RECALL: Get relevant memories before processing
const relevantMemories = await memory.recall({
sessionId,
query: message,
limit: 5,
memoryTypes: ["facts", "instructions"]
});
// 2. Build context from memories
const contextFromMemory = relevantMemories.length > 0
? `\nUser context from previous sessions:\n${relevantMemories.map(m => `- ${m.content}`).join('\n')}`
: "";
// 3. Process with the agent
const agent = new Agent({
model: "claude-3.7-sonnet",
systemPrompt: `You are a helpful customer service agent.${contextFromMemory}`,
});
const response = await agent.run(message);
// 4. INGEST: Automatically extract and store new memories
await memory.ingest({
sessionId,
messages: [
{ role: "user", content: message },
{ role: "assistant", content: response }
]
});
// 5. EXPLICIT REMEMBER: Store specific important facts
if (message.includes("I prefer")) {
const preference = message.match(/I prefer (.+)/)?.[1];
if (preference) {
await memory.remember({
sessionId,
memoryType: "facts",
content: `User prefers ${preference}`,
importance: 0.8
});
}
}
return Response.json({
response,
memoriesRetrieved: relevantMemories.length
});
}
Step 4: Using the List and Forget Operations
// List all memories for a user (admin functionality)
async function listUserMemories(sessionId: string, env: Env) {
const memories = await memory.list({
sessionId,
memoryTypes: ["facts", "events", "instructions", "tasks"],
limit: 100
});
return memories;
}
// Forget specific outdated information
async function clearOutdatedMemories(sessionId: string, env: Env) {
// Get all memories first
const memories = await memory.list({ sessionId, limit: 50 });
// Find and remove outdated ones
for (const mem of memories) {
if (mem.createdAt && Date.now() - mem.createdAt > 90 * 24 * 60 * 60 * 1000) {
// Older than 90 days
await memory.forget({
sessionId,
memoryId: mem.id
});
}
}
}
// Task management with memory
async function completeTask(taskId: string, sessionId: string, env: Env) {
// Mark task as completed
await memory.remember({
sessionId,
memoryType: "events",
content: `Task ${taskId} completed`,
importance: 0.6
});
// Remove from active tasks
await memory.forget({
sessionId,
memoryId: taskId
});
}
Don't store everything in memory. Agent Memory works best when you let the system automatically extract important information (via ingest) and only explicitly store critical facts (via remember). Over-stuffing memory leads to retrieval noise and slower performance.
How Retrieval Works
When you call the recall operation, Agent Memory doesn't just do simple keyword matching. Behind the scenes, five parallel channels fetch what's relevant from different angles:
- Semantic search — vector-based similarity matching
- Keyword matching — traditional full-text search
- Temporal weighting — recent memories ranked higher
- Importance scoring — explicitly remembered facts ranked higher
- Type filtering — memories from the right type (facts vs tasks)
A Reciprocal Rank Fusion algorithm combines the results from all five channels, so the best memories always surface first. This multi-channel approach ensures you get the most relevant context without manual tuning.
Shared Memory for Teams
One powerful feature of Agent Memory is shared memory capability. This allows teams to share a profile so knowledge learned by one engineer's coding agent is available to all. Imagine a team where everyone's coding assistant knows the team's coding standards, preferred libraries, and project architecture — without each agent having to learn it independently.
To enable shared memory, you configure a team or organization-level session ID instead of individual user IDs. All agents in the team then query against the same memory store.
Example: Team Coding Standards Memory
// Store team coding standards (done once, shared across all agents)
await memory.remember({
sessionId: "team-engineering", // Team-level session
memoryType: "instructions",
content: "Use TypeScript for all new projects. Prefer functional components in React. Use 2-space indentation.",
importance: 1.0 // Maximum importance
});
// All agents can now recall these standards
const standards = await memory.recall({
sessionId: "team-engineering",
query: "What are the team's React patterns?",
limit: 3
});
Final Verdict
Cloudflare Agent Memory solves the biggest problem with AI agents today: they forget everything between sessions. With just five operations (ingest, remember, recall, list, forget), four memory types, and smart retrieval, you can build agents that actually remember what they learned. The private beta is open — request access and start building.
Last Updated: May 06, 2026 | Source: Cloudflare Blog & Developer Documentation (Official Website)